In gas turbine (GT) monitoring and diagnostics, GT trip is of great concern for manufacturers and users. Because many different issues may cause a trip, its occurrence is not infrequent, yet its prediction remains a largely unexplored field of research: despite its relevance, a comprehensive study on the reliability of GT trip prediction has not yet been proposed. To fill this gap, this paper investigates the fusion of five data-driven base models by means of voting and stacking, in order to improve prediction accuracy and robustness. The five benchmark supervised machine learning and deep learning classifiers are k-nearest neighbors, support vector machine (SVM), Naïve Bayes (NB), decision trees (DTs), and long short-term memory (LSTM) neural networks. While voting simply averages the predictions of the base models, without providing additional information, stacking aggregates heterogeneous models by training an additional machine learning model (namely, the stacked ensemble model) on the predictions of the base models. The analyses carried out in this paper employ field observations of both safe operation and trip events, derived from a large fleet of industrial Siemens GTs in operation. The results demonstrate that the stacked model provides higher accuracy than the base models and also outperforms voting, proving more effective especially when the reliability of the base models' predictions is poor.
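The difference between voting and stacking can be illustrated with a minimal sketch. It is not the implementation used in this paper: the three base models, their trip probabilities, the labels, and all function names below are hypothetical, and the meta-model is assumed here to be a logistic regression fitted by plain gradient descent. For brevity the meta-model is fitted and scored on the same toy samples, whereas in practice it would be trained on held-out base-model predictions.

```python
import math

def soft_vote(probs):
    """Voting: average the trip probabilities issued by the base models."""
    return sum(probs) / len(probs)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta_model(X, y, lr=0.5, epochs=5000):
    """Stacking: fit a logistic-regression meta-model on base-model outputs
    using batch gradient descent on the mean log loss (illustrative choice)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def stacked_predict(w, b, probs):
    """Meta-model output: a learned, weighted combination of base outputs."""
    return sigmoid(sum(wj * pj for wj, pj in zip(w, probs)) + b)

def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(t) for s, t in zip(scores, labels))
    return hits / len(labels)

# Hypothetical trip probabilities from three base models (one row per sample).
# Model 1 is reliable; models 2 and 3 are deliberately unreliable.
BASE_PROBS = [
    [0.9, 0.2, 0.3], [0.8, 0.3, 0.2], [0.7, 0.1, 0.4],  # trip events
    [0.1, 0.8, 0.7], [0.2, 0.9, 0.6], [0.3, 0.7, 0.8],  # safe operation
]
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = trip, 0 = safe operation

vote_scores = [soft_vote(p) for p in BASE_PROBS]
w, b = fit_meta_model(BASE_PROBS, LABELS)
stack_scores = [stacked_predict(w, b, p) for p in BASE_PROBS]

print("voting accuracy:  ", accuracy(vote_scores, LABELS))
print("stacking accuracy:", accuracy(stack_scores, LABELS))
```

On this toy data, averaging misclassifies every sample because two of the three hypothetical models are consistently wrong, whereas the meta-model can learn how much to trust each base model. This mirrors, in a highly simplified form, the case in which the reliability of the base models is poor.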