Analyzing customer churn in the banking industry using machine learning algorithms

Document Type : Original Article

Authors

1 PhD Candidate in Operations Research, Department of Management, Faculty of Economics & Administrative Sciences, Ferdowsi University of Mashhad, Mashhad, Iran

2 Associate Professor, Department of Management, Faculty of Economics and Business Administration, Ferdowsi University of Mashhad (FUM)

Abstract
In the highly competitive banking industry, customer retention is a major challenge for financial institutions, affecting their profitability, market position, and reputation. This study aimed to identify the factors that drive customer churn and to predict churn using five machine learning algorithms: decision trees, random forests, XGBoost, AdaBoost, and k-nearest neighbors (KNN). Demographic and banking data on customers of a private bank in Mashhad were collected and analyzed after thorough cleaning (removal of duplicate records and outliers, and handling of missing values). The models were evaluated using accuracy, precision, recall, and F1-score. XGBoost performed best, predicting customer churn with an accuracy of 92 percent. These findings indicate that advanced machine learning algorithms, especially XGBoost, can help banks identify customers at risk of churn early and implement targeted retention strategies. This approach supports personalized services, more effective marketing and loyalty programs, and lower operational costs. As a result, these methods can help improve banks' financial performance and strengthen their competitive advantage.
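To make the four evaluation metrics concrete, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be computed for a binary churn classifier. It is not the authors' code: the function names, the encoding of churn as 1, and the toy labels are all assumptions chosen for illustration, and the numbers are unrelated to the study's data or results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task.

    `positive` is the label treated as "churned" (assumed to be 1 here).
    """
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 as a dict.

    A metric whose denominator is zero is reported as 0.0.
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels: 1 = churned, 0 = retained (3 churners out of 10).
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]
model_metrics = classification_metrics(y_true, y_pred)

# A trivial baseline that predicts "retained" for every customer.
baseline_metrics = classification_metrics(y_true, [0] * len(y_true))

print(model_metrics)
print(baseline_metrics)
```

In this toy example the baseline that never predicts churn still reaches 70 percent accuracy while its recall is zero, which is why churn studies typically report precision, recall, and F1-score alongside accuracy.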

Keywords

