Using Machine Learning to Model Bankruptcy Risk in Listed Companies
Author(s): Vlad Teodorescu, Cătălina-Ioana Toader
Subject(s): Economy, Business Economy / Management, Financial Markets
Published by: EDITURA ASE
Keywords: bankruptcy risk; probability of bankruptcy; machine learning; xgboost; random forest;
Summary/Abstract: This article studies the optimisation and relative performance of three classes of machine learning models (logistic regression with regularisation, Random Forest, and XGBoost) for quantifying the probability of bankruptcy, using financial data from a database of listed companies in Taiwan. The database covers the period from 1999 to 2009, contains 95 financial ratios from 7 categories, has 6,819 observations, and has a bankruptcy rate of approximately 3.2%. We chose this database because it is publicly available, of high quality, and of moderate size, which permits rapid training of machine learning models. As a result, we were able to run experiments with multiple model configurations and to compare our results with those obtained by other researchers. To split the data into training and test sets, the k-fold cross-validation methodology can be used. We investigate the validity of its use, especially in the context of XGBoost with early stopping based on the test fold. We also determine the sensitivity of predictive performance to the value of k and to the specific folds created. We use AUROC as the performance measure and show that Random Forest models significantly outperform logistic models with regularisation, while XGBoost models perform moderately better than Random Forest. For each type of model, we study hyperparameter tuning and demonstrate that this process has a significant effect on predictive performance. For the first two types of model, we perform a full grid search. For XGBoost models, we use a guided (sequential) grid search methodology.
Furthermore, we study and propose a criterion for hyperparameter tuning that uses average performance instead of maximum performance, highlighting the relatively large effect that the stochastic component of these machine learning algorithms has on predictive performance during training. Our research also indicates that, for some hyperparameters, tuning can affect predictive performance. Finally, we assess the importance of the variables in predicting the likelihood of bankruptcy, as indicated by each of the three classes of models.
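The tuning criterion described above, selecting hyperparameters by average rather than maximum performance, can be illustrated with a minimal sketch. It assumes each configuration has been trained several times with different random seeds and that one AUROC value was recorded per run. The configuration names, the helper `select_config`, and all scores are hypothetical and are not taken from the article.

```python
from statistics import mean

# Hypothetical AUROC scores: one list per hyperparameter configuration,
# one entry per random seed. Illustrative numbers only, not results
# from the article.
scores = {
    "config_a": [0.930, 0.960, 0.915, 0.925],  # one lucky run, lower average
    "config_b": [0.940, 0.945, 0.938, 0.942],  # stable across seeds
}


def select_config(scores_by_config, aggregate):
    """Return the configuration whose aggregated score is highest."""
    return max(scores_by_config, key=lambda name: aggregate(scores_by_config[name]))


# Selecting on the single best run rewards a favourable seed.
best_by_max = select_config(scores, max)

# Selecting on the mean across seeds rewards consistent performance.
best_by_mean = select_config(scores, mean)

print("best by maximum:", best_by_max)
print("best by average:", best_by_mean)
```

In this toy data the two criteria disagree: the maximum picks the configuration with one unusually good run, while the average picks the one that performs consistently across seeds.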
Journal: Proceedings of the ... international conference on economics and social sciences.
- Issue Year: 6/2024
- Issue No: 1
- Page Range: 610-619
- Page Count: 10
- Language: English