Credit Risk Modeling Using Interpreted XGBoost
Author(s): Marcin Hernes, Jędrzej Adaszyński, Piotr Tutak
Subject(s): Business Economy / Management, ICT Information and Communications Technologies
Published by: Wydawnictwo Naukowe Wydziału Zarządzania Uniwersytetu Warszawskiego
Keywords: credit risk; risk modeling; XGBoost; machine learning interpretability; explainable artificial intelligence
Summary/Abstract:
Purpose: The aim of the paper is to develop a credit risk assessment model using the XGBoost classifier, supported by interpretability methods.
Design/methodology/approach: Risk modeling in this research is based on Extreme Gradient Boosting (XGBoost), a method used for regression and classification problems. It builds a sequence of decision trees, using gradient-based optimization of the loss function to minimize the errors of weak estimators. We also use methods for local and global interpretability: ceteris paribus charts, SHAP, and the feature importance approach.
Findings: The results show that XGBoost achieved higher values of the performance metrics than logistic regression, with the exception of sensitivity. This means that XGBoost identified a smaller percentage of all bad clients. The local interpretability results indicate that, for the client in question, the credit decision is positively influenced by credit scores from external suppliers and negatively influenced by minimal external scoring and short seniority. The number of years in the car and higher education also have a positive influence. Such information helps to justify a negative credit decision. The global interpretability results indicate that higher values of the traits associated with the z-scores are accompanied by negative Shapley values, which can be interpreted as a negative effect on the explanatory variable.
Research limitations/implications: XGBoost, ceteris paribus plots, SHAP, and feature importance methods can be used to develop a credit risk assessment model that includes machine learning interpretability. The main limitation of the research is that the XGBoost results are compared only with logistic regression results. Future research should focus on comparing the results of XGBoost with other machine learning methods, including neural networks.
Originality/value: One of the key processes in a bank is the credit decision process, which evaluates a client's repayment risk. In the consumer finance sector, these processes are usually largely automated, and the latest machine learning methods, based on neural networks and ensemble learning, are increasingly used for this purpose. Although machine learning models achieve higher accuracy in credit risk assessment than traditional statistical methods, their main problem is low interpretability: the models often behave as a "black box". Interpreting the results of risk assessment models is nevertheless very important, because the reasons behind a credit risk assessment must be explained to the client.
Journal: Problemy Zarządzania
- Issue Year: 21/2023
- Issue No: 3 (101)
- Page Range: 46-70
- Page Count: 25
- Language: English
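
The boosting idea summarized in the abstract (a sequence of trees, each fitted to the gradient of the loss so that it corrects the errors of the previous weak estimators) can be illustrated with a minimal sketch. This is not the authors' model and not the XGBoost library: it is plain first-order gradient boosting on the logistic loss with one-split trees (stumps), whereas XGBoost additionally uses second-order information and regularization. The feature names (`external_score`, `years_employed`), the toy data, and all function names are hypothetical and chosen only for illustration; label 1 is assumed to mean a bad client. The sketch also computes sensitivity, the one metric on which the abstract reports logistic regression ahead of XGBoost.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(rows, residuals):
    """Return (feature, threshold, left_value, right_value) for the single
    split that minimizes squared error against the residuals.
    Assumes at least one feature takes two or more distinct values."""
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(rows, residuals) if row[j] <= thr]
            right = [r for row, r in zip(rows, residuals) if row[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    return best[1:]

def predict_stump(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value

def fit_boosting(rows, labels, n_rounds=50, learning_rate=0.3):
    """Each round fits a stump to the negative gradient of the log-loss,
    which for the logistic loss is simply (label - predicted probability)."""
    pos_rate = sum(labels) / len(labels)
    base = math.log(pos_rate / (1.0 - pos_rate))  # initial log-odds
    scores = [base] * len(rows)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - sigmoid(s) for y, s in zip(labels, scores)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * predict_stump(stump, row)
                  for s, row in zip(scores, rows)]
    return base, stumps, learning_rate

def predict_proba(model, row):
    base, stumps, learning_rate = model
    return sigmoid(base + learning_rate * sum(predict_stump(s, row) for s in stumps))

def sensitivity(y_true, y_pred):
    """Share of actual bad clients (label 1) that the model flags as bad."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)

# Hypothetical toy data: [external_score, years_employed]; 1 = bad client.
X = [[0.2, 1], [0.3, 2], [0.25, 0.5], [0.4, 1],
     [0.7, 5], [0.8, 8], [0.6, 6], [0.9, 10]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

model = fit_boosting(X, y)
preds = [1 if predict_proba(model, row) >= 0.5 else 0 for row in X]
print("sensitivity on the toy training data:", sensitivity(y, preds))
```

Stumps are used here only to keep the example short; the comparison in the paper concerns a full XGBoost classifier evaluated on real credit data, so nothing in this sketch reproduces its reported results.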