Implementation of K-Nearest Neighbor using the oversampling technique on mixed data for the classification of household welfare status
Author(s): Nur Mutmainnah Djafar, Achmad Fauzan
Subject(s): Economy
Published by: Główny Urząd Statystyczny
Keywords: ADASYN; KNN; random oversampling; SMOTE; welfare
Summary/Abstract: Welfare is closely related to poverty and socio-economic disparities in a society. According to data from the Central Bureau of Statistics, Kulon Progo had the highest poverty rate in the Special Region of Yogyakarta province, Indonesia, with an increasing trend every year from 2019 to 2021; it also had a low poverty line (after Gunung Kidul) compared to the other regencies/cities in the province. This study aimed to classify household welfare status in Kulon Progo in March 2021 using the K-Nearest Neighbor (KNN) method. Because the poor and non-poor classes were imbalanced, and imbalanced data affect classification, particularly its predictions, an oversampling technique was employed. Three oversampling techniques were compared: Random Oversampling (RO), Adaptive Synthetic Sampling (ADASYN), and the Synthetic Minority Oversampling Technique (SMOTE). Of the three, RO with k = 5 performed best, with sensitivity, specificity, G-mean, and accuracy of 0.643, 0.805, 0.719, and 78.873%, respectively. It can therefore be concluded that the classification model performed well enough to classify household welfare status, especially among the poor (the minority class).
Journal: Statistics in Transition. New Series
- Issue Year: 25/2024
- Issue No: 1
- Page Range: 109-124
- Page Count: 15
- Language: English
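
Illustrative note (not part of the original record): the abstract names random oversampling and the G-mean, and both can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the toy rows, and the choice to balance every class up to the size of the largest one are all hypothetical; the record does not describe the actual features or the distance measure used for the mixed data. The G-mean is taken here as the geometric mean of sensitivity and specificity, which is consistent with the figures reported in the abstract.

```python
import math
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are the same size.

    Assumption: every class is raised to the size of the largest class.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows of this class in the ORIGINAL data; sampled with replacement.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def g_mean(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy data: (categorical feature, numeric feature) per household.
rows = [("owned", 4), ("rented", 6), ("owned", 3), ("owned", 2), ("rented", 7)]
labels = ["non-poor", "non-poor", "non-poor", "non-poor", "poor"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)

# Sensitivity and specificity as reported in the abstract.
reported_g_mean = round(g_mean(0.643, 0.805), 3)
print(class_counts, reported_g_mean)
```

With the sensitivity and specificity from the abstract, the computed G-mean rounds to 0.719, matching the reported value. In practice, oversampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.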