EFFECT OF ENCODING CATEGORICAL DATA ON STUDENT'S ACADEMIC PERFORMANCE USING DATA MINING METHODS

Author(s): Moohanad JAWTHARI, Veronika Stoffová
Subject(s): Higher Education, ICT Information and Communications Technologies, Distance learning / e-learning, Pedagogy
Published by: Carol I National Defence University Publishing House
Keywords: E-learning; Educational Data Mining; Student performance; Support Vector Machine; Random Forest

Summary/Abstract: Educational data mining (EDM) comprises the techniques used to discover knowledge from student data; it is used to improve the performance of both students and teachers. Because machine learning (ML) models operate on numeric data, preprocessing is a necessary step to transform the data into types that the models accept. Data may come in categorical form, and categorical attributes are further divided into nominal and ordinal. In this paper, we study the effect of encoding some non-ordinal (nominal) features as one-hot (dummy) variables on the accuracy of student performance prediction. We used ensemble methods, namely random forest and the boosting method gradient boosted trees (GBT), as well as support vector machines (SVM). We also compared the performance of random forest and gradient boosted trees. Random forest gave the better result, 81%. GBT performed approximately the same in all cases. SVM accuracy improved when dummy variables were used. The study therefore shows that the encoding has an effect on the models.
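
To make the encoding choice in the abstract concrete, the following is a minimal sketch in plain Python that contrasts integer coding with one-hot (dummy) coding of a nominal feature. The records, column names, and helper functions are hypothetical illustrations and are not taken from the paper's dataset or code.

```python
# Hypothetical toy records; the column names and values are illustrative
# assumptions, not the dataset used in the paper.
records = [
    {"gender": "F", "topic": "Math", "raised_hands": 40},
    {"gender": "M", "topic": "IT", "raised_hands": 15},
    {"gender": "F", "topic": "Arabic", "raised_hands": 72},
]


def integer_encode(rows, column):
    """Map each category to an integer code (this implies an artificial order)."""
    categories = sorted({row[column] for row in rows})
    code = {cat: i for i, cat in enumerate(categories)}
    return [code[row[column]] for row in rows]


def one_hot_encode(rows, column):
    """Expand one nominal column into one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    names = [f"{column}_{cat}" for cat in categories]
    vectors = [
        [1 if row[column] == cat else 0 for cat in categories]
        for row in rows
    ]
    return names, vectors


# Integer coding: a single column whose values suggest Arabic < IT < Math.
topic_codes = integer_encode(records, "topic")

# One-hot coding: three indicator columns with no implied order.
topic_names, topic_dummies = one_hot_encode(records, "topic")

print(topic_codes)
print(topic_names)
print(topic_dummies)
```

With integer codes, a model that relies on distances or margins, such as an SVM, may treat the arbitrary numeric order as meaningful, whereas indicator columns keep the categories equidistant. Tree-based ensembles split on thresholds and are often less sensitive to this choice, which would be consistent with the results reported above.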

  • Issue Year: 16/2020
  • Issue No: 01
  • Page Range: 521-526
  • Page Count: 6
  • Language: English