A Performance-Driven Exploration of Combining Topic Modeling and Machine Learning for Online Learning Data Analysis
Author(s): Pachisa Kulkanjanapiban, Tipawan Silwattananusarn
Subject(s): Education and training, ICT Information and Communications Technologies
Published by: UIKTEN - Association for Information Communication Technology Education and Science
Keywords: Text mining; topic modeling; clustering ensembles; knowledge discovery; latent Dirichlet allocation
Summary/Abstract: The research aims to determine how well three clustering algorithms (K-Means, hierarchical, and ensemble clustering) work when combined with sophisticated topic modeling methods such as latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), and latent semantic analysis (LSA) on online learning datasets. The study’s data was scraped from online learning resource platforms, i.e., Coursera, Udacity, edX, and FutureLearn, in 2023 and from Kaggle in 2021. Findings demonstrate that LDA is more appropriate for generating topics from the clustered data points. LDA-based clustering performs more efficiently than NMF- and LSA-based clustering when ensemble clustering is used. The proposed combination approach is beneficial when dealing with complicated education data with an ambiguous structure that requires interpretable insights. The study can reduce overfitting, identify more robust thematic clusters, and offer a powerful approach for unlabeled text analysis.
Journal: TEM Journal
- Issue Year: 14/2025
- Issue No: 1
- Page Range: 511-527
- Page Count: 17
- Language: English
