Selekcja zmiennych w klasyfikacji – propozycja algorytmu
Variable selection in classification – algorithm proposal
Author(s): Jerzy Korzeniewski
Subject(s): Economy
Published by: Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Keywords: classification; training set; variable correlation
Summary/Abstract: Variable selection in classification is important for both single and aggregated methods. The simplest way to select variables is to check how strongly each one is correlated with the correct classification of objects in the training set. This natural approach, however, has serious limitations, because correlation is difficult to measure for variables on weak measurement scales. The paper proposes measuring the strength of the relationship with the linear correlation coefficient, computed between the distances between pairs of observations on an arbitrary single attribute and the distances between the same pairs on the class-label attribute. Attributes whose correlation falls below a certain threshold are rejected. The efficiency of the method is investigated on UCI data sets, and the results are compared with the stepclass and Boruta procedures available in the R language.
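A minimal Python sketch of the selection rule summarized in the abstract, under assumptions the abstract does not specify: the attribute distance is the absolute difference for numeric attributes and a 0/1 mismatch for nominal ones, the class-label distance is 0 for objects in the same class and 1 otherwise, the raw (signed) coefficient is compared with the threshold, and the default threshold of 0.1 is hypothetical. The helper names (`pearson`, `attribute_distance`, `distance_correlation_scores`, `select_attributes`) are illustrative and not taken from the paper.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Linear (Pearson) correlation coefficient; returns 0.0 for empty or constant input."""
    n = len(x)
    if n == 0:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def attribute_distance(a, b, nominal: bool) -> float:
    # Assumption: 0/1 mismatch for nominal attributes, absolute difference otherwise.
    if nominal:
        return 0.0 if a == b else 1.0
    return abs(a - b)


def distance_correlation_scores(X, y, nominal) -> list[float]:
    """For each attribute, correlate pairwise attribute distances with pairwise class distances."""
    pairs = list(combinations(range(len(y)), 2))
    # Assumed class-label distance: 0 for the same class, 1 for different classes.
    d_class = [0.0 if y[i] == y[j] else 1.0 for i, j in pairs]
    scores = []
    for k in range(len(X[0])):
        d_attr = [attribute_distance(X[i][k], X[j][k], nominal[k]) for i, j in pairs]
        scores.append(pearson(d_attr, d_class))
    return scores


def select_attributes(X, y, nominal, threshold: float = 0.1) -> list[int]:
    """Return indices of attributes whose score is not below the (hypothetical) threshold."""
    scores = distance_correlation_scores(X, y, nominal)
    return [k for k, r in enumerate(scores) if r >= threshold]


if __name__ == "__main__":
    # Toy data: attribute 0 separates the classes, attributes 1 and 2 do not.
    X = [[1.0, "a", 5.0], [1.2, "b", 1.0], [5.0, "a", 4.0], [5.3, "b", 2.0]]
    y = ["A", "A", "B", "B"]
    nominal = [False, True, False]
    print(distance_correlation_scores(X, y, nominal))
    print(select_attributes(X, y, nominal))  # [0]
```

On the toy data, attribute 0 scores close to 1 because objects far apart on it are almost always in different classes, while the other two attributes score negative and are rejected. Building all pairs costs O(n²) per attribute, so a practical version would likely sample pairs on larger training sets.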
Journal: Prace Naukowe Uniwersytetu Ekonomicznego we Wrocławiu
- Issue Year: 2014
- Issue No: 328
- Page Range: 69-75
- Page Count: 7
- Language: Polish