Evaluation of Selected Approaches to Clustering Categorical Variables
Author(s): Zdeněk Šulc, Hana Řezanková
Subject(s): Economy
Published by: Główny Urząd Statystyczny
Keywords: variable clustering;nominal variables;association measures;similarity measures
Summary/Abstract: This paper focuses on recently proposed similarity measures and their performance in clustering categorical variables. It compares clustering results obtained with three recently developed similarity measures (the IOF, OF and Lin measures) with those obtained with two association measures for nominal variables (Cramér’s V and the uncertainty coefficient) and with the simple matching coefficient (the overlap measure). To eliminate the influence of any particular linkage method on the structure of the final clusters, three linkage methods are examined (complete, single and average). The resulting groups (clusters) of variables can serve as a basis for dimensionality reduction, e.g. by choosing one variable from each group as a representative of the whole group. The quality of the resulting clusters is evaluated through within-cluster variability, expressed by the WCM coefficient, and through dendrogram analysis. The examined similarity measures are compared and evaluated on two real data sets from a social survey.
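The abstract does not define the IOF, OF and Lin measures as applied to variables, nor the WCM coefficient, so the following is only a minimal sketch of the general workflow using the two standard association measures it names: Cramér’s V and the uncertainty coefficient, turned into a dissimilarity (1 minus the measure) and fed to average linkage. It is not the authors’ implementation. Assumptions: the symmetric form of the uncertainty coefficient is used (the paper may use another variant), and the variable names `q1`, `q2`, `q3` and their values are hypothetical toy data.

```python
import math
from collections import Counter
from itertools import combinations


def contingency(x, y):
    """Joint and marginal frequency counts for two nominal variables."""
    return Counter(zip(x, y)), Counter(x), Counter(y)


def cramers_v(x, y):
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1)))."""
    joint, fx, fy = contingency(x, y)
    n = len(x)
    chi2 = 0.0
    for a in fx:
        for b in fy:
            expected = fx[a] * fy[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(fx), len(fy)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


def entropy(counts, n):
    """Shannon entropy (natural log) of a frequency table."""
    return -sum(c / n * math.log(c / n) for c in counts.values() if c)


def uncertainty_coefficient(x, y):
    """Symmetric uncertainty coefficient (assumed variant): 2*I(X;Y) / (H(X) + H(Y))."""
    joint, fx, fy = contingency(x, y)
    n = len(x)
    hx, hy, hxy = entropy(fx, n), entropy(fy, n), entropy(joint, n)
    return 2 * (hx + hy - hxy) / (hx + hy) if hx + hy > 0 else 0.0


def average_linkage(names, dissim, n_clusters):
    """Naive agglomerative clustering: merge the pair of clusters with the
    smallest mean pairwise dissimilarity until n_clusters remain."""
    clusters = [[v] for v in names]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
            d = sum(dissim[p] for p in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical survey variables: q2 is a relabelling of q1, q3 is unrelated.
data = {
    "q1": ["a", "a", "b", "b", "c", "c"],
    "q2": ["x", "x", "y", "y", "z", "z"],
    "q3": ["u", "v", "u", "v", "u", "v"],
}
# Dissimilarity between variables = 1 - Cramér's V (both orderings stored).
dissim = {(a, b): 1 - cramers_v(data[a], data[b])
          for a in data for b in data if a != b}
print(average_linkage(list(data), dissim, 2))  # [['q1', 'q2'], ['q3']]
```

Swapping `cramers_v` for `uncertainty_coefficient` in the `dissim` comprehension gives the second association-based variant; single or complete linkage would replace the mean over `pairs` with `min` or `max`.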
Journal: Statistics in Transition. New Series
- Issue Year: 15/2014
- Issue No: 4
- Page Range: 591-610
- Page Count: 20
- Language: English