Evaluation of Selected Approaches to Clustering Categorical Variables

Author(s): Zdeněk Šulc, Hana Řezanková
Subject(s): Economy
Published by: Główny Urząd Statystyczny
Keywords: variable clustering; nominal variables; association measures; similarity measures

Summary/Abstract: This paper focuses on recently proposed similarity measures and their performance in categorical variable clustering. It compares clustering results using three recently developed similarity measures (IOF, OF and Lin measures) with results obtained using two association measures for nominal variables (Cramér’s V and the uncertainty coefficient) and with the simple matching coefficient (the overlap measure). To eliminate the influence of a particular linkage method on the structure of final clusters, three linkage methods are examined (complete, single, average). The created groups (clusters) of variables can be considered as the basis for dimensionality reduction, e.g. by choosing one of the variables from a given group as a representative for the whole group. The quality of resulting clusters is evaluated by the within-cluster variability, expressed by the WCM coefficient, and by dendrogram analysis. The examined similarity measures are compared and evaluated using two real data sets from a social survey.
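As a rough illustration of one of the approaches named in the abstract, the sketch below clusters categorical variables using Cramér's V as the association measure and 1 - V as the dissimilarity, with a choice of single, complete or average linkage. It is a minimal pure-Python sketch, not the authors' implementation: the function names `cramers_v` and `cluster_variables`, the toy data and the 1 - V transformation are assumptions made for illustration, and the IOF, OF and Lin measures and the WCM coefficient are not shown.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cramers_v(x, y):
    """Cramér's V between two equally long sequences of category labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row, col = Counter(x), Counter(y)
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant variable carries no association
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


def cluster_variables(data, n_clusters, linkage="average"):
    """Agglomerative clustering of variables with dissimilarity 1 - V.

    data: dict mapping a variable name to its list of category labels.
    """
    names = list(data)
    dist = {frozenset(p): 1.0 - cramers_v(data[p[0]], data[p[1]])
            for p in combinations(names, 2)}
    # Linkage rule: how pairwise dissimilarities between two clusters are combined.
    agg = {"single": min, "complete": max,
           "average": lambda d: sum(d) / len(d)}[linkage]
    clusters = [[name] for name in names]

    def between(c1, c2):
        return agg([dist[frozenset((a, b))] for a in c1 for b in c2])

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: between(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy data: "b" is a relabelling of "a", and "d" of "c".
toy = {
    "a": ["x", "x", "y", "y", "z", "z"],
    "b": ["p", "p", "q", "q", "r", "r"],
    "c": ["u", "v", "u", "v", "u", "v"],
    "d": ["m", "n", "m", "n", "m", "n"],
}
groups = cluster_variables(toy, n_clusters=2, linkage="average")
print(groups)
```

Each resulting group could then supply one representative variable for dimensionality reduction, as the abstract suggests; how that representative is chosen is left open here.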

  • Issue Year: 15/2014
  • Issue No: 4
  • Page Range: 591-610
  • Page Count: 20
  • Language: English