REMOVING DK/NA VALUES IN SHARE, WVS, OR SIMILAR DATASETS. EFFECTS ON THE EXPLORATION OF PREDICTIVE MODELS

Author(s): Daniel Homocianu
Subject(s): ICT Information and Communications Technologies
Published by: Editura Universităţii »Alexandru Ioan Cuza« din Iaşi
Keywords: SHARE, WVS, or similar datasets; DK/NA coded as negative values; effects on regression and classification models; feature selection steps; performance metrics
Summary/Abstract: This paper describes the effects of using a tool that automatically removes DK/NA (Do Not Know/No Answer) values from tabular datasets. Some providers of major survey datasets (e.g., SHARE, WVS) encode these values as negative numbers. Leaving them as they are means accepting an artificial widening of the measurement scales, which translates into dramatic changes in feature selection, exploration tasks, and the performance measurements of the resulting models. The tool discussed in this paper removes the need to manually recode the existing variables in such datasets or to derive cleaned replicas of them. In a transparent manner (with progress tracking), the tool automatically detects all the variables specified, treats each of them, and immediately reports the treatment status of each one (including exceptions for string variables). The paper also provides examples based on real-world data (World Values Survey, WVS, Time-series, v4.0).
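
The treatment summarized in the abstract can be illustrated with a minimal Python sketch. It is not the tool described in the paper: the function name `remove_dkna`, the variable names `trust` and `country`, and the sample values are hypothetical, and the sketch assumes that every negative numeric value in a treated variable is a DK/NA code (the actual codes should be checked against each dataset's codebook).

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def remove_dkna(rows: List[Row], variables: List[str]) -> Dict[str, str]:
    """Set negative numeric codes to None (missing) in the listed variables.

    Rows are modified in place. Returns a status message per variable,
    including an exception note for string variables, which are left as-is.
    Assumption: any negative number in a treated variable is a DK/NA code.
    """
    status: Dict[str, str] = {}
    for var in variables:
        values = [row.get(var) for row in rows]
        # String variables are reported as exceptions and not treated.
        if any(isinstance(v, str) for v in values):
            status[var] = "skipped (string variable)"
            continue
        replaced = 0
        for row in rows:
            v = row.get(var)
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_number = isinstance(v, (int, float)) and not isinstance(v, bool)
            if is_number and v < 0:
                row[var] = None
                replaced += 1
        status[var] = f"treated ({replaced} values set to missing)"
    return status


# Hypothetical sample: a 1-4 scale polluted by negative DK/NA codes.
sample = [
    {"trust": 1, "country": "RO"},
    {"trust": -1, "country": "RO"},
    {"trust": 4, "country": "DE"},
    {"trust": -2, "country": "DE"},
]

raw = [r["trust"] for r in sample]
report = remove_dkna(sample, ["trust", "country"])
clean = [r["trust"] for r in sample if r["trust"] is not None]

print("range before:", min(raw), "to", max(raw))
print("range after: ", min(clean), "to", max(clean))
for name, message in report.items():
    print(name, "->", message)
```

In this sample, the untreated variable appears to range from -2 to 4, while the cleaned one stays within its intended 1 to 4 scale. That is the artificial widening of scales the abstract refers to.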
