Profilowanie, oczyszczanie i zapobieganie powstawaniu dirty data
Dirty data – profiling, cleansing and prevention

Author(s): Kamila Migdał-Najman, Krzysztof Najman
Subject(s): Economy
Published by: Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Keywords: Big Data; dirty data; profiling data; data cleansing; defect prevention

Summary/Abstract: There are now almost unlimited sources of the large information streams referred to as Big Data. They raise hopes for a faster, cheaper, more precise and more versatile description of the world around us. At the same time, apart from data of proper quality (clear data), a significant share of such data sets is false, outdated, noisy, often duplicated, incomplete or incorrect (dirty data), or of unknown quality or usefulness (dark data). A large share of dirty data and dark data has a number of negative consequences for the analysis of Big Data sets. The aim of this article is to review and systematize the procedures for minimizing the negative effects of dirty data in the analysis of Big Data. The design of the data collection system covers the most important procedures for data profiling, data cleansing and the prevention of dirty data in the process of building and analyzing Big Data sets.

  • Issue Year: 2018
  • Issue No: 508
  • Page Range: 146-156
  • Page Count: 11
  • Language: Polish