Profilowanie, oczyszczanie i zapobieganie powstawaniu dirty data
Dirty data – profiling, cleansing and prevention
Author(s): Kamila Migdał-Najman, Krzysztof Najman
Subject(s): Economy
Published by: Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Keywords: Big Data; dirty data; profiling data; data cleansing; defect prevention
Summary/Abstract: Almost unlimited sources now produce large streams of information, referred to as Big Data. They raise hopes for a faster, cheaper, more precise and more versatile description of the world around us. At the same time, apart from data of proper quality (clean data), a significant share of such data sets is false, outdated, noisy, often duplicated, incomplete or incorrect (dirty data), or of unknown quality or usefulness (dark data). A large share of dirty data and dark data has a number of negative consequences for the analysis of Big Data sets. The aim of this article is to review and systematize the procedures for minimizing the negative effects of dirty data in the analysis of Big Data. The design of the data collection system includes the most important procedures for data profiling, data cleansing and the prevention of dirty data in the process of building and analyzing Big Data sets.
Journal: Prace Naukowe Uniwersytetu Ekonomicznego we Wrocławiu
- Issue Year: 2018
- Issue No: 508
- Page Range: 146-156
- Page Count: 11
- Language: Polish