Synthesizing an anonymized multidimensional dataset featuring financial, economic, demographic, and personal traits data

Author(s): Vasil Marchev, Angel Marchev, Jr, Milena Piryankova, Daniel Masarliev, Valentin Mitkov
Subject(s): Economy, Business Economy / Management, Socio-Economic Research
Published by: Евдемония Продъкшън ЕООД
Keywords: Synthetic Data Generation; Cholesky Decomposition; Kolmogorov-Smirnov Test

Summary/Abstract: This paper presents a novel approach to generating synthetic data arrays that address the scarcity of datasets containing sensitive information due to restrictions imposed by legislation such as the GDPR and the Bank Secrecy Act. By integrating statistical methods, including Monte Carlo simulation and Cholesky decomposition, with business logic, the study outlines a comprehensive methodology for creating multidimensional synthetic datasets. These datasets incorporate demographic, personality, financial, and banking variables to simulate the profiles of financially active individuals. This alternative to traditional data collection offers a solution to the challenges of accessing sensitive data while maintaining compliance with legal frameworks. Synthetic data preserve the interrelationships between variables and provide a secure testing environment, despite the inherent complexity of generating high-quality synthetic databases. Validation of the synthesized data through the Kolmogorov-Smirnov test ensures their accuracy and relevance. This approach not only facilitates the advancement of data-driven models in fields where access to sensitive data is limited but also promotes the ethical use of data by adhering to privacy regulations. The paper demonstrates the potential of synthetic data to serve as a viable resource for scientific research, offering a detailed exploration of the generation process and its implications for future applications in sensitive areas of study.
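The combination of techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two standardized variables, the target correlation of 0.6, the sample size, and all function names are assumptions chosen for illustration. The sketch draws independent normal samples (Monte Carlo), mixes them through a Cholesky factor to impose the target correlation, and compares one synthetic column against a reference sample with a two-sample Kolmogorov-Smirnov statistic.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L * L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def correlated_normals(corr, n_rows, rng):
    """Monte Carlo draw: independent N(0, 1) samples mixed through the Cholesky factor."""
    low = cholesky(corr)
    dim = len(corr)
    rows = []
    for _ in range(n_rows):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        rows.append([sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(dim)])
    return rows


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    rng = random.Random(42)               # fixed seed so the run is repeatable
    target = [[1.0, 0.6], [0.6, 1.0]]     # assumed target correlation matrix
    rows = correlated_normals(target, 5000, rng)
    col0 = [r[0] for r in rows]
    col1 = [r[1] for r in rows]
    reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    print("sample correlation:", round(pearson(col0, col1), 3))
    print("KS statistic vs reference:", round(ks_statistic(col0, reference), 4))
```

In this sketch a small KS statistic indicates that the synthetic marginal is close to the reference distribution, while the sample correlation shows whether the imposed dependence survived the draw. A real pipeline would additionally map the normal draws onto the target marginal distributions and apply the business rules the abstract mentions, neither of which is attempted here.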

  • Issue Year: 19/2023
  • Issue No: 1
  • Page Range: 79-99
  • Page Count: 21
  • Language: English