Information loss resulting from statistical disclosure control of output data

Original title (Polish): Strata informacji wskutek przeprowadzenia kontroli ujawniania danych wynikowych

Author(s): Andrzej Młodak
Subject(s): Economy, Socio-Economic Research
Published by: Główny Urząd Statystyczny
Keywords: statistical disclosure control; SDC; information loss; cyclometric function; inverse correlation matrix

Summary/Abstract: The paper presents the most important methods of assessing the information loss caused by statistical disclosure control (SDC). The aim of SDC is to protect individuals against being identified, and against any unauthorised party obtaining sensitive information about them. Methods based either on suppressing specific data or on perturbing them cause information loss, which degrades the quality of the output data: the distributions of variables, the forms of the relationships between them, and any estimates derived from them. The paper critically analyses the strengths and weaknesses of the particular types of methods for assessing information loss resulting from SDC. It also proposes some novel ideas for obtaining effective and easily interpretable measures, including an innovative use of a cyclometric function (the arc tangent) to measure how far values deviate from the original ones as a result of SDC. Additionally, the inverse correlation matrix is applied to assess the influence of SDC on the strength of the relationships between variables. The first method yields effective and easily interpretable measures, while the second makes it possible to fully exploit the mutual relationships between variables (including those that are difficult to detect with classical statistical methods) for a better analysis of the consequences of SDC. Among other findings, the empirical verification of the proposed methods confirmed the superiority of the cyclometric function in measuring the distance between the distorted values and the original data, and also highlighted the need for a careful correction of its flattening for large argument values.
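The abstract does not give the paper's formulas, so the following is only a minimal sketch under our own assumptions, not the author's measures. It illustrates the two ideas named above: (1) a hypothetical cell-level loss (2/π)·arctan(|x′ − x| / |x|), averaged over cells, which is 0 for unchanged data and bounded above by 1; and (2) a comparison of the diagonals of the inverse correlation matrices R⁻¹ before and after SDC, using the standard identity that the j-th diagonal element of R⁻¹ equals 1/(1 − R_j²), where R_j² is the coefficient of determination from regressing variable j on all the others. The function names `arctan_loss`, `correlation_matrix`, `invert` and `vif_shift` are hypothetical, and the choice of relative deviation (with |x| replaced by 1 when x = 0) is an assumption.

```python
import math


def arctan_loss(original, protected):
    """Hypothetical arctan-based loss: mean of (2/pi)*atan(|x' - x| / |x|).

    0 means no change; values approach 1 as deviations grow.
    Assumption: relative deviation, with |x| replaced by 1 when x == 0.
    """
    if len(original) != len(protected) or not original:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for x, y in zip(original, protected):
        scale = abs(x) if x != 0 else 1.0
        total += (2 / math.pi) * math.atan(abs(y - x) / scale)
    return total / len(original)


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    centred = [[v - sum(c) / n for v in c] for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centred]
    if any(nv == 0 for nv in norms):
        raise ValueError("a column has zero variance")
    k = len(columns)
    return [[sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_shift(orig_cols, prot_cols):
    """Hypothetical measure: per-variable change in diag(R^-1) caused by SDC.

    diag(R^-1)[j] = 1 / (1 - R_j^2), so a positive shift means variable j
    became more strongly explained by the others after protection.
    """
    before = invert(correlation_matrix(orig_cols))
    after = invert(correlation_matrix(prot_cols))
    return [after[j][j] - before[j][j] for j in range(len(orig_cols))]
```

The flattening mentioned in the abstract shows up directly in this sketch: a relative deviation of 10 already scores about 0.94 and one of 100 about 0.99, so large distortions become hard to tell apart without some correction.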

  • Issue Year: 65/2020
  • Issue No: 09
  • Page Range: 7-27
  • Page Count: 21
  • Language: Polish