Comparison of methods used for filling partially unobserved contingency tables
Author(s): Michał Kot, Bogumił Kamiński
Subject(s): Economy
Published by: Główny Urząd Statystyczny
Keywords: contingency tables; Markov Chain Monte Carlo; Iterative Proportional Fitting Procedure
Summary/Abstract: In this article, we investigate contingency tables in which entries containing small counts are withheld for data privacy reasons. We propose and test two competing methods for estimating the unknown entries: our modification of the Iterative Proportional Fitting Procedure (IPFP), and a Markov Chain Monte Carlo method called Shake-and-Bake. We use simulation experiments to compare these methods in terms of time complexity and the accuracy of searching the space of feasible solutions. To simplify the estimation procedure, we propose pre-processing partially unknown contingency tables with simple heuristics and dimensionality-reduction techniques to find and fill all trivial entries. Our results demonstrate that if the number of missing cells is not very large, the pre-processing is often enough to find fillings for the unknown values. In cases where simple heuristics are insufficient, the Shake-and-Bake technique outperforms the modified IPFP in terms of both time complexity and the accuracy of searching the space of feasible solutions.
Journal: Przegląd Statystyczny. Statistical Review
- Issue Year: 68/2021
- Issue No: 4
- Page Range: 1-20
- Page Count: 20
- Language: English
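
For readers unfamiliar with the IPFP named in the abstract, the following is a minimal sketch of the classical two-dimensional procedure: a seed table is alternately rescaled by rows and by columns until its margins match the known totals. It is not the authors' modified IPFP, which this record does not describe, and it says nothing about Shake-and-Bake. The function name `ipf`, its parameters, and the example numbers are all assumptions made for illustration.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Classical 2-D iterative proportional fitting (illustrative sketch only).

    seed        -- initial table as a list of rows with non-negative entries
    row_targets -- known row totals
    col_targets -- known column totals (should sum to the same grand total)
    """
    table = [[float(v) for v in row] for row in seed]
    for _ in range(iterations):
        # Row step: rescale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                factor = target / row_sum
                table[i] = [v * factor for v in table[i]]
        # Column step: rescale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                factor = target / col_sum
                for row in table:
                    row[j] *= factor
        # Columns now match exactly; stop once the rows also match.
        max_err = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if max_err < tol:
            break
    return table


# Hypothetical example: a uniform 2x2 seed fitted to made-up margins.
seed = [[1, 1], [1, 1]]
fitted = ipf(seed, row_targets=[30, 70], col_targets=[40, 60])
```

In this sketch, cells that are zero in the seed stay zero, because every update is multiplicative. The choice of seed therefore determines which cells the procedure is allowed to fill.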