Seminar with Dr. Kiyomi Shirakawa, Institute of Economic Research, Hitotsubashi University, and Mr. Yutaka Abe, Institute of Economic Research and Research Centre for Information and Statistics of Social Science, Hitotsubashi University.
Abstract (English only)
In Japan, synthetic microdata are created from the values in result tables so that no link to the individual data remains. However, the result tables of conventional official statistics do not allow random numbers to be generated that reproduce the individual data. The National Statistics Center has therefore created pseudo-individual data on a trial basis using the 2004 National Survey of Family Income and Expenditure.
Although the mean, variance, and correlation coefficients of the original data were reproduced in the resulting synthetic microdata, the trial did not create completely synthetic microdata from the result tables, and it did not take the reproduction of the distribution into account.
This study tested a method, called the ‘Academic Use File’, for generating random numbers with a distribution close to that of the original data. The random numbers were generated entirely from the values contained in the result tables. The test also took Anscombe's quartet and the sensitivity rule into account. As a result, it was possible to approximate the distribution type of the original data as closely as the numerical values of the result tables permit.
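To make the starting point concrete, the following is a minimal sketch of generating records whose sample mean, standard deviation, and correlation match a set of table values exactly. It is not the method presented in the seminar: it assumes normally distributed noise and therefore says nothing about the shape of the original distribution, which is exactly the gap that Anscombe's quartet illustrates. The function name `synthesize` and all numerical values are hypothetical.

```python
import numpy as np

def synthesize(mean, sd, corr, n, seed=0):
    """Return n records whose sample mean, sd, and correlation
    equal the given table values (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    sd = np.asarray(sd, dtype=float)
    corr = np.asarray(corr, dtype=float)

    # Start from normal noise (an assumption; the true shape is unknown).
    z = rng.standard_normal((n, len(mean)))
    z -= z.mean(axis=0)  # force the sample mean to zero

    # Whiten: make the sample covariance exactly the identity matrix.
    chol_sample = np.linalg.cholesky(np.cov(z, rowvar=False))
    z = z @ np.linalg.inv(chol_sample).T

    # Colour with the target covariance built from the table values.
    target_cov = corr * np.outer(sd, sd)
    chol_target = np.linalg.cholesky(target_cov)
    return z @ chol_target.T + mean

# Hypothetical table values for two variables (e.g. income and expenditure).
records = synthesize(
    mean=[300.0, 250.0],
    sd=[80.0, 60.0],
    corr=[[1.0, 0.7], [0.7, 1.0]],
    n=1000,
)
```

Because only the first two moments are matched, many differently shaped datasets would pass the same check. Reproducing the distribution type requires additional information from the result tables, which is the question the study addresses.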