Kiyomi Shirakawa and Yutaka Abe: Creating Synthetic Microdata for Higher Educational Use in Japan: Reproduction of Distribution Type based on the Descriptive Statistics

Time: Wed 2016-03-30 13.00 - 14.00

Location: Room B705, Department of Statistics, Stockholm University

Participating: Kiyomi Shirakawa and Yutaka Abe (Hitotsubashi University, Tokyo)

When synthetic microdata are created in Japan, values from result tables are used in order to remove links to individual data. The result tables of conventional official statistics, however, do not allow the generation of random numbers that reproduce the individual data. Therefore, the National Statistics Center has created pseudo-individual data on a trial basis using the 2004 National Survey of Family Income and Expenditure. Although the mean, variance, and correlation coefficient of the original data were reproduced in the resulting synthetic microdata, the trial did not create completely synthetic microdata from the result tables, and it did not take the reproduction of the distribution into account.

In this study, a method for generating random numbers with a distribution close to that of the original data was tested. The output is called the ‘Academic Use File’. The random numbers were generated entirely from the values contained in the result tables. In addition, the test took into account Anscombe's quartet and the sensitivity rule. As a result, it was possible to approximate the distribution type of the original data as closely as possible using only the numerical values in the result tables.
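The point about Anscombe's quartet can be illustrated with a minimal sketch. It is not the authors' method, only a standard moment-matching construction: random draws are rescaled so that their sample mean, variance, and correlation equal target values exactly, while the shape of the distribution (here simply normal) is left unconstrained by those statistics. The function name `synth_from_moments` and all numerical values are hypothetical and are not taken from the survey.

```python
import numpy as np

def synth_from_moments(mean, cov, n, seed=0):
    """Draw n records whose sample mean and covariance equal the targets exactly.

    Illustrative only: the normal shape is an assumption, not something
    the published mean, variance, and correlation determine.
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    z = rng.standard_normal((n, mean.size))
    z -= z.mean(axis=0)  # centre the draws so the sample mean is zero
    # Whiten: make the sample covariance exactly the identity matrix.
    l_sample = np.linalg.cholesky(np.cov(z, rowvar=False))
    z = z @ np.linalg.inv(l_sample).T
    # Recolour with the target covariance and shift to the target mean.
    return z @ np.linalg.cholesky(cov).T + mean

# Hypothetical table values: two variables with standard deviations
# 100 and 80 and a correlation coefficient of 0.6.
target_mean = np.array([300.0, 250.0])
target_cov = np.array([[10000.0, 4800.0],
                       [4800.0, 6400.0]])
records = synth_from_moments(target_mean, target_cov, n=500)

print(records.mean(axis=0))
print(np.corrcoef(records, rowvar=False)[0, 1])
```

Any other starting distribution with finite variance could replace `standard_normal` and would yield the same summary statistics, which is why matching these statistics alone does not reproduce the distribution type.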
