Julian B. Axenfeld, Christian Bruch, Christof Wolf
General-purpose imputation of planned missing data in social surveys: different strategies and their effect on correlations

DAGStat 2022, Hamburg, 28 March to 1 April 2022

Planned missing survey data, for example from split questionnaire designs, are becoming increasingly common in large-scale social surveys, making imputation indispensable for obtaining analyzable data. This trend is driven especially by pressure on surveys to shorten questionnaires: long questionnaires are associated with low response rates and poor response quality, and are considered particularly inappropriate for the increasingly popular online mode. However, these data can be difficult to impute due to common features of social survey data, such as low correlations, predominantly categorical variables, and sample sizes that are relatively small for supporting imputation models with many potential predictor variables. In this presentation, we discuss findings from a Monte Carlo simulation in which we simulate split questionnaire designs and evaluate different imputation methods, based on data from the German Internet Panel (GIP). In this simulation, we also experiment with predictor set specifications in which imputation models are restricted to variables whose correlations with the imputed variable are clearly larger than zero. Our results show that strategies that simplify the imputation exercise (for instance, predictive mean matching procedures with restricted predictor sets) perform well, while some established strategies lead to strong biases.
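
To make the idea of a restricted predictor set concrete, the following minimal Python sketch combines correlation-based predictor selection with a simplified predictive mean matching step. It is an illustration under stated assumptions, not the procedure used in the simulation: the function names `select_predictors` and `pmm_impute`, the correlation threshold of 0.1, the donor pool size `k=5`, and the synthetic continuous data are all hypothetical choices made here. The sketch also assumes fully observed predictors and performs a single imputation without drawing regression parameters, whereas actual survey data are mostly categorical and a full procedure would typically produce multiple imputations.

```python
import numpy as np


def select_predictors(data, target, threshold=0.1):
    """Return indices of columns whose absolute pairwise-complete
    correlation with the target column exceeds the threshold.
    The threshold value is an illustrative assumption."""
    y = data[:, target]
    selected = []
    for j in range(data.shape[1]):
        if j == target:
            continue
        x = data[:, j]
        both = ~np.isnan(x) & ~np.isnan(y)
        if both.sum() < 3:
            continue  # too few jointly observed cases to estimate a correlation
        r = np.corrcoef(x[both], y[both])[0, 1]
        if abs(r) > threshold:
            selected.append(j)
    return selected


def pmm_impute(data, target, predictors, k=5, rng=None):
    """Simplified predictive mean matching (single imputation).
    Assumes the predictor columns are fully observed."""
    rng = np.random.default_rng() if rng is None else rng
    y = data[:, target]
    obs = ~np.isnan(y)
    # Linear model with intercept, fitted on cases where the target is observed.
    X = np.column_stack([np.ones(len(y)), data[:, predictors]])
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    pred = X @ beta
    pred_obs, y_obs = pred[obs], y[obs]
    out = y.copy()
    for i in np.flatnonzero(~obs):
        # The k observed cases with the closest predicted means form the donor pool.
        donors = np.argsort(np.abs(pred_obs - pred[i]))[:k]
        out[i] = y_obs[rng.choice(donors)]
    return out


# Synthetic example: three fully observed "core" variables, two unrelated
# variables, and one target that is missing by design for about 40% of cases.
rng = np.random.default_rng(0)
n = 500
core = rng.normal(size=(n, 3))
unrelated = rng.normal(size=(n, 2))
y_full = core[:, 0] + 0.5 * core[:, 1] + rng.normal(size=n)
data = np.column_stack([core, unrelated, y_full])
target = 5
missing = rng.random(n) < 0.4  # missing completely at random, as in a planned design
data[missing, target] = np.nan

preds = select_predictors(data, target, threshold=0.1)
imputed = pmm_impute(data, target, preds, k=5, rng=rng)
```

Because each imputed value is copied from an observed donor, the imputed variable stays within the range of observed responses, which is one reason matching-based approaches are attractive when the data do not follow a convenient parametric distribution.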