Modular Questionnaire Designs for Social Surveys: Statistical Modelling of Designed Missingness

Research question/goal: 

The project examined the usefulness of procedures for imputing the planned missing values that result from modular questionnaire designs (MQDs). The aim of the imputations was to obtain a complete data set ready for secondary data analysis. Our research focussed on the application of MQDs in social science surveys, which are subject to typical conditions such as relatively small sample sizes, large numbers of variables, low correlations between variables/items, and categorical levels of measurement.

To achieve our research aims, we ran Monte Carlo simulations on datasets from the German Internet Panel (GIP) using the high-performance computers of the federal state of Baden-Wuerttemberg (bwHPC).
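
As an illustration of how planned missingness from an MQD might be imposed on a complete dataset in one replication of such a simulation, the following is a minimal sketch, not the project's actual code. The function name, the synthetic data, and the design parameters (four modules, three assigned per respondent, no core module) are assumptions made for this example only.

```python
import numpy as np

def impose_modular_missingness(data, n_modules, modules_per_respondent, rng):
    """Return a float copy of `data` with planned missing values set to NaN.

    Hypothetical helper: items (columns) are randomly allocated to modules of
    near-equal size, and each respondent (row) is randomly assigned a subset
    of modules. Items in unassigned modules are treated as not asked.
    """
    n_resp, n_items = data.shape
    # Random allocation of items to modules (one of the strategies compared).
    item_module = rng.permutation(n_items) % n_modules
    masked = data.astype(float)
    for i in range(n_resp):
        assigned = rng.choice(n_modules, size=modules_per_respondent, replace=False)
        not_asked = ~np.isin(item_module, assigned)
        masked[i, not_asked] = np.nan
    return masked, item_module

rng = np.random.default_rng(0)
# Synthetic stand-in for survey data: 200 respondents, 12 five-point items.
complete = rng.integers(1, 6, size=(200, 12))
masked, item_module = impose_modular_missingness(
    complete, n_modules=4, modules_per_respondent=3, rng=rng
)
missing_share = np.isnan(masked).mean()
print(f"Share of planned missing values: {missing_share:.2f}")
```

In a full simulation, the masked dataset would then be imputed and the resulting estimates compared with those from the complete data across many replications. Allocating items by topic instead of at random would only change how `item_module` is constructed.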

Our research project produced several key results: First, allocating items on the same topic to the same module (assuming that these items are highly correlated) led to worse imputation-based estimates than allocating items at random or allocating items on the same topic to different modules. Because most correlations in our data were small, we observed only a few differences between the last two strategies. Second, we compared a series of imputation methods with regard to their ability to produce complete datasets that allow for estimates of acceptable quality when applying MQDs. For small samples and large numbers of variables, which are typical for social science surveys, we obtained good results with imputation methods that simplify the imputation models, for example procedures that reduce the number of predictors.

Third, we examined item nonresponse, which can occur in addition to the planned missing values in MQDs. We showed that serious problems arise when the combined proportion of missing values from both sources is too large and when item nonresponse is missing not at random (MNAR). Thus, we recommend reducing the number of planned missing values for items that are expected to produce high levels of item nonresponse.

Fact sheet

2017 to 2023
Data Sources: 
German Internet Panel, European Values Study
Geographic Space: