Staggering fieldwork costs are leading established face-to-face survey projects to increasingly consider switching to an online mode of data collection. However, online questionnaires usually need to be considerably shorter than face-to-face questionnaires because of higher breakoff and lower response rates. Split Questionnaire Designs (SQD), which reduce survey length without dropping items completely from the survey, could be a solution to this issue. In an SQD, the questionnaire is divided into a pre-determined number of modules, and a random subset of these modules is assigned to each participant. This generates a pattern of observed and missing data that can subsequently be completed through imputation.
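The assignment step described above can be illustrated with a minimal Python sketch. It is not the procedure used in our studies: the function names, the number of modules, and the number of modules per respondent are hypothetical choices made for illustration only.

```python
import random

def assign_modules(n_respondents, n_modules, n_assigned, seed=0):
    """Draw a random subset of module indices for each respondent."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_modules), n_assigned))
            for _ in range(n_respondents)]

def apply_sqd(responses, module_of_item, assignments):
    """Set items from unassigned modules to None (missing by design)."""
    masked = []
    for row, modules in zip(responses, assignments):
        assigned = set(modules)
        masked.append([value if module_of_item[j] in assigned else None
                       for j, value in enumerate(row)])
    return masked

# Hypothetical example: 6 items in 3 modules, 2 modules per respondent.
module_of_item = [0, 0, 1, 1, 2, 2]
complete = [[1, 2, 3, 4, 5, 6] for _ in range(4)]
assignments = assign_modules(n_respondents=4, n_modules=3, n_assigned=2)
observed = apply_sqd(complete, module_of_item, assignments)
```

Because the missingness is produced by random assignment rather than by respondent behavior, the entries set to None are missing by design, and these are the values the imputation has to fill in.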
However, the reality of social survey data poses particular challenges for the imputation of data obtained with an SQD: correlations are typically relatively weak, while a large share of the data has to be imputed. Hence, to support good data quality in the face of such adverse conditions, it may be especially important to exploit the correlation structure of the data by constructing modules that separate highly correlated variables. Exact correlations, however, are often not known before data collection starts, so we must rely on heuristics. Since questions on the same topic are often correlated, one may consider constructing modules that contain questions on diverse topics. In contrast, modules with questions on a single topic, although intuitive from a questionnaire design perspective and often implemented in the past, might be far from optimal because highly correlated questions will tend to end up in the same module.
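The three module construction heuristics can be sketched as follows. This is a simplified illustration under stated assumptions, not our implementation: the item names, topic labels, and the round-robin rule used to spread topics across modules are hypothetical.

```python
import random
from collections import defaultdict

def single_topic_modules(topic_of_item):
    """One module per topic: same-topic items stay together."""
    modules = defaultdict(list)
    for item, topic in topic_of_item.items():
        modules[topic].append(item)
    return list(modules.values())

def diverse_topic_modules(topic_of_item, n_modules):
    """Deal the items of each topic round-robin across modules,
    so that same-topic items are separated where possible."""
    by_topic = defaultdict(list)
    for item, topic in topic_of_item.items():
        by_topic[topic].append(item)
    modules = [[] for _ in range(n_modules)]
    k = 0  # running counter keeps module sizes balanced
    for items in by_topic.values():
        for item in items:
            modules[k % n_modules].append(item)
            k += 1
    return modules

def random_modules(topic_of_item, n_modules, seed=0):
    """Shuffle all items and split them into equally sized modules."""
    items = list(topic_of_item)
    random.Random(seed).shuffle(items)
    return [items[i::n_modules] for i in range(n_modules)]

# Hypothetical questionnaire: 3 topics with 3 items each.
topic_of_item = {f"{t}{i}": t
                 for t in ("politics", "health", "work")
                 for i in range(1, 4)}
single = single_topic_modules(topic_of_item)
diverse = diverse_topic_modules(topic_of_item, n_modules=3)
randomized = random_modules(topic_of_item, n_modules=3)
```

In this toy example, each single topic module contains only one topic, whereas each diverse topics module contains one item from every topic. If same-topic items are the most strongly correlated ones, a respondent who skips a diverse topics module still answers correlated items from the same topics in other modules.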
To promote data quality in future implementations of SQD, we need to close two knowledge gaps left by previous research: How well does imputation perform with SQDs under real-data conditions? And what role does the module construction strategy play in the imputation?
In this talk, we will present findings from Monte Carlo simulation studies based on real survey data from the German Internet Panel. We show how different module construction strategies (randomly created modules, single topic modules, and diverse topics modules) affect the estimation of univariate frequencies and bivariate correlations after imputation. We will also provide initial findings on the impact of different decisions regarding the imputation of the missing data. Current results show that the imputation tends to introduce small biases in univariate frequencies but larger biases in correlations. Further, diverse topics modules perform similarly to randomly created modules, whereas biases generally tend to be more pronounced with single topic modules.