Amanda Fernández-Fontelo, Felix Henninger, Pascal J. Kieslich, Frauke Kreuter, Sonja Greven
Detecting difficulty in computer-assisted surveys through mouse movement trajectories: A new model for functional data classification

BigSurv20, (virtual conference), November 06th to December 04th, 2020

One of the main goals of survey research is to collect robust and reliable data from respondents and thus to reduce sources of measurement error. One source of error stems from respondents’ difficulty in understanding and responding to survey questions in the way the researchers intended. Thus, detecting and mitigating these difficulties promises to improve both the user experience and data quality. In the presence of a human interviewer, difficulty can be assessed by identifying and quantifying paralinguistic cues, and by directly addressing these issues with the respondent. These cues are not available if surveys are conducted online via the browser, which has become one of the predominant modes of data collection. However, by collecting additional paradata while respondents answer a questionnaire, web surveys provide researchers and practitioners with a novel data source that may indicate potential difficulties and confusion the respondent experienced. The current contribution focuses on a particular type of paradata, respondents’ mouse cursor movements, and how these rich data may be processed and analyzed to detect instances when respondents experienced difficulty. To determine how well mouse-tracking data predict respondent difficulty, we conducted an online survey assessing participants’ personal and economic background. Throughout the survey, we experimentally manipulated the difficulty of several questions, for example, by using concise and understandable vs. complex and verbose language, or by ordering response options in an intuitive vs. random order. Using a custom client-side paradata collection framework, we recorded participants’ mouse movements during the survey. From the collected data, we extracted a large set of mouse movement features using the mousetrap R package we developed.
Using features derived from the cursor movements, we predicted whether respondents answered the easy (i.e., the understandable or intuitively ordered) or difficult (i.e., the complex or randomly ordered) version of a question. To do so, we propose a custom machine-learning model that takes into account the time series of participants’ interactions with the survey page. To build this model, we first adapted a range of common distance metrics to the case of multivariate trajectory data. Then, we used these distances to create base classifiers based on the KNN and kernel-based approaches introduced by Fuchs et al. (2015) and Ferraty and Vieu (2003). Finally, we combined the base classifiers into an ensemble using different techniques (linear combination and stacking methods) and evaluated their predictive accuracy. Going beyond these methods, we propose a personalization method to control for the baseline mouse behavior of the survey participants. We discuss how the methods of the current project can be applied to other online surveys, and provide an R package that implements the presented classification method. The package can be applied to mouse-tracking data and, more generally, to multivariate functional data and trajectories in any number of dimensions.
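The general idea of the steps described above (trajectory distances, distance-based base classifiers, and a linear-combination ensemble) can be illustrated with a minimal sketch. This is not the authors' model or their R package: the function names, the two distances (a discretized L2 and a supremum distance), the equal weights, the choice of k, and the synthetic "detour" trajectories are all assumptions made for illustration, and the sketch is written in Python rather than R.

```python
import math

def resample(traj, n_points=50):
    """Linearly interpolate [(t, x, y, ...), ...] onto n_points equally
    spaced time steps; returns coordinate tuples without the time column.
    Assumes at least two samples with strictly increasing times."""
    t0, t1 = traj[0][0], traj[-1][0]
    out, j = [], 0
    for i in range(n_points):
        t = t0 + (t1 - t0) * i / (n_points - 1)
        while j < len(traj) - 2 and traj[j + 1][0] < t:
            j += 1
        (ta, *a), (tb, *b) = traj[j], traj[j + 1]
        w = (t - ta) / (tb - ta)
        out.append(tuple(u + w * (v - u) for u, v in zip(a, b)))
    return out

def l2_distance(a, b):
    """Discretized L2 distance between two resampled trajectories."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def sup_distance(a, b):
    """Largest pointwise Euclidean gap between two resampled trajectories."""
    return max(math.dist(p, q) for p, q in zip(a, b))

def knn_posterior(train, labels, query, dist, k=3):
    """Share of class-1 ("difficult") labels among the k nearest neighbours."""
    ranked = sorted(range(len(train)), key=lambda i: dist(query, train[i]))
    return sum(labels[i] for i in ranked[:k]) / k

def ensemble_predict(train, labels, query, dists, weights, k=3):
    """Weighted linear combination of one KNN base classifier per distance."""
    p = sum(w * knn_posterior(train, labels, query, d, k)
            for d, w in zip(dists, weights))
    return int(p >= 0.5), p

def make_traj(detour, n=20):
    """Synthetic cursor path (t, x, y): a straight line plus a sideways detour."""
    steps = [i / (n - 1) for i in range(n)]
    return [(s, s, s + detour * math.sin(math.pi * s)) for s in steps]

# Hypothetical training set: small detours = easy (0), large = difficult (1).
train = [resample(make_traj(d)) for d in (0.0, 0.05, 0.1, 0.8, 0.9, 1.0)]
labels = [0, 0, 0, 1, 1, 1]

query = resample(make_traj(0.85, n=33))  # recorded at a different sampling rate
label, prob = ensemble_predict(train, labels, query,
                               [l2_distance, sup_distance], [0.5, 0.5])
print(label, prob)
```

Resampling onto a common time grid is one simple way to make trajectories of different lengths comparable before computing distances. A stacking ensemble would instead fit a second-stage model on the base classifiers' outputs rather than fixing the weights in advance.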