Using hidden Markov models to assess and correct for measurement error in digital trace data

Communication Methods and Measures, vi, 1-25 pp., 2025

Pankowska, Paulina K., Alexandru Cernat, Florian Keusch, Ruben Bach
ISSN: 1931-2458 (print), 1931-2466 (online)

Digital trace data are increasingly used across the social and behavioral sciences. They allow researchers to access large volumes of highly detailed and continuous information. Such scale and speed cannot be achieved when using traditional sources, such as surveys. Digital traces are also believed to overcome some of the limitations that surveys are criticized for. However, while their use undoubtedly presents researchers with new possibilities, it also introduces new quality challenges that have been increasingly acknowledged. Accounting for these limitations is crucial, as they can lead to biased results and incorrect research findings. Therefore, in this paper, we apply hidden Markov models (HMMs) to digital trace data on Facebook use to assess the nature and incidence of error in measures of Facebook use frequency. HMMs are an attractive method that allows for the estimation and correction of error without the availability of (error-free) gold-standard data, if the assumptions regarding the underlying construct of interest and the nature of the error are met. Our results suggest that the measures derived from digital trace data severely underestimate the frequency of Facebook use for a third of our sample, in particular when not all relevant devices are tracked.