Paulina Pankowska, Ruben Bach, Florian Keusch, Alexandru Cernat
Measuring Facebook use: The accuracy of self-reported data vs. digital trace data

Web Data Opp Workshop, Barcelona, March 18th to March 19th, 2024

The growing availability of digital trace data has prompted social scientists to rely more heavily on these data sources in their research. Information obtained from these data is increasingly used to replace surveys, or as a benchmark to validate and assess the quality of survey-based measures. Such studies often rely on the unrealistic assumption that log data are error-free. However, like any other data source, digital traces are likely to be subject to non-negligible error. Research on social media and its impacts is a domain in which these concerns are particularly prominent. In this paper, we therefore examine the quality of self-reported and digital-trace measures of Facebook use simultaneously, while allowing each source to contain measurement error. To assess and compare the error in the two sources, we apply hidden Markov models to a sample from a nonprobability online panel in Germany. The self-reported measures come from a longitudinal survey, and the digital-trace measures come from tracking apps that respondents installed on their smartphones and/or computers. Our results suggest that both sources measure Facebook use fairly consistently for about two thirds of the sample. For the remaining third of respondents, we observe large inconsistencies between the survey and digital-trace measures: respondents in this group are highly likely to report using Facebook daily, while the tracking data show no, or almost no, Facebook use. Our results suggest that these inconsistencies may be (partially) due to the tracking data being systematically incomplete for some individuals, possibly because the device or devices on which they use Facebook most often are not tracked. Further support for our findings comes from estimates obtained from donated Facebook data, which are available for a subsample of our respondents.
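The core idea of treating both sources as error-prone indicators of one latent behaviour can be illustrated with a minimal sketch. It assumes a two-state latent variable (no use vs. daily use), one binary survey indicator and one binary tracking indicator per wave, and conditional independence of the two indicators given the latent state. All parameter values and names (`INITIAL`, `TRANSITION`, `P_SURVEY_YES`, `P_TRACK_YES`, `forward`) are hypothetical and chosen for illustration only. The sketch evaluates a scaled forward algorithm with fixed parameters; it does not estimate them (e.g. via EM), and it is not the specification used in the paper.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical parameters (illustration only, not estimates from the paper).
# Latent state 0 = no Facebook use, 1 = daily Facebook use.
INITIAL = [0.4, 0.6]                 # P(s_1 = k)
TRANSITION = [[0.9, 0.1],            # P(s_t = j | s_{t-1} = i), row i, column j
              [0.1, 0.9]]
# P(indicator signals use | latent state); each source has its own error rates.
P_SURVEY_YES = [0.05, 0.95]          # self-report
P_TRACK_YES = [0.02, 0.70]           # tracking may miss use on untracked devices


def emission(state: int, survey: int, track: int) -> float:
    """Joint probability of both indicators, assuming they are
    conditionally independent given the latent state."""
    ps = P_SURVEY_YES[state] if survey else 1 - P_SURVEY_YES[state]
    pt = P_TRACK_YES[state] if track else 1 - P_TRACK_YES[state]
    return ps * pt


def forward(obs: Sequence[Tuple[int, int]]) -> Tuple[float, List[float]]:
    """Scaled forward algorithm over waves of (survey, track) pairs.
    Returns the log-likelihood and the filtered P(daily use) per wave."""
    loglik = 0.0
    filtered: List[float] = []
    alpha: List[float] = []
    for t, (survey, track) in enumerate(obs):
        if t == 0:
            unnorm = [INITIAL[k] * emission(k, survey, track) for k in range(2)]
        else:
            unnorm = [
                sum(alpha[i] * TRANSITION[i][k] for i in range(2))
                * emission(k, survey, track)
                for k in range(2)
            ]
        c = sum(unnorm)                  # scaling constant = P(obs_t | obs_<t)
        alpha = [a / c for a in unnorm]  # normalised filtered probabilities
        loglik += math.log(c)
        filtered.append(alpha[1])
    return loglik, filtered
```

For a hypothetical respondent who reports daily use in every wave while tracking records none, `forward([(1, 0), (1, 0), (1, 0)])` still assigns a high probability to daily use under these assumed parameters, because the assumed tracking miss rate for users (0.30) is much larger than the assumed survey false-positive rate (0.05). With different error parameters the same observations would be read differently, which is why such models let the data inform the error rates of both sources.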