Ruben L. Bach, Christoph Kern, Denis Bonnay, Luc Kalaora
Understanding political news media consumption with digital trace data and natural language processing

Journal of the Royal Statistical Society: Series A, Statistics in Society, 2022, vol. 185, issue S2, pp. S246-S269
ISSN: 0964-1998 (print), 1467-985X (online)

Abstract: Augmenting survey data with digital traces is a promising direction for combining the advantages of active and passive data collection. However, extracting interpretable measurements from digital traces for social science research is challenging. In this study, we demonstrate how to obtain measurements of news media consumption from survey respondents' web browsing data using Bidirectional Encoder Representations from Transformers, a powerful natural language processing algorithm that estimates contextual word embeddings from text data. Our approach is particularly relevant for political scientists and communication researchers studying exposure to online news content but can easily be adapted to projects in other disciplines working with similar data sets.