Understanding Secondary Data Reporting: A Glimpse at TweetsCOV19 and Dreaddit

St. Louis, 2025

Young, Laura, Fiona Draxler, Florian Keusch

Public opinion research places importance on the transparency of study design and data collection, but critical stages of data reuse by other researchers receive less attention. Insufficient reporting on data processing and analysis can obscure potential sources of error, hindering transparency, validity, and replicability. However, most transparency guidelines focus primarily on dataset creators. This issue warrants close examination given the increasing reliance on secondary sources in research using digital data, which have become increasingly difficult to access in their original form due to changes in, e.g., APIs. This study investigates reporting practices and common omissions in the use of secondary digital behavioral data by reviewing social science research that utilizes two publicly available datasets, selected to serve as examples: TweetsCOV19 (n = 13) and Dreaddit (n = 52). Research was sourced from multiple databases, then manually coded to capture descriptive information on study motivations, time frames, and key elements relevant to replication and validity, such as sample sizes, access dates, and code/data sharing. Findings reveal that, among researchers who used the datasets, essential details like access dates were often missing, a crucial omission for dynamic datasets like TweetsCOV19, where data composition can shift over time, impacting sample sizes and generalizability. Additionally, data processed by the researchers are seldom shared, and only a small minority of papers include a link to the original dataset. These findings highlight a potential trend of underreporting in social science research that relies on digital behavioral secondary data, raising concerns about transparency in data handling and reporting. To address this, we recommend further investigation, such as a scoping systematic review, to assess and improve reporting practices.
Advocating for standardized guidelines will promote transparency and reproducibility in public opinion research, building upon the valuable work already in place, such as the AAPOR Transparency Initiative.