Joseph W. Sakshaug, Arkadiusz Wiśniowski, Diego Andres Perez Ruiz, Annelies G. Blom
Combining Scientific and Non-Scientific Surveys to Improve Estimation and Reduce Costs

Pp. 71-93 in: Tamás Rudas, Gábor Péli (Eds.): Pathways Between Social Science and Computational Social Science: Theories, Methods and Interpretations. 2021. Wiesbaden: Springer

Survey data collection costs have risen to a point where many survey researchers are abandoning large, expensive probability-based samples in favor of less expensive nonprobability samples. The empirical literature suggests this strategy may be unwise for many reasons, among them that probability samples tend to outperform nonprobability samples on accuracy when assessed against population benchmarks. Nevertheless, the attractive cost properties and convenience of nonprobability samples suggest they are here to stay. Instead of forgoing probability sampling entirely, we consider a method of combining probability and nonprobability samples in a way that exploits the strengths of each to offset the weaknesses of the other. Using Bayesian inference, we evaluate the use of nonprobability data as a supplement to estimation based on small probability samples. In a case study involving actual survey data, we show that specifying prior distributions using nonprobability data considerably reduces variances and mean-squared errors for estimates of two commonly used health variables, height and weight, compared to the probability-only sample estimates. We further show that these gains in efficiency yield expected cost savings of up to 66%, based on actual cost data from eight nonprobability surveys conducted by different commercial vendors and assumed cost data for a probability-based Internet panel. We conclude with a discussion of these findings, their implications for survey practice, and possible research extensions.
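
The general idea of using nonprobability data to inform a prior can be illustrated with a minimal sketch. This is not the chapter's actual model, which the abstract does not specify. It assumes a conjugate normal prior on a mean, a plug-in estimate of the sampling variance that is treated as known, simulated height data, and a hypothetical prior variance (`prior_var`) chosen by the analyst to reflect limited trust in the nonprobability sample. All function and variable names are illustrative.

```python
import random
import statistics


def normal_posterior(prob_sample, prior_mean, prior_var):
    """Conjugate normal update for a mean.

    Assumption: the sampling variance is replaced by its sample
    estimate and then treated as known.
    """
    n = len(prob_sample)
    ybar = statistics.fmean(prob_sample)
    sigma2 = statistics.variance(prob_sample)
    data_precision = n / sigma2
    prior_precision = 1.0 / prior_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * ybar)
    return post_mean, post_var


# Simulated data (assumption): heights in cm. The nonprobability sample
# is large but shifted; the probability sample is small.
rng = random.Random(0)
nonprob = [rng.gauss(172.0, 9.0) for _ in range(2000)]
prob = [rng.gauss(170.0, 9.0) for _ in range(50)]

# Prior centred on the nonprobability estimate. The prior variance is a
# hypothetical analyst choice, not a value taken from the chapter.
prior_mean = statistics.fmean(nonprob)
prior_var = 4.0

ybar = statistics.fmean(prob)
prob_only_var = statistics.variance(prob) / len(prob)
post_mean, post_var = normal_posterior(prob, prior_mean, prior_var)

print(f"probability-only: mean={ybar:.2f}, var={prob_only_var:.3f}")
print(f"with prior:       mean={post_mean:.2f}, var={post_var:.3f}")
```

In this setup the posterior variance is always smaller than the probability-only variance, because the prior adds precision. Whether mean-squared error also improves depends on how biased the nonprobability sample is and how tightly the prior is specified, which is why the prior variance is left as an explicit choice here.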