Christoph Kern, Yan Li, Lingxiao Wang
Boosted Kernel Weighting - Using Statistical Learning to Improve Inference From Nonprobability Samples

Journal of Survey Statistics and Methodology, 2021, vol. 9, issue 5, pp. 1088–1113
ISSN: 2325-0984 (print), 2325-0992 (online)

Given the growing popularity of nonprobability samples as a cost- and time-efficient alternative to probability sampling, a variety of adjustment approaches have been proposed to correct for self-selection bias in nonrandom samples. Popular methods such as inverse propensity-score weighting (IPSW) and propensity-score (PS) adjustment by subclassification (PSAS) use a probability sample as a reference to estimate pseudo-weights for the nonprobability sample based on PSs. A recent contribution, kernel weighting (KW), has been shown to improve on IPSW and PSAS with respect to mean squared error. However, the effectiveness of these methods in reducing bias depends critically on the ability of the underlying propensity model to reflect the true (self-)selection process, which is a challenging task with parametric regression. In this study, we propose a set of pseudo-weight construction methods, KW-ML, that use machine learning (ML) methods to estimate PSs and KW to construct pseudo-weights from the ML-estimated PSs, which provides added flexibility over logistic regression-based methods. We compare the proposed KW-ML pseudo-weights, based on model-based recursive partitioning, conditional random forests, gradient tree boosting, and model-based boosting, with KW pseudo-weights based on parametric logistic regression for population mean estimation, using simulations and a real data example. Our results indicate that boosting methods in particular are promising alternatives to logistic regression and yield KW estimates with lower bias in a variety of settings, without increasing variance.
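The second step of KW-ML, turning estimated PSs into pseudo-weights, can be illustrated with a minimal sketch. The abstract does not give the formula, so the form used here is an assumption: each reference-sample unit's design weight is split across the nonprobability units in proportion to a kernel of the distance between their estimated scores. The Gaussian kernel, the fixed `bandwidth`, the scale of the scores, and the names `kernel_weights` and `weighted_mean` are all hypothetical choices for illustration, not the authors' implementation. The scores themselves are taken as given; in KW-ML they would come from an ML classifier fitted to the combined samples.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_weights(ps_nonprob, ps_ref, ref_weights, bandwidth):
    """Build pseudo-weights for nonprobability units from estimated scores.

    ps_nonprob  : estimated scores for the nonprobability sample
    ps_ref      : estimated scores for the reference (probability) sample
    ref_weights : design weights of the reference sample
    bandwidth   : kernel bandwidth (hypothetical fixed value, must be > 0)
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    pseudo = [0.0] * len(ps_nonprob)
    for p_i, d_i in zip(ps_ref, ref_weights):
        # Similarity of reference unit i to every nonprobability unit.
        k = [gaussian_kernel((p_i - p_j) / bandwidth) for p_j in ps_nonprob]
        total = sum(k)
        if total == 0.0:
            # All kernel values underflowed: no unit is close enough to receive weight.
            continue
        # Split the design weight d_i proportionally to the kernel values.
        for j, k_ij in enumerate(k):
            pseudo[j] += d_i * k_ij / total
    return pseudo


def weighted_mean(y, w):
    """Pseudo-weighted estimate of a population mean."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


# Toy inputs (made-up numbers, for illustration only).
scores_nonprob = [0.2, 0.5, 0.8]
scores_ref = [0.1, 0.5, 0.9, 0.3]
design_weights = [10.0, 20.0, 30.0, 40.0]
pseudo_weights = kernel_weights(scores_nonprob, scores_ref, design_weights, 0.1)
estimate = weighted_mean([3.0, 4.0, 5.0], pseudo_weights)
```

Under this form, the pseudo-weights sum to the total of the reference design weights whenever every reference unit has at least one nonprobability unit within kernel range, so the weighted nonprobability sample represents the same population size as the reference sample.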