William E. M. Lowe, Kenneth Benoit
Estimating uncertainty in quantitative text analysis

MI Text Analysis Workshop, London, June 23rd to June 24th, 2011

Several methods have become popular in political science for scaling latent traits, usually left-right policy positions, from political texts. Following a great deal of development, application, and replication, we now have a fairly good understanding of the estimates produced by scaling models such as 'Wordscores', 'Wordfish', and other variants (e.g. Monroe and Maeda’s two-dimensional estimates). Less well understood, however, are the appropriate methods for estimating the uncertainty around these estimates, since current approaches rest on untested assumptions about the stochastic processes that generate text. In this paper we address this gap on three fronts. First, we lay out the assumptions of scaling models and show how to generate uncertainty estimates that would be appropriate if all of these assumptions held. Second, we examine a set of real texts to see where, and to what extent, these assumptions fail. Finally, we introduce a sequence of bootstrap methods to deal with assumption failure and demonstrate their application using a series of simulated and real political texts.