Kai Jäger
Not a New Gold Standard: Even Big Data Cannot Predict the Future

Critical Review, 2016, vol. 28, issue 3-4, pp. 335-355
ISSN: 0891-3811 (print); 1933-8007 (online)

Many scholars believe that the proliferation of large-scale datasets will spur scientific advancement and help us to predict the future using sophisticated statistical techniques. Indeed, a team of researchers achieved astonishing success using the world’s largest event dataset, produced by the ICEWS project, to predict complex social outcomes such as civil wars and irregular government turnovers. However, the secret of their success lay in transforming epistemically difficult questions into easy ones. Forecasting the onset of civil wars becomes an easy task if one relies on explanatory variables that measure how often newspapers report on tensions, fights, or killings. But news reports on prewar conflicts are just variations of the variable that researchers want to predict; the finding that more conflicts are likely to occur when journalists report on conflicts carries little scientific value. A similar success rate in “predicting” interstate wars can also be achieved by a simple Google News search for country names and conflict-related news shortly before a conflict is coded as a war. Big data can help researchers to make predictions in simple situations, but there is no evidence that predictions will also succeed in uncertain environments with complex outcomes, such as those characteristic of politics.