False Positive Science: Why We Can't Predict the Future


This is a guest post from Roger Pielke Jr., a professor of environmental studies at the University of Colorado at Boulder. Check out Pielke's blogs for more on the perils of prediction and "false positive science."

Sports provide a powerful laboratory for social science research. In fact, they can often be a better place for research than real laboratories because sports provide a controlled setting in which people make frequent, real decisions, allowing for the collection of copious amounts of data. For instance, last summer, Daniel Hamermesh and colleagues used a database of more than 3.5 million pitches thrown in major league baseball games from 2004-2008 to identify biases in umpire, batter, and pitcher decision making. Similarly, Devin Pope and Maurice Schweitzer from the Wharton School used a dataset of 2.5 million putts by PGA golfers over five years to demonstrate loss aversion – golfers made a higher proportion of putts of a given length when putting for par or worse than when putting for birdie or better. Such studies tell us something about how we behave and make decisions in settings outside of sports as well.

A paper featured on the Freakonomics blog last week provided another lesson – a cautionary tale about the use of statistics in social science research to make predictions about the future.  The paper, by Dan Johnson of Colorado College and Ayfer Ali, assembled an impressive dataset on Olympic medal performance by nations in the Winter and Summer Games since 1952. Using that data, the paper performed a number of statistical tests to explore relationships with variables such as population, GDP, and even the number of days of frost in a country (to test for the presence of wintry conditions).

The authors found a number of strong correlations between variables, which they called "intuitive," such as the fact that rich countries win more medals, and nations with snowy winters do better in the Winter Games. But the authors then committed a common social science error by concluding that the high correlations give "surprisingly accurate predictions beyond the historical sample." In fact, the correlations performed quite poorly as predictors of medal outcomes, as I showed in an analysis on my blog. Indeed, simply taking the results from the previous Olympic Games as a predictor of the following Games provides better predictions than the multivariate regression equation that Johnson and Ali derived.

What we have here is an illustration of what has more generally been called “false positive science” by Joseph Simmons and colleagues in a 2011 paper. They argue that “it is unacceptably easy to publish ‘statistically significant’ evidence consistent with any hypothesis.”  The fact that a statistically sophisticated model of Olympic medals leads to predictions that perform worse than a naïve prediction based on the results of the immediately previous Games should tell us that there is in fact a lot going on that is not accounted for in the statistical model developed by Johnson and Ali. Does such a poorly-performing statistical model provide much insight beyond “intuition”? I’m not so sure.
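The baseline comparison Pielke describes is easy to make concrete. Here is a minimal sketch, with entirely made-up medal counts (the country codes and numbers are illustrative, not the actual data from the paper or the blog analysis): a "naive" forecast simply reuses the previous Games' medal table, and we check whether a hypothetical model forecast beats it on mean absolute error.

```python
# Hypothetical medal counts for a few countries (illustrative numbers only,
# not the real Johnson-Ali data).
prev_games = {"USA": 36, "GER": 29, "CAN": 24, "NOR": 19}   # previous Games
actual     = {"USA": 37, "GER": 30, "CAN": 26, "NOR": 23}   # outcome
model_pred = {"USA": 26, "GER": 20, "CAN": 27, "NOR": 25}   # a regression's output

def mae(pred, actual):
    """Mean absolute error of predicted medal counts across countries."""
    return sum(abs(pred[c] - actual[c]) for c in actual) / len(actual)

# The naive forecast is just the previous Games' results carried forward.
naive_error = mae(prev_games, actual)
model_error = mae(model_pred, actual)
print("naive MAE:", naive_error)
print("model MAE:", model_error)
```

If the model's error is larger than the naive error, the model has added no predictive skill over simply looking at the last Games, which is Pielke's point about the Johnson-Ali regression.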

More generally, while anyone can offer a prediction of the future, providing a prediction that improves upon a naïve expectation is far more difficult. Whether it is your mutual fund manager who is seeking to outperform an index fund, or a weather forecaster trying to beat climatology, we should judge forecasts by their ability to improve upon simple expectations. If we can’t beat simple expectations in the controlled environment of forecasting the outcomes of a sporting event, we should have some considerable degree of skepticism when interpreting predictions related to the far more complex settings of the economy and human behavior more generally.

Roger Pielke Jr. is a professor of environmental studies at the University of Colorado where he studies science, technology and decision making. Lately, he has been studying the governance of sport. His most recent book is The Climate Fix.


The following quote – "They argue that 'it is unacceptably easy to publish "statistically significant" evidence consistent with any hypothesis'" – literally gave me a laugh-out-loud moment. This is the absurdity of politics, business, science, medicine, any "predictive" science. Given a large enough sample size, you will show statistical significance for practically insignificant differences. Any two variables moving either in the same direction (positive) or the opposite direction (negative) are correlated. The confidence interval narrows as sample size increases, so all you have to do to get significance is increase the number of observations. And this is junk science, parroted by innumerate journalists. Thanks for this post, Freakonomics!
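The commenter's point about sample size can be shown with a few lines of arithmetic. This sketch (my own toy illustration, not from the post) computes a two-sample z statistic for a fixed, practically trivial difference in means – one hundredth of a standard deviation – and shows that it crosses the conventional 1.96 significance threshold once the groups get large enough:

```python
import math

def z_statistic(mean_diff, sd, n):
    """Two-sample z statistic for a difference in means,
    assuming equal group sizes n and a common standard deviation sd."""
    se = sd * math.sqrt(2.0 / n)   # standard error of the difference
    return mean_diff / se

# A practically insignificant difference: 0.01 standard deviations.
for n in (100, 10_000, 1_000_000):
    z = z_statistic(mean_diff=0.01, sd=1.0, n=n)
    print(n, round(z, 2), "significant" if z > 1.96 else "not significant")
```

The effect never changes; only n does. With a million observations per group the same trivial difference is "statistically significant," which is exactly why significance alone says nothing about practical importance.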


Once again, we're subjected to this nonsense about not being able to predict the future. That it IS nonsense is shown by the fact that we regularly DO predict the future, often with quite astounding precision. Tide tables, solar & lunar eclipses, the space probes launched on multi-year missions by NASA and ESA, and much, much more - all predictions of the future, accurate to many decimal places.

Of course our ability to make predictions depends on the nature of the underlying physics - does it become "chaotic" at some point? - and the accuracy with which we know the initial conditions. Take baseball, for instance. We know that when the ball leaves the pitcher's hand, it is extremely likely to pass somewhere close to the plate. If we used some laser sensors to measure this initial position & velocity, we could use a fast computer to predict the ball's trajectory with great precision.
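The pitch example can be sketched in a few lines. This is a toy model under stated assumptions – gravity only, no drag or spin, and made-up release numbers – but it shows how measured initial conditions let you predict where the ball crosses the plate:

```python
# Toy trajectory prediction from measured initial conditions.
# Assumptions: constant gravity, no air drag or Magnus force,
# and illustrative release numbers (not real pitch-tracking data).
G = 9.81            # gravitational acceleration, m/s^2
release_x = 16.8    # horizontal distance from release point to plate, m
vx = 38.0           # horizontal speed toward the plate, m/s (~85 mph)
release_h = 1.8     # release height above the ground, m
vz = 1.0            # initial upward speed, m/s

t = release_x / vx                              # time to reach the plate
height = release_h + vz * t - 0.5 * G * t**2    # height at the plate
print("flight time (s):", round(t, 3))
print("height at plate (m):", round(height, 2))
```

A real prediction system would add drag and spin terms, but the structure is the same: better-measured initial conditions and a better physical model yield a more accurate forecast, which is the commenter's point.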

Human senses & reaction aren't that accurate, which limits our ability to accurately predict the outcome of the pitch, but this loss of accuracy is a function of our limited knowledge, not any fundamental inability to make predictions. We can even quantify those limits, and accurately predict probabilities of various possible outcomes.

The key here lies in the difference between accepting the false claim that we can't predict the future, and understanding the many factors which place limits on just how accurate we can expect those predictions to be.



I think you missed the point here. What Pielke is cautioning us about is using large, multivariate statistical models and assigning too much predictive value to them. The examples you use are of either recurring events which follow a well-defined pattern, such as the tides, or simple calculus in spaceflight. These are not statistical models.


If that is what he intends to say, then he should change the title. If all he's claiming is that statistics often aren't much use in predicting the future, I'd agree. I'd also agree if he said that a screwdriver isn't much use for driving nails. That doesn't mean that nails can't be driven, or that a screwdriver is totally useless, it just means that you need to pick the right tool for the job.

brian t

Well, Newt Gingrich is a huge fan of Isaac Asimov's "Foundation" series, which is about this very topic: the long-term prediction of the future based on the behaviour of masses of people. Statistical analysis can become more accurate as your population size grows, and so the theory of "psychohistory" is that it's possible to predict the future of the whole galaxy.

Eric M. Jones.

Common humor:

Short man says to tall man, "You must be good at basketball..."

Tall man (who has never played basketball) says, "...And YOU must be good at miniature golf!"


My favorite description of this is captured in Jonah Lehrer's absolutely succinct title, "The Truth Wears Off".


How about when the warmists (in 2005) predicted 50 million climate refugees by 2010 . . . from specific locations, map included.


But the opposite happened . . . those areas GREW in population.



Mike Lorrey

Yeah, and James Hansen predicted back in the '90s that Manhattan would be under 15 feet of water by 2012....


Pielke is using an ARIMA-style persistence model to contrast with the validity of the Johnson-Ali analysis. But were Johnson-Ali trying to forecast, or were they trying to establish causation? If the latter, I don't think it's fair to completely trash their work for a simple slip-up in overstating the point of their analysis.

Johnson-Ali may be making the same old mistakes. However, at least their analysis could be explanatory which can be useful. This is not true with the ARIMA model.

If I want to understand why something is happening and what could take the wheels off, I'm certainly not going to rely on ARIMA. If I want to place a bet based on the most likely outcome, then ARIMA, I love you.

The two models should be tested on an increasing set of data, starting with the countries showing the greatest variability in medals over time, to determine whether they are comparable in their predictive power on the outliers. How big must the data set become before there is a significant difference, or possibly convergence, in the predictive power of the models? If Johnson-Ali performs well on the outliers, can you really conclude that there is nothing of merit there?


Andrew Richards

Good article: dreadful headline. The article says nothing about not being able to make accurate forecasts - which I do for a living!