False Positive Science: Why We Can’t Predict the Future


This is a guest post from Roger Pielke Jr., a professor of environmental studies at the University of Colorado at Boulder. Check out Pielke’s blogs for more on the perils of predicting and “false positive science.”

Sports provide a powerful laboratory for social science research. In fact, they can often be a better place for research than real laboratories because sports provide a controlled setting in which people make frequent, real decisions, allowing for the collection of copious amounts of data. For instance, last summer, Daniel Hamermesh and colleagues used a database of more than 3.5 million pitches thrown in major league baseball games from 2004-2008 to identify biases in umpire, batter, and pitcher decision making. Similarly, Devin Pope and Maurice Schweitzer from the Wharton School used a dataset of 2.5 million putts by PGA golfers over five years to demonstrate loss aversion – golfers made more of the same-length putts when putting for par or worse than for birdie or better.  Such studies tell us something about how we behave and make decisions in settings outside of sports as well.

A paper featured on the Freakonomics blog last week provided another lesson – a cautionary tale about the use of statistics in social science research to make predictions about the future.  The paper, by Dan Johnson of Colorado College and Ayfer Ali, assembled an impressive dataset on Olympic medal performance by nations in the Winter and Summer Games since 1952. Using that data, the paper performed a number of statistical tests to explore relationships with variables such as population, GDP, and even the number of days of frost in a country (to test for the presence of wintry conditions).

The authors found a number of strong correlations between variables, which they called “intuitive,” such as the fact that rich countries win more medals, and nations with snowy winters do better in the Winter Games. But the authors then committed a common social science error by concluding that the high correlations give “surprisingly accurate predictions beyond the historical sample.” In fact, the correlations performed quite poorly as predictors of medal outcomes, as I showed in an analysis on my blog. Simply taking the results from the previous Olympic Games as a predictor of the following Games provides better predictions than the multivariate regression equation that Johnson and Ali derived.

What we have here is an illustration of what has more generally been called “false positive science” by Joseph Simmons and colleagues in a 2011 paper. They argue that “it is unacceptably easy to publish ‘statistically significant’ evidence consistent with any hypothesis.”  The fact that a statistically sophisticated model of Olympic medals leads to predictions that perform worse than a naïve prediction based on the results of the immediately previous Games should tell us that there is in fact a lot going on that is not accounted for in the statistical model developed by Johnson and Ali. Does such a poorly-performing statistical model provide much insight beyond “intuition”? I’m not so sure.

More generally, while anyone can offer a prediction of the future, providing a prediction that improves upon a naïve expectation is far more difficult. Whether it is your mutual fund manager who is seeking to outperform an index fund, or a weather forecaster trying to beat climatology, we should judge forecasts by their ability to improve upon simple expectations. If we can’t beat simple expectations in the controlled environment of forecasting the outcomes of a sporting event, we should have some considerable degree of skepticism when interpreting predictions related to the far more complex settings of the economy and human behavior more generally.
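One way to make "improving upon a naïve expectation" concrete is a skill score, which compares a forecast's error with the error of a simple baseline. The sketch below is only an illustration: the medal counts are invented, the function names are hypothetical, and it is not the analysis from my blog or from Johnson and Ali's paper. A score above zero means the model beats the baseline, and a score below zero means the model does worse than simply repeating the previous result.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between forecast and outcome."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def skill_score(model_pred, naive_pred, actual):
    """1 - (model error / naive error).

    1.0 is a perfect forecast, 0.0 is no better than the naive
    baseline, and negative values are worse than the baseline.
    """
    naive_err = mean_absolute_error(naive_pred, actual)
    if naive_err == 0:
        raise ValueError("naive baseline is already perfect; skill is undefined")
    return 1.0 - mean_absolute_error(model_pred, actual) / naive_err


# Invented medal counts for four hypothetical countries (illustration only).
previous_games = [30, 12, 5, 0]   # naive forecast: same as the last Games
model_forecast = [22, 18, 9, 3]   # output of some hypothetical regression
actual_medals = [28, 13, 6, 1]    # what actually happened

score = skill_score(model_forecast, previous_games, actual_medals)
print(f"skill vs. naive baseline: {score:.2f}")
```

With these made-up numbers the score is negative, which is the situation described above: a sophisticated model that loses to the previous Games.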

Roger Pielke Jr. is a professor of environmental studies at the University of Colorado where he studies science, technology and decision making. Lately, he has been studying the governance of sport. His most recent book is The Climate Fix.


  1. Nosybear says:

    The following quote, “They argue that ‘it is unacceptably easy to publish “statistically significant” evidence consistent with any hypothesis,’” literally gave me a laugh-out-loud moment. This is the absurdity of politics, business, science, medicine, any “predictive” science. Given a large enough sample size, you will show statistical significance for practically insignificant differences. Any two variables moving either in the same direction (positive) or the opposite direction (negative) are correlated. The confidence interval narrows as sample size increases. So all you have to do to get significance is increase the number of observations. And this is junk science, parroted by innumerate journalists. Thanks for this post, Freakonomics!

  2. James says:

    Once again, we’re subjected to this nonsense about not being able to predict the future. That it IS nonsense is shown by the fact that we regularly DO predict the future, often with quite astounding precision. Tide tables, solar & lunar eclipses, the space probes launched on multi-year missions by NASA and ESA, and much, much more – all predictions of the future, accurate to many decimal places.

    Of course our ability to make predictions depends on the nature of the underlying physics – does it become “chaotic” at some point? – and the accuracy with which we know the initial conditions. Take baseball, for instance. We know that when the ball leaves the pitcher’s hand, it is extremely likely to pass somewhere close to the plate. If we used some laser sensors to measure this initial position & velocity, we could use a fast computer to predict the ball’s trajectory with great precision.

    Human senses & reaction aren’t that accurate, which limits our ability to accurately predict the outcome of the pitch, but this loss of accuracy is a function of our limited knowledge, not any fundamental inability to make predictions. We can even quantify those limits, and accurately predict probabilities of various possible outcomes.

    The key here lies in the difference between accepting the false claim that we can’t predict the future, and understanding the many factors which place limits on just how accurate we can expect those predictions to be.

    • Dwight says:

      I think you missed the point here. What Pielke is cautioning us about is using large, multivariate statistical models and assigning too much predictive value to them. The examples you use are of either recurring events which follow a well-defined pattern such as the tides, or simple calculus in spaceflight. These are not models.

      • Josh says:

        When you’re constructing a title, you seek to balance accuracy in indication of content with desire to foment interest in the article. The title above beats out “Lots of Variables Affect Predictive Ability”. While this may not be a pleasant reality, it is a basic rule of writing.

    • Josh S says:

      If you think Pielke was talking about predicting things with simple mechanics, you need to read the article again.

  3. brian t says:

    Well, Newt Gingrich is a huge fan of Isaac Asimov’s “Foundation” series, which is about this very topic: the long-term prediction of the future based on the behaviour of masses of people. Statistical analysis can become more accurate as your population size grows, and so the theory of “psychohistory” is that it’s possible to predict the future of the whole galaxy.

  4. Eric M. Jones. says:

    Common humor:

    Short man says to tall man, “You must be good at basketball…”

    Tall man (who has never played basketball) says, “…And YOU must be good at miniature golf!”

  5. Roy says:

    My favorite description of this is captured in David Freedman’s absolutely succinct title, “The Truth Wears Off”.

  6. gkm says:

    Pielke is looking at an ARIMA model to contrast the validity of the Johnson-Ali analysis. Were Johnson-Ali trying to forecast or were they trying to establish causation? If the latter, I don’t think it’s fair to completely trash their work for a simple slip-up by them in overstating the point of their analysis.

    Johnson-Ali may be making the same old mistakes. However, at least their analysis could be explanatory, which can be useful. This is not true of the ARIMA model.

    If I want to understand why something is happening and why something could take the wheels off, I’m certainly not going to rely on ARIMA. If I want to place a bet based on the most likely outcome, then ARIMA je t’adore.

    The two models should be tested on an increasing set of data starting with the greatest variability in medals over time to determine if they are comparable in their predictive power of the outliers. How big must the data set become before there is a significant difference or possibly convergence in the predictive power of the models? If Johnson-Ali performs on the outliers, can you really conclude that there is nothing of merit there?

  7. Andrew Richards says:

    Good article: dreadful headline. The article says nothing about not being able to make accurate forecasts – which I do for a living!
