
Prediction Markets vs. Super Crunching: Which Can Better Predict How Justice Kennedy Will Vote?

One of the great unresolved questions of predictive analytics is when prediction markets will produce better predictions than good old-fashioned mining of historical data. I think that there is fairly good evidence that either approach tends to beat the statistically unaided predictions of traditional experts.
But what is still unknown is whether prediction markets dominate statistical prediction. (Freakonomics co-blogger Justin Wolfers is, in a sense, on both sides of this debate. Justin is one of the best crunchers of historical data, and he is also at the cutting edge of exploiting the results of prediction markets.)
Thanks to Josh Blackman, we are about to have a test of these two competing approaches. Blackman has organized a cool Supreme Court fantasy league, where anybody can make predictions about how Supreme Court justices will vote on particular cases. The aggregate prediction of the league members is powerful “wisdom of the crowds” information.
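To make the “wisdom of the crowds” idea concrete, here is a minimal sketch of how individual picks might be rolled up into a single crowd forecast. It assumes a simple majority rule over “affirm”/“reverse” picks; the function name crowd_forecast is hypothetical, and this is not necessarily how Blackman’s league actually aggregates its members’ predictions.

```python
from collections import Counter


def crowd_forecast(votes):
    """Turn individual 'affirm'/'reverse' picks into one crowd forecast.

    Returns (majority_pick, share_of_members_choosing_it).
    Assumption: a simple majority rule; ties go to 'affirm' arbitrarily.
    """
    counts = Counter(votes)
    # Reject anything other than the two possible picks.
    unknown = set(counts) - {"affirm", "reverse"}
    if unknown:
        raise ValueError(f"unexpected picks: {sorted(unknown)}")
    if not votes:
        raise ValueError("need at least one vote")
    affirm, reverse = counts["affirm"], counts["reverse"]
    pick = "affirm" if affirm >= reverse else "reverse"
    # Share of the crowd that backed the winning pick.
    return pick, max(affirm, reverse) / len(votes)
```

For example, three hypothetical members voting affirm, affirm, reverse yield an “affirm” forecast backed by two-thirds of the crowd. The share matters as much as the pick: a 55 percent crowd is saying something quite different from a 90 percent crowd.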
And it is natural to ask whether the predictions of the league are more accurate than the predictions of a statistical algorithm developed by Andrew D. Martin, Kevin M. Quinn, Theodore W. Ruger, and Pauline T. Kim. I wrote about their study in my book Super Crunchers (you can read the excerpt about the study from the Financial Times here).
On a “Prediction Tools” website, I even created a Java applet based on the study that lets you generate your own predictions of how Justice Kennedy will vote.
That’s right. For a particular case before the court, just plug in answers for the six questions (such as “the ideological direction of lower court decision”) and the applet will predict whether Kennedy will affirm or reverse the lower court opinion.
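A model that maps a handful of categorical answers to “affirm” or “reverse” can be pictured as a decision tree: each answer sends you down one branch until you reach a prediction. The sketch below is a toy version under loud assumptions. Only the lower court’s ideological direction is named in this post, so the other five fields are assumed stand-ins for the applet’s remaining questions, and every split is invented for illustration. None of this reproduces the tree estimated by Martin, Quinn, Ruger, and Kim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseFeatures:
    # Only lower_court_direction is named in the post; the other five
    # fields are ASSUMED stand-ins for the applet's remaining questions.
    circuit: str                    # e.g. "9th"
    issue_area: str                 # e.g. "civil rights"
    petitioner_type: str            # e.g. "United States"
    respondent_type: str            # e.g. "individual"
    lower_court_direction: str      # "liberal" or "conservative"
    constitutional_challenge: bool  # did petitioner argue a law is unconstitutional?


def predict_kennedy(case: CaseFeatures) -> str:
    """Walk a toy decision tree and return 'affirm' or 'reverse'.

    HYPOTHETICAL splits, invented for illustration only.
    """
    if case.lower_court_direction == "liberal":
        # Hypothetical split on circuit of origin.
        if case.circuit == "9th":
            return "reverse"
        # Hypothetical split on whether a constitutional challenge was raised.
        return "reverse" if case.constitutional_challenge else "affirm"
    # Hypothetical split on petitioner type for conservative rulings.
    if case.petitioner_type == "United States":
        return "reverse"
    return "affirm"
```

The appeal of this form is transparency: anyone can trace exactly which answers produced a prediction. The cost is the rigidity that some league members point to below, since a case only counts through the categories the tree happens to ask about.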
Looking at four cases before the current court, Josh has compared the statistical predictions of the applet to the initial aggregate predictions from his fantasy league:

The first case we consider is Maryland v. Shatzer, which considers whether or not police are barred from questioning a criminal suspect who has invoked their right to counsel when the interrogation takes place nearly three years later. …
The second case we consider is U.S. v. Stevens, which considers whether a statute banning depictions of animal cruelty is facially invalid under the Free Speech Clause of the First Amendment. …
The third case we consider is Bloate v. U.S., which considers whether additional time granted at the request of a defendant to prepare pretrial motions is excludable from the time within which trial must commence under the Speedy Trial Act. …
The fourth case we consider is Salazar v. Buono, which considers whether an individual has Article III standing to bring an Establishment Clause suit challenging the display of a religious symbol on government land, and whether an Act of Congress directing that the land be transferred to a private entity is a permissible accommodation.
How do the members think Justice Kennedy will vote? Here are the predictions of the 10th Justice:
In Maryland v. Shatzer, 43 percent (123 out of 267 voting members) agreed with the program and predicted that Justice Kennedy would vote to affirm the lower court.
In U.S. v. Stevens, 83 percent (168 out of 201 voting members) agreed with the program and predicted that Justice Kennedy would affirm the Third Circuit. Whereas predictions for Maryland v. Shatzer were split, the stronger agreement here may indicate that certain case characteristics are clearer predictors of Kennedy’s behavior, and that observers of the court pick up on them more easily.
In Bloate v. U.S., 76 percent (61 out of 80 voting members) agreed with the program and predicted that Justice Kennedy would affirm the Eighth Circuit.
In Salazar v. Buono, only 45 percent (48 out of 106 voting members) agreed with the program and predicted that Justice Kennedy would affirm the Ninth Circuit. Although it is unclear which predictor has the edge here, FantasySCOTUS predictions are more flexible because they are not bound by the “category” constraints the program uses, so they would probably be the more accurate indicator in this case.
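The agreement figures are just the share of voting members whose pick matched the program’s. A short sketch recomputing them from the vote counts quoted above (the names reported and agreement_share are illustrative, not from the league’s site):

```python
# Vote counts as quoted above: (members agreeing with the program, members voting)
reported = {
    "Maryland v. Shatzer": (123, 267),
    "U.S. v. Stevens": (168, 201),
    "Bloate v. U.S.": (61, 80),
    "Salazar v. Buono": (48, 106),
}


def agreement_share(agree: int, total: int) -> float:
    """Fraction of voting members whose pick matched the program's."""
    if total <= 0:
        raise ValueError("total must be positive")
    return agree / total


for case, (agree, total) in reported.items():
    share = agreement_share(agree, total)
    # Above one half, the crowd's majority sides with the program.
    side = "majority agrees" if share > 0.5 else "majority disagrees"
    print(f"{case}: {share:.1%} ({side})")
```

Recomputed this way, the Shatzer share comes out to about 46 percent (123 ÷ 267), a bit above the 43 percent quoted; the other three cases match the quoted figures once rounded down. Either way, Shatzer and Salazar are the two cases where most of the crowd parts company with the program.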

Thanks to Josh’s creation, we’ll be able to sit back, paying particular attention to instances of disagreement, and see over time which approach makes the better predictions. This single experiment will not, by itself, resolve the larger “which is better” debate, in part because I can imagine putting forward stronger market-based and statistical predictions. The fantasy league predictions would probably be more accurate if market participants had to actually put their money behind their predictions. And the statistical predictions could probably be improved if they relied on more recent data and controlled for more variables.
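Why focus on disagreements? When both approaches make the same pick, they are right or wrong together, so only the cases where they split can separate them. A minimal scorecard sketch, assuming binary affirm/reverse picks; the CaseRecord fields and the scorecard function are hypothetical names for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    name: str
    league_pick: str  # crowd's majority pick: "affirm" or "reverse"
    model_pick: str   # statistical model's pick
    actual: str       # how the justice actually voted


def scorecard(records):
    """Count correct picks per method, overall and on disagreements only."""
    score = {"league": 0, "model": 0,
             "league_disagree": 0, "model_disagree": 0, "disagreements": 0}
    for r in records:
        league_ok = r.league_pick == r.actual
        model_ok = r.model_pick == r.actual
        score["league"] += league_ok  # bools add as 0 or 1
        score["model"] += model_ok
        if r.league_pick != r.model_pick:
            # With only two possible outcomes, exactly one side is right here.
            score["disagreements"] += 1
            score["league_disagree"] += league_ok
            score["model_disagree"] += model_ok
    return score


# Hypothetical placeholder cases, not real outcomes.
records = [
    CaseRecord("Case A", "affirm", "affirm", "affirm"),
    CaseRecord("Case B", "reverse", "affirm", "reverse"),
    CaseRecord("Case C", "affirm", "reverse", "reverse"),
]
print(scorecard(records))
```

Because every disagreement is won by exactly one side, the league and model disagreement tallies always sum to the number of disagreements, and the overall accuracy gap between the two approaches equals the gap on those disputed cases.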
But we are bound to see more meta-methodological comparisons like these in the years to come, and they will also shed light on whether market participants learn to efficiently incorporate the results of statistical prediction into their own assessments. At the moment, individual decision-makers tend to improve their predictions when given statistical aids, but they still wave off the statistical prediction too often.