The Dangers of Too Much Data
Wondering whether aspirin will protect your heart or cause internal bleeding? Or whether you should kick your coffee habit or embrace it? It’s often hard to make sense of the conflicting advice that comes out of medical research studies. John Timmer explains that our statistical tools simply haven’t kept up with the massive amounts of data researchers now have access to. In medical (and economic) research, scientists claim a “statistically significant” finding if a pattern as strong as the one they observed (between coffee and liver disease, for example) would turn up by chance less than 5% of the time when no real relationship exists. In the new age of data, that rule causes problems: “Even given a low tolerance for error, the sheer number of tests performed ensures that some of them will produce erroneous results at random.” In lay terms, all those new tests you get at the doctor’s office are translated into data sets, which researchers then pore over searching for connections and patterns. And if you have enough data to examine, eventually you’ll find a statistically significant relationship where no such relationship actually exists, by sheer coincidence. (HT: Matthew Rotkis)
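
To put rough numbers on that, here is a minimal Python sketch. It is not taken from Timmer's article. It assumes every test is independent, uses the 5% threshold, and supposes that no real relationship exists anywhere in the data. The function names are made up for illustration.

```python
import random


def chance_of_false_alarm(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    crosses the significance threshold when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** num_tests


def count_false_alarms(num_tests, alpha=0.05, seed=0):
    """Simulate num_tests tests with no real effect behind any of them.

    Assumption: with no real effect, each test's p-value is uniform
    on [0, 1], so it falls below alpha with probability alpha.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(num_tests) if rng.random() < alpha)


if __name__ == "__main__":
    for num_tests in (1, 20, 100):
        prob = chance_of_false_alarm(num_tests)
        print(f"{num_tests} tests: {prob:.1%} chance of at least one false alarm")
    print("false alarms in 1,000 simulated tests:", count_false_alarms(1000))
```

Under these assumptions a single test carries a 5% risk of a false alarm, 20 tests push the chance of at least one to about 64%, and 100 tests make it better than 99%. Real medical data sets are rarely independent, so treat the figures as an illustration of the trend, not a measurement.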