Usain Bolt: It’s Just Not Normal
Usain Bolt’s wonderful run in the Olympic 200-meter sprint reminds us that the normal distribution, the familiar bell curve beloved by economists and statisticians, can be wildly inappropriate when analyzing extremely selected samples.
This morning’s New York Times shows Usain Bolt’s new world record, relative to the 250 greatest 200-meter sprints ever. Not only does this not look like a normal distribution, it doesn’t even look like the tail of any standard distribution I’ve ever seen:
The full graphic, as a story board, is available here. (It is a beautiful example of using statistics to tell a story.) It should be clear from this chart why few thought that the previous world record would be broken anytime soon. (An interesting aside: This graphic shows that it is only a fairly recent phenomenon that the 200-meter typically yields a faster average speed than the 100-meter sprint.)
Extreme outliers aren’t that unusual in sports. The greatest outlier may well be Australian cricketer Donald Bradman, whose career batting average of 99.94 puts him so far ahead of any other cricketer that it defies comprehension. (Trivia note: Bradman played the piano at my grandmother’s wedding.) Here is a histogram of career batting averages conditional on being among the top 100 (among those with at least 20 innings):
Some argue that Joe DiMaggio’s 56-game hitting streak is pretty extraordinary. So I put together a histogram of the great hitting streaks (those longer than 30 games). DiMaggio is okay, but he’s no Don Bradman.
The key to all of these strange distributions is that we are focusing on the extreme tails of highly selected samples, where the usual statistical patterns rarely hold. These situations are highly atypical, but equally, incredibly interesting when thinking about the very greatest. (I’ve never understood the urge to call these “black swans,” given that black swans are actually fairly common birds if you know where to look.)
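To see how little a bell curve has in common with its own extreme tail, here is a minimal simulation sketch. It assumes nothing about the actual sprint data: the draws are standard normal, and the sample size, the cutoff of 250, and the seed are all illustrative choices of mine. Even for a perfectly normal population, the top 250 observations pile up just above the cutoff and thin out rapidly toward the best value.

```python
import heapq
import random
import statistics


def top_of_normal(n_draws=200_000, keep=250, seed=0):
    """Draw n_draws standard-normal values and return the `keep` largest, ascending.

    All parameters are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    draws = (rng.gauss(0.0, 1.0) for _ in range(n_draws))
    return sorted(heapq.nlargest(keep, draws))


top = top_of_normal()
cutoff = top[0]                    # weakest performance that still makes the list
median = statistics.median(top)    # typical performance on the list
best = top[-1]                     # the "record"

# In a symmetric bell curve these two gaps would be about equal.
print(f"median minus cutoff: {median - cutoff:.2f}")
print(f"best minus median:   {best - median:.2f}")
```

The second gap should come out several times larger than the first, which is the lopsided shape that a top-250 list has even when nothing unusual is going on. A record that sits far beyond even that stretched-out tail is what makes a chart like Bolt's so striking.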
Those interested in how things change in extremely selected samples may enjoy Tim Groseclose’s paper, “Extreme Sample Selection Bias: Conditions That Cause the Correlation Between Two Variables to Switch Signs.” Groseclose claims that this extreme sample selection can explain why nonmillionaire members of Congress win re-election more often than millionaires; why it shouldn’t be surprising that the greatest golfer is multiracial, even though most top golfers are white; and why high S.A.T.’s may actually predict lower subsequent incomes among those attending elite universities.
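A toy simulation shows one simple way a correlation can switch signs under extreme selection. This is my own sketch and not necessarily the mechanism or the conditions in Groseclose's paper. It assumes two hypothetical traits, x and y, that are positively correlated in the full population (correlation 0.3 by construction), and a selection rule that admits only the top 1 percent by x + y. The trait names, the correlation, and the cutoff are all assumptions made for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=200_000, rho=0.3, top_share=0.01, seed=0):
    """Return (correlation in full population, correlation among the selected elite).

    x and y share a common component so that corr(x, y) = rho overall.
    Selection keeps the top `top_share` of the population ranked by x + y.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        common = rng.gauss(0.0, 1.0)
        x = rho ** 0.5 * common + (1 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
        y = rho ** 0.5 * common + (1 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    pairs.sort(key=lambda p: p[0] + p[1], reverse=True)
    elite = pairs[: int(n * top_share)]
    full_r = pearson(*zip(*pairs))
    elite_r = pearson(*zip(*elite))
    return full_r, elite_r


full_r, elite_r = simulate()
print(f"correlation, everyone:   {full_r:+.2f}")
print(f"correlation, top 1 pct.: {elite_r:+.2f}")
```

Among the selected group, anyone with a merely good x must have had an exceptional y to clear the bar, and vice versa, so the two traits end up negatively related inside the elite sample even though they move together in the population at large.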