Usain Bolt: It’s Just Not Normal

Usain Bolt’s wonderful run in the Olympic 200-meter sprint reminds us that the normal distribution, the familiar bell curve beloved by economists and statisticians, can be wildly inappropriate when analyzing extremely selected samples.

This morning’s New York Times shows Usain Bolt’s new world record, relative to the 250 greatest 200-meter sprints ever. Not only does this not look like a normal distribution, it doesn’t even look like the tail of any standard distribution I’ve ever seen:

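[Figure: Usain Bolt’s new world record plotted against the 250 greatest 200-meter sprints ever (New York Times graphic)]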

The full graphic, as a storyboard, is available here. (It is a beautiful example of using statistics to tell a story.) It should be clear from this chart why few thought that the previous world record would be broken anytime soon. (An interesting aside: This graphic shows that it is only a fairly recent phenomenon that the 200-meter typically yields a faster average speed than the 100-meter sprint.)
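The average-speed comparison is simple to check with Bolt’s own Beijing winning times (9.69 seconds in the 100 meters, 19.30 in the 200 meters): average speed is just distance divided by time. A minimal sketch; it ignores wind, and both official times include the reaction to the gun, which weighs more heavily in the shorter race.

```python
def average_speed(distance_m: float, time_s: float) -> float:
    """Average speed in metres per second over the whole race."""
    return distance_m / time_s

# Usain Bolt's winning times at the 2008 Beijing Olympics
bolt_100 = average_speed(100, 9.69)
bolt_200 = average_speed(200, 19.30)

print(f"100 m: {bolt_100:.2f} m/s")  # 100 m: 10.32 m/s
print(f"200 m: {bolt_200:.2f} m/s")  # 200 m: 10.36 m/s
```

So in Beijing, Bolt covered the 200 meters at a slightly higher average speed than the 100 meters.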

Extreme outliers aren’t that unusual in sports. The greatest outlier may well be Australian cricketer Donald Bradman, whose career batting average of 99.94 puts him so far ahead of any other cricketer that it defies comprehension. (Trivia note: Bradman played the piano at my grandmother’s wedding.) Here is a histogram of career batting averages conditional on being among the top 100 (among those with at least 20 innings):

[Figure: Histogram of career batting averages for the top 100 cricketers (minimum 20 innings), with Bradman at 99.94]

Some argue that Joe DiMaggio’s 56-game hitting streak is pretty extraordinary. So I put together a histogram of the great hitting streaks (those longer than 30 games). DiMaggio is okay, but he’s no Don Bradman.

[Figure: Histogram of hitting streaks longer than 30 games, with DiMaggio’s 56]

The key to all of these strange distributions is that we are focusing on the extreme tails of highly selected samples, where the usual statistical patterns rarely hold. These situations are highly atypical, but they are also incredibly interesting when thinking about the very greatest. (I’ve never understood the urge to call these “black swans,” given that black swans are actually fairly common birds if you know where to look.)
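A quick simulation makes the point about selected samples concrete. It assumes, purely for illustration, that some performance measure in a hypothetical population is exactly normal. Even then, the best 250 performances (echoing the Times chart) form a lopsided, right-skewed pile rather than a bell curve. The seed, population size, and cutoff are arbitrary choices, not anything from the data above.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: mean cubed deviation divided by the cubed standard deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

random.seed(2008)  # fixed seed so the sketch is reproducible

# Hypothetical population whose performance is exactly normal (an assumption).
population = [random.gauss(0.0, 1.0) for _ in range(200_000)]

# Keep only the elite: the 250 best performances.
elite = sorted(population, reverse=True)[:250]

print(f"whole population skewness: {skewness(population):+.2f}")  # close to zero
print(f"top-250 skewness:          {skewness(elite):+.2f}")       # clearly positive
```

A symmetric parent distribution does not imply a symmetric elite: the top 250 are crowded near the cutoff and thin out toward the single best, so bell-curve intuitions do not transfer to the tail.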

Those interested in how things change in extremely selected samples may enjoy Tim Groseclose’s paper, “Extreme Sample Selection Bias: Conditions That Cause the Correlation Between Two Variables to Switch Signs.” Groseclose claims that this extreme sample selection can explain why nonmillionaire members of Congress win re-election more often than millionaires; why it shouldn’t be surprising that the greatest golfer is multiracial, even though most top golfers are white; and why high S.A.T.’s may actually predict lower subsequent incomes among those attending elite universities.


Mountford

An interesting end to Bradman's career. He needed only 4 runs in his final Test match to finish with a Test average of 100. He was out for 0, second ball. Considering his average, he was more likely to score a century than to go out for a duck, but he turned the stats on their head right to the very last.
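The arithmetic behind those 4 runs: a batting average is runs divided by dismissals (not-outs don't count as dismissals), and Bradman finished his Test career with 6,996 runs from 80 innings with 10 not-outs, i.e. 70 dismissals. A small sketch:

```python
def batting_average(runs: int, innings: int, not_outs: int) -> float:
    """Cricket batting average: runs scored per dismissal."""
    dismissals = innings - not_outs
    return runs / dismissals

# Bradman's final Test career figures
actual = batting_average(runs=6996, innings=80, not_outs=10)

# The same career if he had made 4 runs before being dismissed in that last innings
with_four_more = batting_average(runs=6996 + 4, innings=80, not_outs=10)

print(f"actual:           {actual:.2f}")          # actual:           99.94
print(f"with 4 more runs: {with_four_more:.2f}")  # with 4 more runs: 100.00
```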

Andrew

Does anyone think that the extreme sample selection bias might explain why so many organisations end up being run by crap people?

Bryce

Mike (#20 & #32)-
All of what I said IS true, actually. Thank you to Barricade for providing a bit of evidence.

The "deceleration phase" is not something that the sprinters try to do, it is just an inevitable part of every sprint. No matter how hard you push yourself in a sprint, even in the 100m, by the end you WILL be decelerating. Fact of life.

I have no idea if Bolt is in fact a better 400m runner. I was going by what the announcer said, which may or may not have been correct. But it wouldn't surprise me, especially since Michael Johnson was then and continued to be a 400m runner, and he is the only other sprinter to post a 200m time that low. And don't tell me Michael Johnson wasn't a 400m runner. He didn't break the WR in Atlanta, but did in 1999 - and it still holds....
http://en.wikipedia.org/wiki/400_metres
...And he now coaches Jeremy Wariner in the 400m - not the 200m - the 400m. Why? Because the 400m was his specialty. The 200m WR was a byproduct.

Andrea

This is a bit off topic, but Barack Obama considers himself black, even if he does have a white mother. If he says it, then he is.

Maria

Just wanted to point out that the graph that seems to suggest that

"it is only a fairly recent phenomenon that the 200-meter typically yields a faster average speed than the 100-meter sprint."

depicts records, not average times. So it could be that the typical pace in a 200m sprint is higher than in a 100m, but that the likelihood of an extraordinary race is higher when you're running 100m than 200m. Or that it used to be.
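Maria’s point, that records and typical performances can rank two events differently, can be illustrated with made-up numbers. The sketch assumes each event’s average race speeds are normally distributed, with entirely hypothetical means and spreads, and approximates a record as the best of n races by the 1-in-n upper quantile. An event with a slightly lower typical speed but more spread then ends up with the faster record.

```python
from statistics import NormalDist

# Hypothetical average-speed distributions (m/s) for two events.
event_a = NormalDist(mu=9.00, sigma=0.20)  # faster on average, very consistent
event_b = NormalDist(mu=8.90, sigma=0.30)  # slower on average, more variable

# Rough approximation: the best of n races sits near the 1 - 1/n quantile.
races = 10_000
record_a = event_a.inv_cdf(1 - 1 / races)
record_b = event_b.inv_cdf(1 - 1 / races)

print(f"typical speed: A {event_a.mean:.2f}, B {event_b.mean:.2f}")
print(f"record speed:  A {record_a:.2f}, B {record_b:.2f}")
```

Here event A is faster in a typical race, yet event B holds the faster record, so a chart of records alone cannot settle which event is typically run faster.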

Garry Kanter

Statistical outliers? Don't forget long jumper Bob Beamon. He broke the record by over 21 inches with his 1968 Mexico City jump of 29 ft, 2 and 1/2 inches.

Mike

On a tangent, if the 200m now has the fastest average speed, why isn't the winner of this event now called "World's Fastest Man/Woman"?

Sherman Sims

I'm not buying the comparison of the sprint graph outliers to the graphs of the batting outliers. The batting outliers are more the product of a series of unlikely events and are influenced by many factors. A comparison to home run distance would be more appropriate and would show a direct correlation with improvements in training, human size and possibly steroids.

DaveV

Usain's times were 9.69 in the 100m and 19.30 in the 200m. Dividing 19.30 by 2 would translate to 9.65 seconds in the 100m, which Usain could easily have beaten if he had finished the 100m running as hard as he finished the 200m.

Another question: How does a country as small as Jamaica produce so many world-class sprinters at every Olympics?

Alan

I wonder: if Bolt and, say, Veronica Campbell-Brown had a hypothetical, statistically significant number of children, what would the mean and standard deviation of their sprint time distribution look like? Would it be centered around their parents' sprint times, or more broadly spread out?

Brian

Dave #40

That is circumstantial evidence at best. They have not been linked to any steroids, have never failed a test, and have done nothing to make them suspect except excel. Does this guarantee innocence? Of course not, as we ultimately found out with Marion Jones. But that was only after HARD evidence became available. Until there is such evidence for Bolt/Johnson, we should celebrate their excellence, not question it.

Carl #42
I agree wholeheartedly. And, yes, I realize that Tiger is multiracial, but I was simply using the language in the study, probably thick-headedly. You are right that the "one drop rule" still seems to exist, especially when convenient, and I disagree with that entirely. However, we should be careful to avoid the pseudo-scientific thought process that people of mixed race are somehow "the best of both worlds," because it is incredibly flawed. Perhaps you meant that their experiences, and the resilience they had to develop to get where they are, are the deciding factor in their success, and that would be interesting to study. But it is a very slippery slope as soon as we start trying to ascribe any individual traits to one's race or ethnicity.

As that Racialicious link I provided showed (which many clearly did not read), the idea that racial or ethnic groups are somehow genetically enhanced to perform in certain areas is HORRIBLY flawed, scientifically and otherwise.

Michael Casp

Assuming no performance-enhancing drugs, I submit that this is the result of three things:

A) Smarter training and diet

B) Evolution

C) Higher global standards of living that allow more people to participate in competitive sport

Brian

I struggle to see how the referenced study shows that Tiger Woods is better because he's black. FTW? Can someone explain that to me? It is a highly inflammatory claim to make, and one that should be supported heavily with hard science, if it can be supported at all. And I did not see that in the study.

Also, there is absolutely no reason to believe that Bolt or MJ were on steroids and to make baseless accusations is ridiculous.

Mike

Bryce, you're incorrect in a lot of what you wrote. While I have no idea whether Bolt might be better at the 400m, it was simply a case of a clueless announcer getting caught up in the moment. In addition... Michael Johnson was not a 400m runner. I don't care what he calls himself, what you call him, or what anyone else calls him. He won the 400m race, but absolutely obliterated the 200m record.

As for sustaining one's speed throughout the 200m... it's a sprint. The lung capacity that 400m runners have is a non-issue. Even at the high school level, athletes have the basic fitness required to maintain their speed throughout a 200m race.

Please enlighten us and show us the "deceleration phase" of the 200m race.

Mike

OK, you said it twice, so it MUST be true. Thanks for clarifying that it's a fact of life. I don't think you can call someone's second-best event their "specialty". Was Happy Gilmore a hockey player or a golfer?

How'd Wariner do, by the way?

Mitch

May the best "Juicer" win.

Valpey

A comment on the observed incidence of extreme outliers in sports: I imagine one of the biggest reasons the sports outlier phenomenon differs from so many other distributions is the extremely strong positive recursive effect that being elite has on becoming even more elite. This recursion operates over a sustained period of time, beginning when a child demonstrates an aptitude for a certain discipline. But unlike, say, academics, where institutions work under mandates to allocate resources to educate everyone, the 'Athletic Industrial Complex' is constantly seeking to invest in the very best.

To take one of the many feedback loops, consider the total income of an elite athlete from salary, prizes, and endorsements. Compare the income or value of an athlete who is #1 in his/her discipline to those in the top ten and we see that being #1 is strongly rewarded.

Applying this potential income or value to a poor economy, as in the case of Bolt's Jamaica, we see that the feedback incentives are astronomical.

Also, one thing that this extreme outlier study shows is that it would actually not be surprising if the fastest man on earth were not using performance enhancers.
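The feedback argument can be sketched with a toy growth model. It assumes, for illustration only, that every hypothetical athlete starts at the same level and gets a random improvement each year. In one version the gain is a fixed amount regardless of current level; in the other it is proportional to current level, a crude stand-in for resources flowing to those who are already best. The same random shocks drive both versions, and only the proportional one produces a long right tail.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness (third standardized moment)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

random.seed(7)             # reproducible illustration
ATHLETES, YEARS = 5_000, 20

additive, feedback = [], []
for _ in range(ATHLETES):
    a = f = 1.0            # everyone starts at the same hypothetical level
    for _ in range(YEARS):
        shock = random.gauss(0.1, 0.1)
        a += shock         # gain unrelated to current level
        f *= 1 + shock     # gain proportional to current level (feedback)
    additive.append(a)
    feedback.append(f)

print(f"additive skewness: {skewness(additive):+.2f}")  # near zero
print(f"feedback skewness: {skewness(feedback):+.2f}")  # clearly positive
```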

WholeMealOfFood

A more appropriate way to view sprinting records is how the records come down over time, rather than viewing the distribution of an arbitrary sampling of results taken over a long time period. The "History of World Records" graphics that they have show this best.

The regularly seen pattern for race records is roughly some sort of exponential curve, where times come down quickly at first and then fall more slowly as time progresses (although it's more apparent in the swimming records than in running, but this just indicates that the running records reached maturity earlier). Unusual performances will stand out as large residuals from the fitted curve.
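The fit-and-flag approach can be sketched in a few lines. The data below are invented yearly-best times, not real records: a decaying exponential plus a small deterministic wiggle, with one deliberately planted outlier. The model y ≈ a + b·exp(-k·t) is fit by a grid search over k with ordinary least squares for a and b at each k (one simple choice among many), and the largest residual flags the unusual performance.

```python
import math

def fit_exp_decay(t, y, k_grid):
    """Fit y ≈ a + b*exp(-k*t): grid search over k, least squares for a and b."""
    n = len(t)
    y_bar = sum(y) / n
    best = None
    for k in k_grid:
        e = [math.exp(-k * ti) for ti in t]
        e_bar = sum(e) / n
        s_ee = sum((ei - e_bar) ** 2 for ei in e)
        s_ey = sum((ei - e_bar) * (yi - y_bar) for ei, yi in zip(e, y))
        b = s_ey / s_ee
        a = y_bar - b * e_bar
        sse = sum((yi - a - b * ei) ** 2 for ei, yi in zip(e, y))
        if best is None or sse < best[0]:
            best = (sse, a, b, k)
    return best[1:]  # (a, b, k)

# Invented yearly-best times (seconds) every 4 years, plus a small wiggle.
years = list(range(0, 64, 4))
times = [20.0 + 1.5 * math.exp(-0.05 * t) + 0.01 * math.sin(t) for t in years]
times[10] -= 0.4  # plant one extraordinary performance (hypothetical)

a, b, k = fit_exp_decay(years, times, [i / 1000 for i in range(5, 201)])
residuals = [y - (a + b * math.exp(-k * t)) for t, y in zip(years, times)]
flagged = max(range(len(residuals)), key=lambda i: abs(residuals[i]))

print(f"fitted curve: {a:.2f} + {b:.2f} * exp(-{k:.3f} t)")
print(f"most unusual: year {years[flagged]}, residual {residuals[flagged]:+.2f} s")
```

The planted performance sits well below the fitted curve, which is exactly the "large residual" signature described above.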

Bryce

Two things about the sprinting...

It was said on the Olympics broadcast during the 200m final that Bolt's best event may be the 400m, even though he does not compete in it. This brings me to the fact that Michael Johnson was a 400m runner. In sprinting, the trick is not who reaches the highest speed, but WHO SUSTAINS THEIR TOP SPEED THE LONGEST, so a great 400m runner should be a lot better at the 200m than a great 100m runner, because the 400m runner trains for a longer race and should be stronger and better able to sustain top speed. Unfortunately, because of the physical exertion in both the 200m and 400m, not many 400m runners run the 200m (also, at meets the two events are usually scheduled with little time between them), which is why Michael Johnson's 400m and 200m golds were so amazing and had never been done before. This is also why it would probably be hard to compare the 200m times of 400m runners with the 200m times of 100m runners.

Which brings me to my second point. The acceleration phase of a 100m sprint is about 40m. Then there is top speed, and the rest is inevitably deceleration. The same goes for the 200m. Sprinters will accelerate for about 40m in the 200m, and then the stronger sprinters will hold their top speed longer and decelerate at a slower rate. The acceleration zone is the same length, but it is a smaller percentage of 200m than of 100m, so the overall average speed is less affected by it in the 200m. Also, the second half of the 200m is always much faster than the fastest 100m sprint, because the second half of the 200m doesn't have any acceleration zone... it gets a running start.
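The acceleration-zone argument can be made concrete with a deliberately crude model: a runner accelerates uniformly from rest over the first 40 m to a hypothetical top speed, then holds it with no deceleration. Covering the 40 m zone then takes twice as long as it would at top speed, so the average speed over distance D works out to D·v / (D + 40). The uniform-acceleration assumption overstates the time lost at the start (real sprinters gain most of their speed in the first few strides), but it isolates the point that the same zone costs a smaller share of a longer race.

```python
def average_speed(distance_m, top_speed, accel_zone_m=40.0):
    """Average speed for a runner who accelerates uniformly from rest over
    accel_zone_m metres to top_speed, then holds it (no deceleration)."""
    accel_time = 2 * accel_zone_m / top_speed  # uniform acceleration from rest
    cruise_time = (distance_m - accel_zone_m) / top_speed
    return distance_m / (accel_time + cruise_time)

TOP = 12.0  # hypothetical top speed in m/s
for d in (100, 200):
    v = average_speed(d, TOP)
    print(f"{d} m: average {v:.2f} m/s = {v / TOP:.0%} of top speed")
# 100 m: average 8.57 m/s = 71% of top speed
# 200 m: average 10.00 m/s = 83% of top speed
```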

Eric

I think this has more to do with pure chance than with any kind of quantifiable statistic. 0.3 seconds is such a short period of time that many factors could contribute to it. This is exactly why a tailwind of more than 2.0 meters per second (about 4.5 mph) disqualifies a run from the record books. All of the runners in that one race benefit, but it would be unfair to compare them with runners going into a 5 mph headwind. This one time, everything that could go one way or the other for Bolt happened to go the right way. To get to this point (qualifying for the Olympics), all of the training and work made the difference, but when running with 8 people all within half a second of each other, it is the little things that make the difference. My own experience as a high school sprinter tells me that I had about a half-second range of times for the 100 and about a second and a half for the 200. I am sure that range is smaller for an elite sprinter who controls every body movement through years of training, but it still exists.
