Sports fans will probably be aware that Roger Clemens is currently before Congress, arguing that the Mitchell Report wrongly tagged him as having used performance-enhancing drugs. And last week, his agents released the “Clemens Report,” arguing that his career statistics somehow exonerate him. The full marketing spin is available here.
I was interested in understanding how they could “prove” his innocence by crunching numbers, and in an effort to make sense of it all, I sat down in a Wharton conference room with three fellow data hounds – Eric Bradlow, Shane Jensen, and Adi Wyner. With two stats professors and a stats/marketing prof in the room, I felt a bit outmatched. But it sure was fun to work through the issues together.
The main argument in the Clemens Report is that there is nothing unusual in his career numbers, and, in fact, his performance is quite similar to that of Nolan Ryan. But we note the following:
[S]uch comparisons tell an incomplete story. By comparing Clemens only to those who were successful in the second act of their careers, rather than to all pitchers who had a similarly successful first act, the report artificially minimizes the chances that Clemens will look unusual.
There’s a pretty neat trick at work here: if you compare Clemens only to those who had a terrific last decade of their careers, then the last decade of Clemens’ career doesn’t look that unusual. To sidestep this, we suggest that “[a] better approach to this problem involves comparing the career trajectories of all highly durable starting pitchers.”
So we put together data on all 31 other pitchers since 1968 who started at least 10 games in at least 15 seasons and pitched at least 3,000 innings. This broader comparison group yields conclusions pretty different from the Clemens v. Ryan contrast.
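For anyone who wants to build a comparison group along these lines, here is a rough sketch of the selection rule. The `Season` record, the `durable_starters` function, and the toy pitchers are all hypothetical, and I am assuming that only seasons from 1968 onward count toward the thresholds, which is one reading of the criteria and not necessarily the one used in the analysis.

```python
from dataclasses import dataclass


@dataclass
class Season:
    """One pitcher-season (hypothetical record layout)."""
    pitcher: str
    year: int
    games_started: int
    innings: float  # decimal innings, e.g. 210.333 for 210 1/3


def durable_starters(seasons, first_year=1968, min_starts=10,
                     min_seasons=15, min_innings=3000):
    """Return the sorted names of pitchers who meet the durability thresholds.

    Assumption: only seasons in or after `first_year` count toward both
    the season tally and the innings total.
    """
    qualifying_seasons = {}
    career_innings = {}
    for s in seasons:
        if s.year < first_year:
            continue
        career_innings[s.pitcher] = career_innings.get(s.pitcher, 0.0) + s.innings
        if s.games_started >= min_starts:
            qualifying_seasons[s.pitcher] = qualifying_seasons.get(s.pitcher, 0) + 1
    return sorted(
        name
        for name, count in qualifying_seasons.items()
        if count >= min_seasons and career_innings[name] >= min_innings
    )


# Made-up data: three invented pitchers, not real career records.
toy_seasons = (
    # Pitcher A: 15 seasons, 30 starts and 210 innings each (3,150 innings).
    [Season("Pitcher A", 1970 + i, 30, 210.0) for i in range(15)]
    # Pitcher B: 15 seasons with enough starts, but only 2,250 innings.
    + [Season("Pitcher B", 1980 + i, 12, 150.0) for i in range(15)]
    # Pitcher C: plenty of innings (3,220), but only 14 seasons.
    + [Season("Pitcher C", 1985 + i, 33, 230.0) for i in range(14)]
)

print(durable_starters(toy_seasons))
```

The point of defining the group by durability alone is that nobody gets in (or gets left out) because of how well they pitched late in their career.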
A picture is worth a thousand words, and here we show simple quadratic fits to the data for Clemens v. controls:
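If you want to try a fit like this yourself, a quadratic fit just means estimating a curve of the form a + b·age + c·age² through each pitcher's season-by-season numbers. Here is a minimal sketch that assumes ordinary least squares (the post does not say how the fits were estimated). The function names and the numbers are made up for illustration; ages are centered at 30 only to keep the arithmetic well behaved.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Normal equations (X'X) beta = X'y for the columns 1, x, x^2.
    rows = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]

    # Solve the 3x3 system by Gaussian elimination with partial pivoting.
    m = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]

    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][k] * beta[k] for k in range(i + 1, 3))
        beta[i] = (m[i][3] - tail) / m[i][i]
    return tuple(beta)


def turning_point(b, c):
    """x-value where the fitted curve stops falling and starts rising (or vice versa)."""
    return -b / (2.0 * c)


# Invented ERA-by-age numbers that follow an exact U-shape; not real data.
ages = list(range(22, 45))
centered = [age - 30 for age in ages]
toy_era = [4.0 - 0.05 * x + 0.01 * x * x for x in centered]

a, b, c = fit_quadratic(centered, toy_era)
best_age = 30 + turning_point(b, c)
print(round(a, 3), round(b, 3), round(c, 3), round(best_age, 1))
```

The sign of `c` tells you the shape of the career: for a statistic like ERA, where lower is better, a positive `c` means the pitcher improved toward a mid-career best and then declined, and a negative `c` means the reverse.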
The Clemens Report is also notable for its near-exclusive focus on his ERA. Now, any sabermetrician will tell you that this is not a particularly reliable statistic, and that it bounces around a lot more than a pitcher’s true performance. This is a problem because noisy data can obscure an underlying pattern. So we supplemented our analysis by examining a range of alternative indicators, including walks plus hits per inning pitched, or WHIP (see right panel, above).
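For readers who don't live and breathe box scores, the two statistics are simple to compute. ERA is earned runs allowed per nine innings, and WHIP is walks plus hits divided by innings pitched. The sketch below uses a made-up season line, and it assumes innings are stored as ordinary decimals (210 1/3 innings as 210.333, not the box-score shorthand 210.1).

```python
def era(earned_runs, innings):
    """Earned run average: earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings


def whip(walks, hits, innings):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings


# A made-up season line, not any real pitcher's record.
season = {"earned_runs": 70, "walks": 60, "hits": 180, "innings": 210.0}

season_era = era(season["earned_runs"], season["innings"])
season_whip = whip(season["walks"], season["hits"], season["innings"])
print(round(season_era, 2), round(season_whip, 3))
```

ERA depends on which baserunners happen to score, while WHIP counts the baserunners themselves, which is one reason to look at both.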
We conclude that “the available data on Clemens’s career strongly hint that some unusual factors may have been at play in producing his excellent late-career statistics.”
To be clear, we don’t know whether Roger Clemens took steroids or not. But to argue that somehow the statistical record proves that he didn’t is simply dishonest, incompetent, or both. If anything, the very same data presented in the report – if analyzed properly – tend to suggest an unusual reversal of fortune for Clemens at around age 36 or 37, which is when the Mitchell Report suggests that, well, something funny was going on.
You can read our full analysis in today’s Times, here.
UPDATE: Roger Clemens’ crisis management consultants have just released a rejoinder to our analysis, available here. Further coverage: Lester Munson at ESPN.com, a less flattering analysis at MLB.com, and another piece at ESPN.com (leading to hundreds of comments).