Breaking Down the Clemens Report: A Guest Post

Sports fans will probably be aware that Roger Clemens is currently before Congress, arguing that the Mitchell Report wrongly tagged him as having used performance-enhancing drugs. And last week, his agents released the “Clemens Report,” arguing that his career statistics somehow exonerate him. The full marketing spin is available here.

I was interested in understanding how they could “prove” his innocence by crunching numbers, and in an effort to make sense of it all, I sat down in a Wharton conference room with three fellow data hounds – Eric Bradlow, Shane Jensen, and Adi Wyner. With two stats professors and a stats/marketing prof in the room, I felt a bit outmatched. But it sure was fun to work through the issues together.

The main argument in the Clemens Report is that there is nothing unusual in his career numbers, and, in fact, his performance is quite similar to that of Nolan Ryan. But we note the following:

[S]uch comparisons tell an incomplete story. By comparing Clemens only to those who were successful in the second act of their careers, rather than to all pitchers who had a similarly successful first act, the report artificially minimizes the chances that Clemens will look unusual.

There’s a pretty neat trick at work here: if you compare Clemens only to those who had a terrific last decade of their careers, then the last decade of Clemens’ career doesn’t look that unusual. To sidestep this, we suggest that “[a] better approach to this problem involves comparing the career trajectories of all highly durable starting pitchers.”
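A toy simulation makes the trick concrete. This is only a sketch under invented assumptions: the talent and decline numbers are made up, higher scores mean better pitching, and `simulate_careers` and `late_means` are hypothetical names. Nothing here comes from the Clemens Report or from the Times analysis.

```python
import random


def simulate_careers(n, seed=0):
    """Return n (early, late) performance pairs; higher means better."""
    rng = random.Random(seed)
    careers = []
    for _ in range(n):
        talent = rng.gauss(0.0, 1.0)
        early = talent + rng.gauss(0.0, 1.0)
        # Assumed: everyone declines by 1.0 on average late in the career.
        late = talent - 1.0 + rng.gauss(0.0, 1.0)
        careers.append((early, late))
    return careers


def late_means(careers, early_cutoff, top_k):
    """Mean late performance for (a) all pitchers with a strong first act
    and (b) only the top_k of them ranked by late performance."""
    strong_early = [late for early, late in careers if early >= early_cutoff]
    if not strong_early:
        raise ValueError("no simulated pitcher clears early_cutoff")
    broad = sum(strong_early) / len(strong_early)
    best = sorted(strong_early, reverse=True)[:top_k]
    cherry_picked = sum(best) / len(best)
    return broad, cherry_picked


broad, cherry_picked = late_means(simulate_careers(2000), early_cutoff=1.0, top_k=4)
print(f"all strong starters: {broad:.2f}   hand-picked late stars: {cherry_picked:.2f}")
```

The hand-picked group scores higher late in the career by construction, so a pitcher with an unusually good second act blends in with it while standing out against the broad group.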

So we put together data on all 31 other pitchers since 1968 who started at least 10 games in at least 15 seasons and have pitched at least 3,000 innings. This broader comparison group yields some pretty different conclusions than the Clemens v. Ryan contrasts.
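The selection rule can be sketched as a simple filter. This is an illustration, not the code behind the analysis: the record layout (`pitcher`, `year`, `games_started`, `innings`) and the function name `durable_pitchers` are hypothetical, and it assumes that "since 1968" means only seasons from 1968 onward count toward both thresholds.

```python
from collections import defaultdict


def durable_pitchers(seasons, min_starts=10, min_seasons=15,
                     min_innings=3000.0, since=1968):
    """Return the sorted names of pitchers who meet the durability rule.

    seasons: iterable of dicts with keys "pitcher", "year",
             "games_started" and "innings" (hypothetical layout).
    """
    qualifying_seasons = defaultdict(int)
    career_innings = defaultdict(float)
    for s in seasons:
        if s["year"] < since:
            continue  # assumed: earlier seasons are ignored entirely
        career_innings[s["pitcher"]] += s["innings"]
        if s["games_started"] >= min_starts:
            qualifying_seasons[s["pitcher"]] += 1
    return sorted(
        name for name in career_innings
        if qualifying_seasons[name] >= min_seasons
        and career_innings[name] >= min_innings
    )
```

Note that the rule looks only at durability, never at how well the pitcher performed late in his career, which is what keeps the comparison group from being cherry-picked.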

A picture is worth a thousand words, and here we show simple quadratic fits to the data for Clemens v. controls:

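[Figure: quadratic fits of career trajectories for Clemens v. controls. Left panel: ERA. Right panel: walks and hits per inning pitched.]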
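For readers who want to see what a "simple quadratic fit" involves, here is a minimal least-squares sketch in plain Python. It is a generic illustration rather than the code used for the figure: `fit_quadratic` and `solve_linear` are hypothetical helper names, and centering the ages on their mean is an added choice for numerical stability.

```python
def solve_linear(a, b):
    """Solve a small square system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: need three distinct ages")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_quadratic(ages, values):
    """Least-squares fit of value = c0 + c1*d + c2*d**2, where d is age
    minus the mean age. Returns a function that predicts value from age."""
    center = sum(ages) / len(ages)
    ds = [a - center for a in ages]
    # Power sums for the 3x3 normal equations.
    s = [sum(d ** k for d in ds) for k in range(5)]
    t = [sum(v * d ** k for d, v in zip(ds, values)) for k in range(3)]
    normal = [[s[i + j] for j in range(3)] for i in range(3)]
    c0, c1, c2 = solve_linear(normal, t)

    def predict(age):
        d = age - center
        return c0 + c1 * d + c2 * d * d

    return predict
```

One would fit one curve to a single pitcher's season-by-season numbers and another to the pooled comparison group, then compare the shapes: a U-shaped ERA curve means decline with age, while an inverted U means late-career improvement.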

The Clemens Report is also notable for its near-exclusive focus on his ERA. Now, any Sabermetrician will tell you that this is not a particularly reliable statistic, and that it bounces around a lot more than a pitcher’s true performance. This is a problem because noisy data can obscure an underlying pattern. So we supplemented our analysis by examining a range of alternative indicators, including walks and hits per inning pitched (see right panel, above).
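For reference, the two statistics are computed as follows. The formulas are the standard definitions; the function names and the convention of passing innings as a decimal number (so a third of an inning is about 0.333, not the box-score shorthand ".1") are assumptions made for this sketch.

```python
def era(earned_runs, innings):
    """Earned run average: earned runs allowed per nine innings pitched."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9.0 * earned_runs / innings


def whip(walks, hits, innings):
    """Walks plus hits per inning pitched."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (walks + hits) / innings


# Made-up season line, for illustration only.
print(era(80, 240.0), whip(60, 180, 240.0))
```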

We conclude that “the available data on Clemens’s career strongly hint that some unusual factors may have been at play in producing his excellent late-career statistics.”

To be clear, we don’t know whether Roger Clemens took steroids or not. But to argue that somehow the statistical record proves that he didn’t is simply dishonest, incompetent, or both. If anything, the very same data presented in the report – if analyzed properly – tends to suggest an unusual reversal of fortune for Clemens at around age 36 or 37, which is when the Mitchell Report suggests that, well, something funny was going on.

You can read our full analysis in today’s Times, here.

UPDATE: Roger Clemens’ crisis management consultants have just released a rejoinder to our analysis, available here. Further coverage: Lester Munson at, a less flattering analysis at, and another piece at (leading to hundreds of comments).

  1. Cyril Morong says:

    They don’t seem to adjust for league averages or park effects. They also mention that ERA is not a good stat since it is affected by fielders. But the two graphs show his career trajectory in ERA and walks + hits per 9 IP, both affected by fielders. Sabermetricians for several years have been using an ERA based only on defense independent stats (DIPS ERA).

    They show that Clemens has done better than a typical career trajectory would predict. But there are always going to be atypical pitchers. If he exceeded the normal career trajectory by a lot more than anyone else ever has (as I think is the case for Bonds) then it would mean something. They don’t show what other pitchers did. Warren Spahn kept pitching great (in the late 1950s and early 1960s) past the age of 40, presumably in an era before PEDs.

    JC Bradbury, author of the book “The Baseball Economist,” has a different analysis

  2. Bill in Denver says:

    I’m raising my eyebrows as much as anyone over Clemens’ late career surge, but I have to take exception to the vague characterization that something unusual was at play. In this context, the implication is clearly that Clemens was up to something unsavoury. Keep in mind, however, that the comparison pitchers used in the analysis comprised only 30 data points spread out over 40 years of baseball history. During the time in Clemens’ career where he appears to have improved his stats unusually, what was happening in the rest of baseball? This period coincides, for example, with the new MLB drug testing policy. Were all pitching stats improved over this time as power hitting numbers declined? I’d be more concerned for Clemens’ stats if he dropped sharply again as we have seen with Bonds, and others who clearly benefited from drug use.

  3. Michael says:

    Bill and Cyril,

    The point is not that the data shows that he took (or didn’t take) drugs; that would be completely impossible. It’s that the data presented by the Clemens defense team is misleading. His incredible performance late in his career is quite atypical, not par for the course as Clemens’ report apparently suggests (I have not seen their analysis, I’m just going by what Justin Wolfers claims). Whether or not those stats are possible for someone who did not use drugs is completely irrelevant to this article.

  4. Jim Strathmeyer says:

    “And last week, his agents released the “Clemens Report,” arguing that his career statistics somehow exonerate him.”

    “To be clear, we don’t know whether Roger Clemens took steroids or not.”

    Don’t you guys get it? The fact that you can’t tell whether he took steroids or not exonerates him!

    Where on earth is all of this number crunching supposed to go? I can see it gets much farther than I could’ve imagined, but in the end doesn’t it just accuse the best players of being steroid users? The Clemens charts sure look out of the ordinary, but they don’t really prove anything.

  5. Alex says:

    I agree with Bill that the implication is definitely anti-Clemens. It was my assumption that Clemens’ analysis was to show that he wasn’t a complete outlier. If Nolan Ryan, Johnson, or Schilling were visiting Congress, would your report have said that there was unusual activity in their late careers? To my knowledge, no one has accused any of them of using PEDs.

  6. Dan says:

    This analysis doesn’t seem all that informative. Clemens has late career numbers that are better than the average of the other 31 pitchers, but in order to decide whether his performance is “unusual” among this group of 32 highly durable pitchers you have to look at the range of performances of the other 31 pitchers, not just the central tendency.

    For instance, if you can summarize each pitcher’s career trajectory with a single number, with (say) lower numbers representing better late-career performance, then you could rank the 32 pitchers and see where Clemens falls on that list. If that measure has him 1st in late-career performance by a wide margin, then that would be a very unusual performance. If he’s 9th out of 32, and close to several other pitchers on both sides, then not so much.

  7. paul says:

    I’m just saying:

    People seem much more willing to jump to Clemens’ defense than, say, Barry Bonds’.


  8. Mike P. says:

    If you look at Clemens’ career you will see 6 unusual seasons – 1993, 1995, 1996, and 2004-2006. The first three were three of the last four of his years in Boston and were unusual because he was average instead of great. Ask any Boston fan about those years and they will say he was lazy, out of shape, and unmotivated – potentially due to him fighting with management. He then went to Toronto and had 2 great years at ages 34 and 35. These seasons were seen as a return to the Clemens of old and can’t realistically be called unusual. Again, ask a Bostonian and you will hear that he was re-energized in Toronto and started working again. He then had 5 average years in New York, consistent with an aging pitcher. The eyebrows should only be raised about his 3 stellar years in Houston (2004-2006) at ages 41-43.
    As to the analysis above and the Hendricks’ analysis – they, of course, prove nothing. A cursory glance reveals that Clemens was better after 35 than most, but he was also better before 32 than most. So comparing him to all pitchers with a lengthy record is valid, but comparing him to pitchers with elite early records similar to his is not invalid.
    Really, the only seasons that seem unusual are his 2004-2006 (age 41-43) seasons in Houston. As far as I can tell, the only other pitcher, post Babe Ruth, with that level of success at 40 or over is Randy Johnson, in his age-40 season in 2004.
