Analyzing Roger Clemens: A Step-by-Step Guide

Yesterday, I posted about the conclusions that Eric Bradlow, Shane Jensen, Adi Wyner, and I drew from analyzing Roger Clemens’s career statistics. I thought that it might be useful to show how we got from the findings in the Clemens Report (exonerating him) to our somewhat opposite conclusions. So for budding forensic economists, here is a step-by-step guide, with pictures.

1. The Raw Data

The “Clemens Report” mainly analyzes his earned run average (ERA) through time. These numbers bounce around a lot from season to season, and at this point it is hard to see any particularly interesting pattern in the data.

[Figure: Roger Clemens ERA]

2. An alternative metric, and a fitted curve

The problem with analyzing ERA is that it is affected by a lot of things beyond pitching quality. For instance, defense affects a pitcher’s ERA, and poor pitching does little damage to ERA if there happen to be no runners on base. Instead, we turn to a more reliable metric: walks plus hits per inning pitched (WHIP). This metric yields less “bounce,” and a more reliable pattern is revealed. Fitting a curve, we find that Clemens’ performance deteriorated for about a decade, then started to improve for the last decade of his career.
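For readers who want to follow along, here is a minimal sketch of the metric itself. The function names and the season line are hypothetical, and the helper assumes innings are written in box-score notation, where "211.1" means 211 and one-third innings.

```python
def innings_to_float(ip_notation: str) -> float:
    """Convert box-score innings ("211.1" = 211 and 1/3 innings) to a number."""
    whole, _, outs = ip_notation.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def whip(walks: int, hits: int, innings: float) -> float:
    """Walks plus hits per inning pitched; lower is better."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (walks + hits) / innings


# Hypothetical season line, for illustration only (not Clemens's numbers).
season_whip = whip(walks=60, hits=180, innings=innings_to_float("200.0"))
print(round(season_whip, 3))
```

Because the numerator counts only baserunners the pitcher allowed by walk or hit, the result does not depend on whether those runners later scored, which is one reason it bounces less than ERA.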

The turning point appears to be at around the age (36-37) at which the Mitchell report suggests he used performance-enhancing drugs. When we analyze other summary measures of his pitching performance, we see a roughly similar pattern, although some look more suspicious and some less suspicious.
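The curve fitting and turning point can be sketched as follows. This assumes the curve is a quadratic in age, which is how commenters below read the charts; the data are made up to have a peak at 36, and `fit_quadratic` and `turning_point` are illustrative names, not the code behind the figures.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    # Power sums S[k] = sum(x^k) and moments T[k] = sum(x^k * y).
    S = [sum(x ** k for x in xs) for k in range(5)]
    T = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system for the unknowns (a, b, c).
    m = [
        [S[4], S[3], S[2], T[2]],
        [S[3], S[2], S[1], T[1]],
        [S[2], S[1], S[0], T[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def turning_point(ages, whips):
    """Return (curvature, age at which the fitted curve turns)."""
    mean_age = sum(ages) / len(ages)  # centering keeps the sums well scaled
    a, b, _ = fit_quadratic([age - mean_age for age in ages], whips)
    return a, mean_age - b / (2 * a)


# Made-up WHIP series: worsens until 36, then improves, with a small wobble.
ages = list(range(22, 45))
whips = [1.30 - 0.002 * (age - 36) ** 2 + (0.03 if age % 2 == 0 else -0.03)
         for age in ages]
curvature, peak_age = turning_point(ages, whips)
print(curvature < 0, round(peak_age, 1))
```

Since a lower WHIP is better, a negative curvature means the pitcher got worse and then better, and the vertex of the parabola is the estimated turning point.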

[Figure: Roger Clemens WHIP, with fitted curve]

3. Creating a Comparison Group

To figure out whether Clemens’s performance is unusual, we needed to compare his career trajectory with other durable pitchers. The Clemens report analyzed Nolan Ryan, and this was a wise choice: Ryan’s performance also improved in the final decade of his career.

But a useful comparison group should involve many other pitchers who have also had long and successful careers. When we examine all 30 other pitchers who, since 1967, have started in at least 10 games in 15 seasons with 3000 innings pitched, we see a pervasive pattern: nearly all of them improve for about a decade, and then their performance deteriorates in the second half. The exceptions to this rule are those pitchers who simply tend to get worse through time – and this looked to be Clemens’s trajectory until his mid-30s.

But overall, Clemens’ path looks “upside-down,” as he gets worse first, and then improves later.
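The selection rule for the comparison group can be sketched like this. It reads the criteria as at least 15 seasons (1967 or later) with at least 10 starts each, plus at least 3000 innings over those years; that reading, the record layout, and the pitchers are all assumptions made for illustration.

```python
from collections import defaultdict


def durable_pitchers(seasons, first_year=1967, min_starts=10,
                     min_seasons=15, min_innings=3000):
    """Names of pitchers meeting the durability thresholds, sorted."""
    qualifying_seasons = defaultdict(int)
    innings = defaultdict(float)
    for s in seasons:
        if s["year"] < first_year:
            continue  # ignore seasons before the window opens
        innings[s["pitcher"]] += s["innings"]
        if s["starts"] >= min_starts:
            qualifying_seasons[s["pitcher"]] += 1
    return sorted(
        name for name in innings
        if qualifying_seasons[name] >= min_seasons
        and innings[name] >= min_innings
    )


def career(name, first_year, n_seasons, starts, innings):
    """Build a hypothetical career with identical seasons."""
    return [{"pitcher": name, "year": first_year + i,
             "starts": starts, "innings": innings} for i in range(n_seasons)]


# Hypothetical pitchers: only A clears both thresholds.
seasons = (career("Pitcher A", 1970, 16, 30, 210.0)    # 16 seasons, 3360 IP
           + career("Pitcher B", 1975, 16, 30, 150.0)  # too few innings
           + career("Pitcher C", 1980, 12, 33, 260.0)) # too few seasons
print(durable_pitchers(seasons))
```

Keeping the thresholds as parameters makes it easy to check how sensitive the comparison group is to where the cutoffs are drawn.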

[Figure: Roger Clemens comparison]

4. Clemens’ Career Versus Other Pitchers

We fit a curve that describes the typical career of a durable starting pitcher. Think of this as being like the “control group” in a medical study. Clemens’s career arc looks very different from that of our control group, suggesting that something unusual occurred.
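One rough way to make "looks very different" concrete is to pool every control pitcher's (age, WHIP) points, fit a single quadratic, and compare the sign of its curvature with the subject's. The sketch below does that on made-up data; pooling, the quadratic form, and the centering age of 33 are assumptions, and this is not a significance test.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def curvature(points, center=33):
    """Quadratic coefficient of a least-squares fit to (age, WHIP) points."""
    xs = [age - center for age, _ in points]
    ys = [w for _, w in points]
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    normal = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    # Cramer's rule: swap the first column for the right-hand side.
    numerator = [[t[2], s[3], s[2]], [t[1], s[2], s[1]], [t[0], s[1], s[0]]]
    return det3(numerator) / det3(normal)


ages = range(23, 41)
# Hypothetical control group: WHIP falls until 31, then rises.
control_points = [(age, base + 0.002 * (age - 31) ** 2)
                  for base in (1.15, 1.20, 1.25) for age in ages]
# Hypothetical subject: WHIP rises until 36, then falls.
subject_points = [(age, 1.30 - 0.002 * (age - 36) ** 2) for age in ages]

control_curvature = curvature(control_points)
subject_curvature = curvature(subject_points)
print(control_curvature > 0, subject_curvature < 0)
```

Opposite signs correspond to the "upside-down" shape described above; judging whether such a difference is larger than chance would require uncertainty estimates that this sketch does not attempt.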

Unfortunately, our statistical analysis cannot pinpoint the precise cause of this unusual pattern. But it is clear that the Clemens report stretches credibility in arguing that his late career was typical. His late-career performance certainly was quite exceptional given the trajectory that he was on in the first half, suggesting that close scrutiny is warranted.

[Figure: Roger Clemens comparison]


wouldn't you think that after the 10th person called WHIP into question the rest of you would stop?

the author said WHIP was a better metric than ERA... not the best.


What's the bother...he did steroids and everyone knows it. Your analysis should show age and Cy Youngs, as well as age and contract money - after all that is what he got out of it.


Thanks for the analysis. I am not going to say that I know all the details in question, specifically the periods of Clemens' career in which he allegedly used HGH, but I do see some interesting trends in your first and second charts (all trend lines aside).

Both the first and second chart appear to have an improvement in performance from the age of 38 to 45. During that time his ERA and WHIP almost consistently improved year-over-year. Those years would be 1998 to 2006, which coincides with much of the "Steroids Era" and his stint with the New York Yankees.

I would be interested to see statistics for your "other durable pitchers" during the same ages...


Did you consider the fact that Clemens switched from a better hitting league to a worse hitting league later in his career?


The quadratic curve fit on Clemens' data looks a little simplistic - there's lots of bouncing around. But it does appear to show a dramatic improvement from age 35 to 36, as well as steady improvement from 38 to 45.

Is there any data on his pitching speeds over time?


I am not a big Clemens fan, but I have to disagree with the use of an "Average Durable Pitcher". Ryan and Clemens are unique among old school pitchers in that they put an emphasis on strength training, particularly leg and core strength. This training appears to prolong the career of pitchers.

The question of how Clemens was able to keep his strength is debatable, but Ryan was also able to do it, and there is no indication that Ryan used enhancers to do it.


How about r^2 values for those curve fits? To my eye, it looks like the WHIP curve fits just as badly as a curve drawn through the ERA data.


yeah I'd like to see your r squared values too


Using the curve fit seems inappropriate. You're more likely to be fooled by randomness in this case than find evidence that he used enhancers.

It's fun to look into performance metrics, but aren't there easier (and much more accurate) ways of establishing whether or not he used drugs?


What about normalizing Clemens' WHIP by the average of all AL pitchers that year? That may remove variables like a worse crop of hitters or a larger strike zone during the last ten years.

And normalize by park too. Ten years ago is when Clemens left Fenway. Considering the fact that Fenway is a hitter-friendly park, this may actually show a more dramatic turn-around in his pitching.


The point of Clemens' argument is that while he may have performed better than average, there are some other pitchers with similar career trajectories who were not accused of cheating. I happen to think he's guilty, but this analysis is not well-done. If Nolan Ryan and other pitchers can steadily improve, this demonstrates it is possible, and that Clemens' career, while a minor outlier, is not an extreme outlier. On figure 4 above, two additional curves, one starting just below where Ryan's starts and the other ending just below where Clemens and Johnson are nearly collinear, show similar downward-trending shapes.


I'm inclined to believe your analysis here, but I'm uncomfortable with how it's been presented (so I eagerly await your complete report).

What is the hypothesis being tested here? With what confidence could you reject the null? Without some estimates of uncertainty, it's just looking at lowess curves (sorry, quadratics) and saying, "This one looks different." You should have enough data points here (30 pitchers x 15 years) to do some real number-crunching, instead of just exploring the use of the twoway command in Stata.

Phil Steinmeyer

Overall, this seems like overly simplistic analysis. Especially since some folks may see graphs, numeric data and/or the use of statistics, they might assume that the whole process is scientific and at least moderately reliable as an indication of whether Clemens may or may not have used PEDs.

The baseball stats gurus know that there are far better statistics than simple ERA for measuring a pitcher's performance. ERA's flaws, among other things, are that it can be significantly affected by defense, park factor, overall run production in the league and among the pitcher's opponents, and simple luck. Of course, no baseball statistics are perfect, and all are somewhat 'noisy'. Still, I don't like seeing a questionable stat used to prop up this charge.

Furthermore, a simple look at the data points in the first chart shows a lot of noise. I suspect that the line plots in the later graphs could look very different with relatively small changes in the basic data.



As I look at figure 4, I see roughly 3 patterns of data:
1. Those who follow the classic concave pathway relative to the x-axis (e.g. Johnson and Schilling, who improve over time, reach an optimal minimum in their career, then worsen as they age). Ryan appears to be a special case of this (with perhaps one other), as he did not hang around long enough to have the late-career worsening, although if you look at the raw data for him, the worsening may be there and not reflected in the fit.
2. Those who start relatively linearly, and then worsen with age in a concave pattern.
3. The third group (which I find most interesting) is those who get better with age in a convex pattern. They worsen over time but the rate of worsening decreases as they age (I'm envisioning the stereotype of the "cagey lefthander"). It appears that Clemens (and perhaps one other?) is exceptional, as he reaches a non-optimal maximum, then starts to improve over time. Who are these pitchers who fight age better than others? Is there a common thread that links these pitchers together?

Of course, I may just be trying to overinterpret a bunch of really sucky fits that are close to meaningless.



It is a good exercise to compare average WHIP across the four teams Clemens pitched for. As the story goes, obviously he took steroids in Toronto and Houston and not in New York or Boston.


This analysis is completely invalid. Did it ever occur to you that injuries play a significant part in performance? Clemens was often injured in his last few years with Boston. In fact, Clemens has always performed worse when pitching through injuries. It doesn't take a rocket scientist to figure this out. What about pitching in the NL vs. AL? It's pretty clear that getting to pitch to a pitcher instead of a DH once every 9 at-bats (except when pinch hitters are used) will make your numbers look better. Perhaps knowing something about baseball would help when trying to do a proper analysis.


Adding to the comments above, what would happen to your WHIP regression curve if the 2004-06 seasons spent in the NL with the Astros were taken out? The qualitative difference in AL versus NL batters could very well account for a large portion of the late improvement. Perhaps league-adjusted ERA or WHIP would yield better true indicators of improved performance.

Drawing conclusions based on the quadratic regression without acknowledging this phenomenon could in turn "create" this apparent improvement at age 36-38.


It is pretty funny to see complaints about sabermetrics from a crowd that knows more about statistics because it is impossible to get great statistics in baseball. From least to most useful pitching statistics probably go saves, W/L record, total strikeouts, ERA, k/9, ERA+, WHIP. Yes WHIP measures his increasing control and decreased bb/9 as he gained control in his 30s, and relied on his defense, but it is realistically the best statistic for pitchers that is easy to explain and quantify. It would probably have been better to use a WHIP+ statistic that normalized for league, ballpark and defense, but that would actually have told basically the same story (although the numbers might look a little nicer). Of course you really shouldn't use one statistic when evaluating a player, but that is what Clemens' agents did.

Chris McGraw

I'm not a statistician, but I think I can raise a few reasonable objections to the analysis undertaken here. Firstly, accepting that the WHIP statistic is a reasonable estimator of pitching skill, does it not strike the authors as odd that Clemens' career from the time before he is even accused of having used performance-enhancing drugs already exhibited an unorthodox bend (figure 3)? Specifically, it starts lower than most (which is to say that he was already better than most other pitchers in this class by the WHIP statistic), and it shows a lower rate of change during this time (which suggests his skill was not degrading as fast as others). If we confine our analysis to just this section of the curve, I think you can make a similar argument that something was "anomalous" in his early career, and if he is not believed to have been using drugs during this time (why would he have?), then perhaps he himself is an anomaly and I'm not sure how anyone could tease apart the contribution of one anomaly (his physiology) from another unknown but postulated anomaly (his alleged use of drugs) in his later career.

Secondly, if the strength of the authors' rebuttal relies upon the fact that it is statistically uncommon for pitchers to improve or at least stay sharp in their final years, how come there is no discussion of Nolan Ryan's own "anomalous" (though steady) improvement throughout his career (figure 3)? Ryan actually reaches a level of skill (by WHIP statistic) comparable to Clemens only in his last years, and to conclude that Clemens' later record is suspicious essentially because his early career was so strong (unlike Ryan) seems premature without a discussion of the certainty of the trend analysis.

Therefore, thirdly, I think a discussion of linear regression, its methods and its shortcomings would be in order here. It's unclear how sure we even really are (in the statistical sense) that Clemens' curve does not continue to go up vs plateau vs go down, nor are we sure that it should really start as low as it does or bend as high as it does. Does the analysis still hold if we remove a single data-point (eg. his exceptionally low WHIP score at age 25 or high WHIP score at 38) or cluster of data points? I don't know the answer to this question, but it seems very deserving of some consideration.

In the end, however, the point is taken that one could theoretically interpret the statistics to suggest the opposite of what the exonerating report has claimed. I'm just not sure if it has actually been done here.



See Phil Birnbaum's to read why this is no good.