Analyzing Roger Clemens: A Step-by-Step Guide

Yesterday, I posted about the conclusions that Eric Bradlow, Shane Jensen, Adi Wyner, and I drew from analyzing Roger Clemens’s career statistics. I thought it might be useful to show how we got from the findings in the Clemens Report (which exonerated him) to our somewhat opposite conclusions. So for budding forensic economists, here is a step-by-step guide, with pictures.

1. The Raw Data

The “Clemens Report” mainly analyzes his earned run average (ERA) through time. These numbers bounce around a lot from season to season, so at this point it is hard to see any reliable or particularly interesting pattern in the data.

[Figure: Roger Clemens ERA]

2. An alternative metric, and a fitted curve

The problem with analyzing ERA is that it is affected by a lot of things beyond pitching quality. For instance, defense affects a player’s ERA, and poor pitching does little damage to ERA if there happen to be no runners on base. Instead, we turn to a more reliable metric: walks plus hits per inning pitched (WHIP). This metric yields less “bounce,” and a more reliable pattern is revealed. Fitting a curve, we find that Clemens’s performance deteriorated for about a decade, then improved over the last decade of his career.
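For readers who want to compute the metric themselves, here is a minimal sketch of WHIP as defined above. The function name and the sample season line are made up for illustration; they are not Clemens’s numbers.

```python
def whip(walks: int, hits: int, innings_pitched: float) -> float:
    """Walks plus hits per inning pitched; lower is better for the pitcher."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return (walks + hits) / innings_pitched


# Made-up season line for illustration only (not a real Clemens season).
example = whip(walks=60, hits=180, innings_pitched=200.0)
print(round(example, 3))  # 1.2
```

Stat sheets usually write partial innings as .1 and .2 (one and two outs), so those values would need converting to thirds before being passed in as a true decimal.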

The turning point appears to be at around the age (36-37) at which the Mitchell report suggests he used performance-enhancing drugs. When we analyze other summary measures of his pitching performance, we see a roughly similar pattern, although some look more suspicious and some less suspicious.
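One way to locate a turning point like this is to fit a parabola to WHIP by age and read off its vertex. The sketch below assumes an ordinary least-squares quadratic fit, which the post does not confirm, and it runs on synthetic, noise-free values rather than Clemens’s actual statistics. The function names are illustrative.

```python
import numpy as np


def fit_quadratic(ages, whips):
    """Least-squares fit of whip = a*age**2 + b*age + c; returns (a, b, c)."""
    a, b, c = np.polyfit(np.asarray(ages, dtype=float),
                         np.asarray(whips, dtype=float), deg=2)
    return float(a), float(b), float(c)


def turning_point(a, b):
    """Age at which the fitted parabola changes direction (its vertex)."""
    if a == 0:
        raise ValueError("no curvature, so no turning point")
    return -b / (2 * a)


# Synthetic values: WHIP rises (gets worse) and then falls (improves),
# peaking at 36.5. These are NOT real statistics.
ages = np.arange(22, 45)
whips = 1.30 - 0.002 * (ages - 36.5) ** 2

a, b, c = fit_quadratic(ages, whips)
peak_age = turning_point(a, b)
```

Because a lower WHIP is better, a negative `a` corresponds to the "worse first, better later" shape described here, and a positive `a` to the reverse.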

[Figure: Roger Clemens WHIP with fitted curve]

3. Creating a Comparison Group

To figure out whether Clemens’s performance is unusual, we needed to compare his career trajectory with other durable pitchers. The Clemens report analyzed Nolan Ryan, and this was a wise choice: Ryan’s performance also improved in the final decade of his career.

But a useful comparison group should involve many other pitchers who have also had long and successful careers. When we examine all 30 other pitchers who, since 1967, have started at least 10 games in 15 seasons with 3,000 innings pitched, we see a pervasive pattern: nearly all of them improve for about a decade, and then their performance deteriorates in the second half of their careers. The exceptions to this rule are those pitchers who simply tend to get worse through time, and this looked to be Clemens’s trajectory until his mid-30s.
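The selection rule can be written down as a filter over season records. The sketch below is one reading of the rule: at least 15 seasons with 10 or more starts, and at least 3,000 innings, counting only seasons from 1967 on. The post does not spell out how the innings were counted, so that part is an assumption, and the `Season` record and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Season:
    pitcher: str
    year: int
    games_started: int
    innings: float


def durable_starters(seasons, first_year=1967, min_starts=10,
                     min_seasons=15, min_innings=3000.0):
    """Return sorted names of pitchers meeting the durability thresholds."""
    qualifying_seasons = {}
    total_innings = {}
    for s in seasons:
        if s.year < first_year:
            continue  # assumption: only seasons since first_year count
        total_innings[s.pitcher] = total_innings.get(s.pitcher, 0.0) + s.innings
        if s.games_started >= min_starts:
            qualifying_seasons[s.pitcher] = qualifying_seasons.get(s.pitcher, 0) + 1
    return sorted(
        name for name, n in qualifying_seasons.items()
        if n >= min_seasons and total_innings[name] >= min_innings
    )


# Synthetic records for three made-up pitchers.
records = (
    [Season("A", 1970 + i, 30, 210.0) for i in range(15)]    # qualifies
    + [Season("B", 1970 + i, 30, 150.0) for i in range(15)]  # too few innings
    + [Season("C", 1970 + i, 30, 250.0) for i in range(14)]  # too few seasons
)
selected = durable_starters(records)
```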

But overall, Clemens’ path looks “upside-down,” as he gets worse first, and then improves later.

[Figure: Roger Clemens compared with other durable pitchers]

4. Clemens’ Career Versus Other Pitchers

We fit a curve that describes the typical career of a durable starting pitcher. Think of this as being like the “control group” in a medical study. Clemens’s career arc looks very different from our control group, suggesting that something unusual was occurring.
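The comparison can be sketched in code by pooling every control pitcher's seasons into one quadratic fit and checking whether an individual's curvature has the opposite sign. Pooling is an assumption made for illustration; the post does not say how the typical curve was built. All values below are synthetic.

```python
import numpy as np


def curvature(ages, whips):
    """Leading coefficient of a least-squares quadratic fit."""
    return float(np.polyfit(np.asarray(ages, dtype=float),
                            np.asarray(whips, dtype=float), deg=2)[0])


def pooled_curvature(careers):
    """Curvature of one quadratic fitted to all (ages, whips) pairs at once."""
    ages = np.concatenate([np.asarray(a, dtype=float) for a, _ in careers])
    whips = np.concatenate([np.asarray(w, dtype=float) for _, w in careers])
    return curvature(ages, whips)


def is_upside_down(pitcher, careers):
    """True when the pitcher's curve bends the opposite way to the pooled one."""
    return curvature(*pitcher) * pooled_curvature(careers) < 0


# Synthetic careers: control pitchers improve, then decline (WHIP is U-shaped);
# the pitcher under study does the reverse. These are NOT real statistics.
ages = np.arange(25, 41)
control = [
    (ages, 1.20 + 0.002 * (ages - 32) ** 2),
    (ages, 1.30 + 0.001 * (ages - 30) ** 2),
]
pitcher = (ages, 1.30 - 0.002 * (ages - 36) ** 2)
flagged = is_upside_down(pitcher, control)
```

A sign check like this only says the shapes differ. It does not say whether the difference is larger than the season-to-season noise.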

Unfortunately, our statistical analysis cannot pinpoint the precise cause of this unusual pattern. But it is clear that the Clemens report stretches credibility in arguing that his late career was typical. His late-career performance certainly was quite exceptional given the trajectory that he was on in the first half, suggesting that close scrutiny is warranted.

[Figure: Roger Clemens compared with the typical durable pitcher]


Wow, great even lets the nerds enjoy the Clemens controversy


I'd be curious to see this analysis done with a metric that more accurately reflects a pitcher's actual performance over the course of a season. xFIP would be a great choice, as it corrects for things such as luck, defense, and ball park factors.


I would also say good job on highlighting some of the honest pitchers and what they had to compete with. I guess Dan Duquette (the Boston GM who released/did not sign Clemens) was not such a dummy after all... only I guess he should have taken 'roids into the picture as they got into the pitcher.


Clemens worked harder than any pitcher that lived before him. Also, he benefited from legal advances in technology (B12, greenies, etc.).

JR Flanders

My only issue is that your Clemens curve appears to have a very poor fit. The distances from the line to the data, or the residuals, are large. I'd like to see how good that fit really is. Mind you, I still think he cheated, but I doubt it had that great of an effect.

Jason Smith

While this is a great article in the abstract, I don't believe these numbers, either in the form of the ERA or the WHIP, are conclusive. This is as a physicist looking at data, not a Clemens fan or detractor.

The data do not appear to be inconsistent with a zero or even negative curvature fit. This is only based on my experience; I haven't done the calculation. However, if you provide the data, I will certainly give it a try.

I'd like to see the error bars on the data points (Clemens didn't pitch exactly the same number of pitches every year, so each point represents only a sample of his ability over time) as well as the uncertainty of the quadratic fit before saying Clemens's career looks typical or not.
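The uncertainty of a quadratic fit can be estimated from the covariance matrix that `numpy.polyfit` returns when called with `cov=True`. The sketch below is not the authors' method, and the "two standard errors" threshold is an arbitrary choice for illustration. The data are synthetic, with a fixed alternating offset standing in for season-to-season scatter.

```python
import numpy as np


def curvature_with_error(ages, values):
    """Quadratic coefficient of a least-squares fit and its standard error."""
    coeffs, cov = np.polyfit(np.asarray(ages, dtype=float),
                             np.asarray(values, dtype=float), deg=2, cov=True)
    return float(coeffs[0]), float(np.sqrt(cov[0, 0]))


def curvature_is_clear(ages, values, n_sigma=2.0):
    """True when the curvature is more than n_sigma standard errors from zero."""
    a, se = curvature_with_error(ages, values)
    return abs(a) > n_sigma * se


# Synthetic data only: a deterministic +/- offset stands in for noise.
ages = np.arange(22, 45)
wiggle = np.where(np.arange(ages.size) % 2 == 0, 1.0, -1.0)
flat_noisy = 1.20 + 0.10 * wiggle                             # no real curvature
curved = 1.30 - 0.002 * (ages - 36.5) ** 2 + 0.02 * wiggle    # strong curvature

flat_result = curvature_is_clear(ages, flat_noisy)
curved_result = curvature_is_clear(ages, curved)
```

Seasons with different workloads could be given different weights through `polyfit`'s `w` argument. How to derive those weights from innings pitched is left open here.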


A previous commenter mentioned it - but why hasn't it been taken into consideration that Clemens moved to the NL? A pitcher faces considerably less talent in the NL overall - not to mention facing a pitcher (usually a very poor hitter) 1 out of every 9 at bats.

He also wasn't putting in a full work load. His partial seasons allowed him to stay stronger for longer.

I think something that would be more representative of whether he declines is the speed of his fastball. If he was truly using performance-enhancing drugs - wouldn't his fastball have picked up MPH? Especially when you consider how much he trains - performance-enhancing drugs would, I would opine, allow him to not only negate the effects of aging but overcome them, and his fastball should have picked up some mph that he would have lost otherwise.

I don't know the stats - but being a Yankee fan - Roger's fastball had definitely been waning even when he first came here. He just seemed to pitch smarter. But I'll still await the evidence.



Off the top of my head, I can see a bunch of things wrong here:

WHIP, like ERA, is subject to many outside factors: defense, park, league, etc. Using WHIP doesn't make them go away.

Those curves seem to have only a very loose correlation to the actual WHIP. What is the confidence level that there is an _actual_ inflection point?

You have data points at 45 and 46 for Clemens. By standard reckoning (i.e., age at the beginning of the season), Clemens did not have an age 45 season. Also, he has not yet turned 46.

I don't think the Hendricks brothers ever claimed that Clemens was 'typical' in any sense. They simply claimed that his longevity was not wholly unprecedented. Put another way, if you were to compare Ruth's home run hitting patterns to every player that played before him, you could conclude that 'something unusual [was] occurring'


"The problem with analyzing ERA is that it is affected by a lot of things beyond pitching quality. For instance, defense affects a player's ERA"

WHIP has the same problem as ERA; barring home runs, all hits involve defense.

Eric Rachlin

You say WHIP is a more reliable metric, but your data looks almost as scattered as ERA. I'm not a Clemens fan, but I'm surprised you felt confident enough in your conclusions to publish an article in the New York Times. Is there an actual paper in which you explain your methods in detail? How did you fit the data? What metrics did you look at?

I'm guessing you guys fit the data using some kind of least squares method. If this is the case, it appears that the shape of your curve would be completely different if one or two outliers were removed.

Your challenge to the Clemens Report's use of data is reasonable, but your conclusion seems tenuous as well. If we start applying your methods to many different pitchers, using a variety of metrics, how many will raise suspicion?

You essentially conclude that Clemens' career seems unusual, but of course it is, he's one of the greatest pitchers of all time! Ryan's stats may not exonerate him, but they do suggest that great pitchers do at times produce abnormal statistics.



#37 - No, you read that incorrectly. I was talking about his last years in Boston.


Your analysis would draw light to the fact that Nolan Ryan has a similar pitching curve late in his career, that is, he doesn't have the characteristic rise that almost all the pitchers have. If you are willing to condemn Clemens, it would seem that you also insinuate that Ryan was "doing something fishy". Or you have to abandon the argument altogether and accept that outliers exist, which is what I believe the Clemens Report is trying to address.


I'm curious about the choice of WHIP. Why not use something more akin to defense-independent pitching statistics, or DIPS? Focus on BB, K, and HR rates, things that don't really get affected much by the quality of the shortstop or center fielder. Hits do. And since the error bars on the curve for Clemens are LARGE, not small, I think we can safely say this data, as presented, is inconclusive. Ideas for a fix:
1- DIPS for Clemens
2- DIPS for peer group (power pitchers who went on into their 40s). The career arc of Greg Maddux has about as much to do with the expectations of a natural Clemens career arc as the price of copper has to do with the taste of cheese.



I am not a big Clemens fan, but I have to disagree with the use of an "Average Durable Pitcher". Ryan and Clemens are unique among old school pitchers in that they put an emphasis on strength training, particularly leg and core strength. This training appears to prolong the career of pitchers.


And look at the Ryan trend - the most durable real pitcher and he went down in a straight line.

As for another comment about pitching in a worse league: most of his starts toward the end were with the Blue Jays (2 Cy Youngs), NYY (1 or 2 Cy Youngs?), and Houston for only a few half seasons (some drop-off in hitting ability for the #9 hole, but other things to worry about, like more speed, runners on base, etc.).

I enjoyed watching him pitch, but with steroids in the game I would rather watch robots play. Yes, either humans or robots, nothing in between.

Mr Hughes

I agree that WHIP is a better indicator than ERA

However, Clemens pitched in the NL later in his career - more pitcher friendly. The other guys he was compared to did not pitch in the late 90s/2000s when training/legal supplements etc made huge advances.

Clemens and Ryan are not the norm - they are physical freaks who were able to do what no one else did.


Regardless of the granularity of the data, even a macro-look at the curve indicates that the Clemens-produced report cannot POSSIBLY exonerate the pitcher on the basis of observed performance alone. While the Wharton work certainly doesn't purport to prove guilt or innocence, it certainly does introduce the very relevant and objective counterpoint to the earlier statistical 'whitewash'.

Chris Farley

I don't like the WHIP graphs because they do not have 0 at the origin.


Just ran this through Excel (by guessing at the approximate coordinates for each point given in figure 2). The r^2 for a quadratic fit is 0.04, and for the linear fit (which is upward... did steroids slow Clemens' rate of deterioration relative to a steeper upward fit for other pitchers?) it is 0.02. I am not compelled.

FYI, the quadratic equation is -0.0004x^2+0.0322x+0.6101. The linear is 0.0026x+1.1003.
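As a quick check of what the reported quadratic implies, its vertex can be computed directly from the coefficients given in this comment. The sketch assumes x is age in years, which the comment does not state, and the coordinates were eyeballed from a chart, so the result is rough at best.

```python
# Coefficients exactly as reported in the comment above.
a, b, c = -0.0004, 0.0322, 0.6101
slope, intercept = 0.0026, 1.1003


def quadratic(x: float) -> float:
    """Reported quadratic fit."""
    return a * x * x + b * x + c


def linear(x: float) -> float:
    """Reported linear fit."""
    return slope * x + intercept


# Vertex of a parabola: x = -b / (2a). With a < 0 it is a maximum,
# i.e. the fitted WHIP rises up to this point and falls afterwards.
vertex_x = -b / (2 * a)
vertex_whip = quadratic(vertex_x)
print(round(vertex_x, 2), round(vertex_whip, 4))
```

With these coefficients the vertex works out to x = 40.25 with a fitted WHIP of about 1.258.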


To my eye (as a physicist in training), the fit used to the WHIP data is highly unreliable. (See post #53! The quadratic fit is barely better than a linear fit, for instance.)

I cannot take seriously any such analysis if the analyst does not at least discuss some measures of fit quality. The fact that this is not mentioned in the article erodes my confidence in the author. And then moving on to show graphs with huge numbers of these magic fits, attempting to draw conclusions from them? Are those other fits any better?

Without proper statistical analysis (which may have been done, but if so, mention it!) this is all numerology.
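One basic measure of fit quality is R^2, computed for both a linear and a quadratic fit on the same points. The sketch below uses synthetic, noise-free data. The actual data behind the post's charts are not reproduced here, and the function name is illustrative.

```python
import numpy as np


def r_squared(x, y, degree):
    """R^2 of a least-squares polynomial fit of the given degree."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.polyval(np.polyfit(x, y, degree), x)
    ss_res = np.sum((y - fitted) ** 2)    # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
    return float(1.0 - ss_res / ss_tot)


# Synthetic example: a symmetric parabola, so a straight line explains
# almost nothing while a quadratic explains everything.
x = np.arange(22, 45)
y = 1.30 - 0.002 * (x - 33) ** 2

r2_linear = r_squared(x, y, 1)
r2_quadratic = r_squared(x, y, 2)
```

A quadratic fitted by least squares can never have a lower R^2 than a line on the same points, so a small gap between the two says little by itself. An adjusted R^2 or an F-test on the extra term would be the usual next step.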


Sorry, the quadratic doesn't cut it. There are at least three phases over a long career, which shoots down the theory that statistics support a simple drug use pattern.