
The Debate over Teacher Merit Pay: A Freakonomics Quorum

The term “merit pay” has gained a prominent place in the debate over education reform. First it was D.C. schools chancellor Michelle Rhee trumpeting it as a key to fixing D.C.’s ailing public schools. Then a handful of other cities gave it a go, including Denver, New York City, and Nashville. Merit pay is a big plank of Education Secretary Arne Duncan’s reform platform. Chicago mayor Rahm Emanuel has just launched his own version of merit pay that focuses incentives on principals.
There’s just one problem: educators almost universally hate merit pay, and have been adamantly opposed to it from day one. Simply put, teachers say merit pay won’t work.


In the last year, there’s been some pretty damning evidence proving them right: research showing that merit pay, in a variety of shapes and sizes, fails to raise student performance. In the worst cases, such as the scandal in Atlanta, it’s contributed to flat-out cheating on the part of teachers and administrators. So, are we surprised that educators don’t respond to monetary incentives? Is that even the right conclusion to draw?
For answers to these and related questions, we decided to convene a Freakonomics Quorum. We reached out to a handful of education researchers and experts, and asked them the following:

Why don’t incentives appear to be working in cases of teacher merit pay?

The result is a thorough discussion that mimics the national debate. While some participants argue that incentive-based systems are fundamentally flawed, no matter what institution they’re applied to, others believe that merit pay does work, and simply needs to be tweaked in order to realize its full benefits. Either way, there is a lot of information presented, some of it contradictory, all of it interesting. Thanks to everyone in the Quorum for participating, and as always, let us know what you think in the comments section.
Julie Marsh is an adjunct researcher at the RAND Corporation, a non-profit research organization, and visiting associate professor at the Rossier School of Education at the University of Southern California.

First, let’s be clear that not all pay-for-performance (P4P) programs are the same. These programs differ greatly, from the choice of collective versus individual incentives, to the criteria by which incentives are awarded, to the inclusion of additional capacity-building elements, to the amount of the reward. Also, very few of these programs in the United States have been tested empirically. The research we’ve done at RAND and elsewhere in recent years has focused on programs incentivizing educator performance based primarily on the results from annual state tests of student performance. While limited, this research, along with theory, nevertheless suggests that several core factors may have contributed to the poor results found in recent P4P programs.

One factor is program design. Many of the programs studied, including New York City’s Schoolwide Performance Bonus Program, have expected financial incentives alone to inspire improvement and have not included additional supports and resources potentially needed to bring about improvement. As others have argued in the past, motivation alone does not improve schools. Even if incentives inspire staff to improve practices or work together (in the case of collective incentives), educators may not have the capacity or resources (e.g., school leadership, social capital, knowledge, instructional materials, time) to bring about improvement.
The decision to link incentives to student test results exclusively or almost exclusively may be another design element contributing to the lack of observed results. Research and theory suggest that to achieve desired results, individuals and groups targeted by incentives must buy in to the program and its criteria. If, as we found in NYC, participants do not support the performance criteria (e.g., more than three-fourths of teachers surveyed in our NYC study felt bonus criteria relied too heavily on student test scores), the motivational power of the incentive could be greatly compromised.

A second factor is program implementation. For example, according to research, individuals and groups targeted by incentives must have a high degree of understanding of the program. Yet there is evidence that often too few do. In NYC, more than one-third of teachers did not understand the targets their school needed to reach to be eligible for the bonus, the potential bonus amount, or how decisions would be made regarding distribution within the school. Poor communication can severely limit the motivational effects of incentives. If individuals don’t understand the criteria, how will they know where to direct their efforts? If they don’t know the amount at stake, how can they gauge whether the payoff is worth the effort?

A third significant variable is the context within which P4P programs operate. Under current policies, all schools and educators face significant pressure to perform well on the same measures that are often incentivized by P4P programs. While educators are making changes in response to these broader accountability pressures, how much additional change can we realistically expect from added financial incentives? In NYC, we found that teachers in schools not assigned to the bonus program (control) were just as likely as those from assigned schools (treatment) to report undertaking a host of efforts to help their school achieve a high Progress Report grade, including efforts to improve student attendance, seek professional development opportunities to improve their practice, and work with students to set and monitor goals. In fact, teachers often reported that accountability pressures—to achieve their school’s Adequate Yearly Progress target and to receive a high Progress Report grade—were more salient than financial bonuses.
Finally, individual perceptions may also affect the outcomes of P4P programs. Principals and teachers in NYC, for example, consistently reported viewing the bonus as recognition for work they were already doing (e.g., “a pat on the back”) rather than a goal for which to strive. Also, intrinsic motivators—such as seeing themselves improve and seeing their students learn new skills and knowledge—ranked much higher than financial bonuses on the list of potential motivators cited by teachers on surveys. In this context, how much added motivational value should we expect from financial bonuses?

Chicago is now putting in place its own merit pay program, and it will be fascinating to see the results. Media coverage suggests the program design may anticipate some of the concerns mentioned above, for example, by using multiple performance measures beyond just test results and including training for principals. Yet the details of both program elements are still unknown. How much are test scores weighted? How are “quality management” and “school climate” measured? What kind of training will be provided? Program implementation, of course, cannot be judged right now. Lessons from past research suggest that communication will be important. And finally, there’s context. How much added motivational value will be gained from the financial incentives compared to other accountability pressures and intrinsic motivators? That remains to be seen.

Richard Rothstein is a research associate of the Economic Policy Institute, and a senior fellow of the Warren Institute on Law and Social Policy at UC (Berkeley) Law School. From 1999 to 2002 he was the national education columnist of The New York Times. He is the author of Grading Education: Getting Accountability Right, and lectures widely about education policy issues.

The fatal flaw in education merit pay plans is failure to consider carefully what the plans attempt to accomplish.
Typically, as in Emanuel’s plan, the central goal is to get more students to achieve pre-defined proficiency cut-off scores on standardized tests of basic skills in math and reading. From what little we know of Emanuel’s plan, it seems that principals will also be evaluated by other “objective” measures but these, too, mostly rely on student math and reading scores – how students gain compared to students elsewhere, and how many teachers are “effective,” based also on math and reading scores. Principals will also be rated on student and teacher attendance, and on “school climate,” whose definition has, so far, been unstated.
But schools have many educational goals – not only easily tested basic skills in math and reading, but the sciences, history, good citizenship, appreciation of literature, the arts and music, physical fitness, good health habits, and character. With Emanuel’s accountability system, any rational principal will ensure that teachers devote excessive attention to drill and preparation for math and reading tests, while giving short shrift to other curricular elements they have been charged to deliver.
In any institution with complex or multiple goals, incentive systems that reward achieving only some of those goals (usually those most easily measured) will inevitably distort that system’s output. Rational agents, responding to incentives, will ensure that resources, time, and attention are redirected to goals being rewarded, and away from those (perhaps equally important but more difficult to measure) not being rewarded.
Thirty years ago, the methodologist Donald T. Campbell framed what he called a ‘law’ of performance measurement:
“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
Since then, social scientists have documented how simple accountability or incentive systems based on quantitative output indicators have actually harmed the institutions they were designed to improve – not only in education but in business, health care, welfare policy, human capital development, criminal justice, and public administration.
When health care systems (such as Medicare) attempted to reward cardiac surgeons, or their hospitals or practice groups, for survival rates of their patients, medical professionals responded by declining to operate on the sickest patients. When the Department of Labor attempted to reward local agencies for placing the unemployed in jobs, the agencies increased placement rates by getting more workers into more easily-found short-term poorly-paid jobs, and fewer into harder-to-find but more skilled long-term jobs. When prosecutors have been rewarded for the number of cases cleared, more plea bargains based on false confessions resulted. When U.S. News and World Report ranks colleges partly by the share of applicants for whom they have no space, colleges respond by soliciting unqualified high school students to apply.
So it is no surprise that K-12 educators respond similarly. The current federal education law, No Child Left Behind, sanctions schools whose math and reading scores fail to improve at “adequate” rates. The result: a narrowing of curriculum, with the greatest losses of science, social studies, the arts, and physical education instruction in schools with more low-income students, because these are schools under the greatest pressure to raise math and reading scores.
Across the nation, NCLB has created incentives for principals to order teachers to focus attention on students whose prior performance indicates a likelihood of falling just short of the passing point. These are students for whom slight improvement will have disproportionate impact on a school’s (and thus principal’s) performance rating. There is no incentive to focus instruction on high achievers, who will pass in any event, nor on the lowest achievers, who may make great gains but who won’t “count” unless the gains are so great as to pass.
Such gaming is legal. Barely distinguishable is illegal cheating, documented now in Washington, D.C., Atlanta, and elsewhere. This, too, is an almost inevitable consequence of accountability and incentives to raise test scores at all cost.
Even within math and reading instruction, incentives for test score improvement corrupt the curriculum. Standardized tests in Chicago and elsewhere cannot sample the full curriculum even in these core subjects. For example, the language arts curriculum calls for making oral presentations as well as decoding written text. But standardized tests include no oral presentations; these are then dropped from the curriculum.
There is now substantial evidence that pay for performance does not even work on its own terms – reading and math scores don’t increase when teachers or entire schools are offered bonuses for higher scores. But even if pay for performance did work on its own terms, it would harm public education.

David Figlio is a professor of education, social policy and economics at Northwestern University, a research associate at the National Bureau of Economic Research, an associate of the Institute for Research on Poverty at the University of Wisconsin-Madison, and a fellow at the Institute for Policy Research at Northwestern University.

One thing we’ve learned from our experience with No Child Left Behind (and the state accountability systems that preceded it) is that educators respond to incentives. Most people are familiar with stories of outright cheating by teachers in response to high-stakes tests. And there have been many other responses that are less blatant but every bit as manipulative. Educators have tried to gain an edge on the tests by focusing on so-called “bubble kids,” selectively disciplining slow-learning students so that they’d be absent on test day, and carbo-loading kids to give them a short-term brain boost.
School accountability systems have led to substantive changes as well.  There’s increasing evidence that at least some of the test score gains we’ve witnessed with school accountability have been genuine.  And my surveys of teachers and principals in Florida tell us that school accountability is leading to changes in practice, and not just changes in answers.
Given the mounting evidence that one form of educator incentives seems to improve student learning (at least along measurable dimensions), people should wonder why some prominent recent merit pay experiments haven’t led to improvements.  I can think of a few possible reasons.
One explanation is psychological.  There’s evidence that people who choose to become teachers tend to be more cooperative and tend to wish to avoid competition.  When policy-makers (or experimenters) impose merit pay systems on people who don’t like to compete with one another, they may find that teachers aren’t willing — or wired — to compete in meaningful ways.  School accountability systems might spur teachers on in part because they energize other stakeholders — parents and community members whose housing values are directly tied to school ratings and who put more constant pressure on teachers and principals.
Another possibility is that teachers may respond to policies they perceive as permanent, but not those they view as temporary or those where they believe the target is moving. Educators might believe that it is not worth the extra effort to change their behaviors in ways that might be rewarded one year but not the next. School accountability systems, though they change too, have more of the patina of permanence.
A third possibility is that the meaningful gains that resulted from school accountability have been due primarily to the actions of school leaders rather than individual teachers. School principals are largely responsible for the changes in instructional policies and practices associated with accountability policies, and school principals have control over how students are matched to teachers. If principal actions, and not teacher actions, are the cause of the improvements with accountability, then there is less of a disconnect between the school accountability research and the merit pay research. And it also suggests that the teacher merit pay research might be less helpful in predicting what might happen with Rahm Emanuel’s principal merit pay plan.
Suppose it’s really true that teacher merit pay does not spur teachers to meaningfully raise their game. This doesn’t necessarily mean that teacher merit pay is ineffective. Merit pay can affect student outcomes in two different ways: through changes in teacher effort and through changes in the set of people who decide to enter teaching. The recent merit pay experiments can speak only to the first of these; they can’t provide insight into the second. Instead, we’ll have to wait to see what happens to teacher recruitment in places like Florida, which earlier this year reformed its teacher tenure system such that student test scores have the highest weight in determining whether a teacher is renewed. When merit pay is codified into law in such a powerful way, maybe different types of people (specifically, those who think they’ll be good at raising test scores) will choose to become teachers. Whether this would be a good or a bad development is in the eye of the beholder.

James Guthrie is senior fellow and director of Education Policy Studies at the George W. Bush Institute in Dallas and a professor at the SMU Annette Simmons School of Education. In addition to his academic career, he has been a high school principal and elected school board member.

Public schools are currently trapped by dysfunctional and perverse incentive systems. Most debates regarding merit or performance pay for teachers and principals miss this essential point. Taking protective cover under the results of a single failed experiment or of ill-designed current performance pay programs only avoids the issue. The status quo is failing.
The challenge is overcoming existing incentive systems that promote and protect employees over students and their learning, existing incentives that act like magnets to suck effective teachers from their classrooms and place them in positions of greater privilege but lower priority.
Current systems, under which more than 90% of public school teachers and principals operate, reward conditions that were sensible a century ago, but today bear little or no relationship to effective classroom instruction or school leadership.
Yet the current system is more perverse and insidious than immediately meets the eye. It does waste public money by paying professional educators for years of service or college credits. However, there is a deeper deceit.
Today’s hidden incentive system offers teachers rewards for putting distance between themselves and students. Teaching is treated as the lowest rung on the totem pole of power. The path out of the classroom involves becoming a counselor, a reading specialist, a community liaison, an assistant principal, a principal, a central office administrator, a superintendent, and, perhaps, going on to become a state or federal government administrator or a professor.
The further one is removed from the daily rigor of actual teaching (escaping the classroom and becoming less concerned with the main business of the institution, facilitating students’ learning), the more one is rewarded with control over one’s time, more interaction with adults, higher pay, and greater prestige.
Replacement incentive systems must be designed to take into account the complexity of teaching and leading a school. That is, they cannot concentrate on one goal alone. However, they must include at their core measures of student academic achievement.
In addition, new incentive systems must create a hierarchy of rewards that enables an effective teacher to expand her influence and still remain a classroom teacher. As matters now stand, principals have far too many teachers to supervise. The number of their direct reports would seldom be matched in the private sector or in the military. Creating a professional educator hierarchy, as part of a teacher performance pay system, would also address this problem.