Does Early Education Reduce the Achievement Gap?

John List and Uri Gneezy have appeared on our blog many times. This guest post is part of a series adapted from their new book The Why Axis: Hidden Motives and the Undiscovered Economics of Everyday Life.

The past 60 years in the U.S. have seen dramatic policy changes to the public-education system. The ’50s, ’60s, and ’70s saw desegregation and affirmative action, and since the ’80s there have been efforts to increase school funding, the introduction of voucher systems, and the creation of countless charter schools. In between we’ve seen efforts to reduce class sizes, introduce technology into classrooms, and improve teacher credentialing, as well as a massive attempt to leave No Child Left Behind.

[Figure: achievement gap]

What do we have to show for all this? That’s hard to say. Even though many programs have a high price tag, they were never implemented with an eye towards assessment. The data we do have shows that not much has changed over the past 30 years. The figure attached shows how the racial achievement gap in test scores has persisted for white and black Americans since the late 1970s. 

If you don’t like the breakdown by race, then consider that the high school dropout rate among high-income families in 1972 was 2% and in 2008 it was still at 2%. For low-income families, though? In 1972 it was 14% and in 2008 it was still at 9%. This sort of trend (or lack thereof) is manifested in dozens of measures of academic achievement, all of which suggest that the past 60 years of educational reform has very little to show for itself.

One of the reasons for our initial interest in educational reform was that we’re really good at testing ideas, but when we first started working with poorer urban schools we found that change was hard to effect. Rewarding parents, teachers, and students for higher grades, better study habits, and improvement on test scores all led to improvements that varied between modest and promising, but none were a silver bullet.

Around the same time a consensus was building in policymaking and academic circles that early childhood interventions, like high-quality pre-schools, were the most promising prospect for the next round of educational reforms. But testing the impact of an educational reform like that would be unthinkable. It would require millions of dollars, a huge team of teachers, a school district with an eye towards assessment willing to work with academics, and an army of researchers.

So that’s exactly what we did.

Just thirty miles south of Chicago we started working with the Chicago Heights school district (which is demographically and economically identical to Chicago), and we found donors willing to invest in our idea: Ken and Anne Griffin. Then we went ahead and built two pre-schools and a parent academy and randomized families into a control group, a pre-school program, or a program that would work with parents to improve their efforts in raising their children.

How’s it going? Well, the experiment is still in the early stages, but the results so far have been very promising. Students in the two preschools are now doing better than the average child across the nation. These results are astonishing, given the fact that Chicago Heights preschoolers lagged severely behind the average before we started the program.

The children whose parents are in the Parent Academy haven’t seen such stark growth in their test scores, but perhaps most promising is that the growth that did occur persists. Children in the pre-school program seem to have a nasty habit of forgetting what they learn over breaks like summer vacation. 

But this test is still in its early stages. We have an eye towards the long game here. We’ll be following the families of participants for the rest of their lives, with the goal of informing a whole new generation of educational reform.

Duane Swacker

Any analysis that relies on standardized test scores as "the metric" is invalid, as all the errors involved in the making of educational "standards" and standardized testing render any results "vain and illusory," as Wilson states and has shown in “Educational Standards and the Problem of Error” found at:

Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)

1. A quality cannot be quantified. Quantity is a sub-category of quality. It is illogical to judge/assess a whole category by only a part (sub-category) of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as one dimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing we are lacking much information about said interactions.

2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).

3. Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think of a college professor who “knows” the students’ capabilities and grades them accordingly), the General Frame (think of standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective, like computer-based learning, getting a correct answer before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.

4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”

In other words, all the logical errors involved in the process render any conclusions invalid.

5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid, that is, errorless or supposedly at least with minimal error [they aren't]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. A basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.

6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or, to put it in more mundane terms, crap in, crap out.

7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”

In other words, it measures “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?

My answer is NO!!!!!

One final note with Wilson channeling Foucault and his concept of subjectivization:

“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”

In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.


Taylor S. Marks

I think we all know this system is broken. The question isn't whether it works or not, but what better system we could use, if one could even exist. The current one offers a means for colleges to accept/reject students and do so in a system that seems, at least for most schools, fair. (Some of the schools which simply receive too many applications get overwhelmed no matter what. Some are honest about it. Others are dicks, i.e., "We're not wasting paper to tell you this" Stanford.)


I don't even agree that the current system is "broken". It achieves certain goals quite effectively. Standardized tests are probably the best, most efficient way of figuring out whether someone can do simple math, for example, and they're not actually that bad these days at handling simpler writing exercises.

The complaints against testing usually have a lot more to do with wanting non-content skills to be favored, like "creativity" or completely non-academic skills, like good emotional skills.


A well-designed experiment certainly cannot hurt. However, there is already a mountain of evidence in favor of early childhood education and its long-term effects. The most famous that I can remember is the NC Abecedarian Project.

The problem in education is not the presence or absence of evidence but the lack of political will to follow the evidence. We could have invested a few billion dollars (a minute fraction of our current debt) in universal preschool 20 years ago and been much better off than we are today.


I would have to respectfully disagree. If there really were a "mountain of evidence" then it would be a done deal. Statements like yours oversimplify the problem and suggest more government funding as the only solution without looking at the big picture to explore other alternatives. That's the whole point of looking at the hidden side of everything. I think the point of the blog post is to better understand what works and how long-lasting the improvements are.

Consider, for example, how research has shown that Head Start improves outcomes initially but the benefits are lost within the first two years.

Phil Persinger


A "mountain of evidence" exists for climate change and evolution, but that hasn't resulted in a "done deal" for either when it comes to public policy.

You're correct that some research indicates students' back-sliding subsequent to Head Start. Jason clearly agrees with you about this; his complaint concerns the lack of political will to institute comprehensive day-care and pre-K programs (as described in the Freakonomics post and in his comment) and to support the gains made there in an evidence-based manner all the way through high school graduation. This is no more an oversimplification of the situation than your own comment.

An enterprise of this scale and persistence (interstate highway system, Apollo program, armed forces) seems achievable only through government funding or subsidy-- unless you can describe a purely private-sector alternative.


How is a 30% drop in dropout rates considered not a sufficient result? That seems pretty significant from here.


Great point, Thalia. I made the same comment before seeing that you had made this point.

steve cebalt

Hi Thalia: I agree -- a drop from 14% to 9% in dropouts strikes me as very significant, but how do I know? Likewise the chart above also shows what looks like a very significant narrowing of the gap in scores for blacks vs. whites, contrary to the authors' statement that not much has changed. I've noted similar inconsistencies with these authors in other blog articles. I might be interpreting this wrongly, but the authors fail to explain these basic and important things, so I am left to my own feeble efforts.

Phil Persinger


Thanks for your Mother's Day essay, by the way....

The problem with the narrowing of math scores and drop-out rates is perhaps the 30- and 40-year interval involved in the respective instances: that these gaps, however diminished, remain after all the time and treasure invested. It's a big public policy question which can easily be expanded from education into areas like income inequality and racial discrimination. Many (like Taylor Marks above) seem to have abandoned support of public education out of frustration-- and who can blame them?

steve cebalt

Hi Phil, and thank you! Good point about the long time interval. I live in Indiana, where vouchers are the current experiment in public/private schooling, using tax dollars to allow people earning $128,000 or less to place their kids in private schools. It's already shaking up the system in a dramatic way. One problem: No one established goals or measurements. So 30-40 years hence, people will be struggling to say whether it worked -- only after several generations of kids have been through the experimental new system.


Great project. One quibble - you write that a drop from 14% dropout rate to 9% is "little to show." But isn't that hundreds of thousands, if not millions, of children every year? I'm no math wizard, but that sounds like a third fewer dropouts. Are there that many initiatives that can claim such a success rate?
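
As a quick check of the arithmetic debated in this thread, here is a minimal Python sketch that uses only the dropout figures quoted in the post (14% and 9% for low-income families, 2% and 2% for high-income families). The function name `relative_reduction` and the variable names are illustrative choices, not anything from the post or the book.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after` (0.25 means 25% fewer)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Dropout rates in percent, as quoted in the post (1972 vs. 2008).
low_income = relative_reduction(14.0, 9.0)
high_income = relative_reduction(2.0, 2.0)

# Gap between low- and high-income rates, in percentage points.
gap_before = 14.0 - 2.0
gap_after = 9.0 - 2.0

print(f"Low-income relative reduction: {low_income:.1%}")    # 35.7%
print(f"High-income relative reduction: {high_income:.1%}")  # 0.0%
print(f"Gap: {gap_before:.0f} points -> {gap_after:.0f} points")
```

On these numbers the low-income rate fell by 5 percentage points, which is roughly a third in relative terms, while the gap between the two groups narrowed from 12 points to 7 but did not close. Both readings in the thread follow from the same figures; they differ in whether the emphasis is on the relative drop or on the remaining gap.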


Why aren't Asians represented in racial data? In my experience, when Asians are excluded from a discussion of racial disparity, it's typically propaganda and not factual.

High schools can (and did) effect changes even with poor (or non-existent) preschool. Garfield High School was a good example. Read "Standing and Delivering" by Henry Gradillas on how this was done.


How interesting that cultural attitudes of ethnic groups towards education, rural vs. urban locations, and one-parent households vs. two-parent households with stay-at-home moms have no representation in the article. You know, just because some things are difficult to quantify, or deemed culturally inappropriate by the liberal tenured elites, does not mean they have no impact. You guys seem to be fishing hard for answers, but maybe you are just asking the wrong questions. Are you really trying to say that the elements in your equation for academic success ($/student, Head Start, etc.) bear more importance than a father in the household, and a mother who ensures her kids go to school and prepare for their future? You are missing the forest for the trees.