Does This Analysis of Test Scores Make Any Sense? A Guest Post

Here’s the latest guest post from Yale economist and law professor Ian Ayres. Here are Ayres’s past posts and here is a recent discussion of standardized tests.

A recent article in the Times trumpeted the results of a report that had just been released by the Educational Testing Service (E.T.S.).

The E.T.S. researchers used four variables that are beyond the control of schools: the percentage of children living with one parent; the percentage of eighth graders absent from school at least three times a month; the percentage of children age 5 or younger whose parents read to them daily; and the percentage of eighth graders who watch five or more hours of TV a day. Using just those four variables, the researchers were able to predict each state’s results on the federal eighth-grade reading test with impressive accuracy.

“Together, these four factors account for about two-thirds of the large differences among states,” the report said. In other words, the states that had the lowest test scores tended to be those that had the highest percentages of children from single-parent families, eighth graders watching lots of TV and eighth graders absent a lot, and the lowest percentages of young children being read to regularly, regardless of what was going on in their schools.

The article fairly portrays the text of the study, which concludes:

In statistical terms, these four factors account for two-thirds of the differences in the actual scores (r squared = .66). That is a very strong association. (emphasis added).

The last sentence is odd. Normally, I’d look at the statistical significance of the individual factors if I were going to judge the strength of the association. The report’s phrasing suggests a strong association between the reading score outcome and all four of the underlying factors. But what you would not learn unless you dug into the appendix is that only 3 of the 4 factors were statistically significant.

It turns out that the impact of the “percentage of children under age 18 in a state who live with one parent” (labeled in the table as “onepar”) is neither large nor statistically different from zero. A one-standard-deviation increase in the percentage of single-parent kids reduces the predicted reading score by only about half a point (while a one-standard-deviation increase in heavy TV watchers reduces the predicted reading score by 3.3 points).

Moreover, by traditional standards this marginal effect is not statistically significant. The estimated negative impact of single-parent families may simply be a byproduct of chance (the t-value indicates that the estimated negative coefficient of -0.0656 is only about four-tenths of a standard error away from zero, so we can’t reject in this data the possibility that the true impact of one-parent families on reading test scores is positive).
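To make that arithmetic concrete, here is a minimal Python sketch of the significance check. Only the coefficient (-0.0656) comes from the report's appendix as quoted above. The standard error of 0.164 is an assumption, back-solved from the statement that the estimate sits about four-tenths of a standard error from zero, and the critical value of about 2.014 assumes a two-sided 5 percent test with 45 degrees of freedom (50 states minus four slopes and an intercept).

```python
# Coefficient on "onepar" as quoted in the post.
coef = -0.0656
# ASSUMPTION: standard error back-solved from "about four-tenths";
# it is not a figure taken from the report itself.
std_err = 0.164
# ASSUMPTION: two-sided 5% critical value for a t distribution with
# 45 degrees of freedom (50 states - 4 slopes - 1 intercept).
T_CRIT = 2.014


def t_statistic(coefficient, standard_error):
    """Number of standard errors separating the estimate from zero."""
    return coefficient / standard_error


def confidence_interval(coefficient, standard_error, t_crit):
    """Two-sided confidence interval around the estimated coefficient."""
    half_width = t_crit * standard_error
    return coefficient - half_width, coefficient + half_width


t_value = t_statistic(coef, std_err)
low, high = confidence_interval(coef, std_err, T_CRIT)
significant = abs(t_value) > T_CRIT

print(f"t = {t_value:.2f}, 95% CI = ({low:.3f}, {high:.3f})")
print("significant at 5%:", significant)
```

Under these assumed inputs the interval straddles zero, which is another way of saying that a positive true coefficient cannot be ruled out.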

When I reran the same regression but dropped the “onepar” variable, the adjusted r-squared increased slightly. (You can download an Excel file with the full results and data here). That’s right: a three-factor regression does an even better job at explaining the reading score data.
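Why dropping a variable can raise the adjusted r-squared is easy to sketch. Adjusted r-squared penalizes each extra regressor, and for ordinary least squares it rises when a dropped variable has a t-statistic below 1 in absolute value. The Python sketch below uses the reported r-squared of .66 and 50 states. The t-statistic of 0.4 is an assumption taken from the "about four-tenths" figure above, so the printed numbers are illustrative and are not the values in the downloadable Excel file.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k slope coefficients."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def r2_after_dropping(r2_full, t_value, df_resid):
    """R-squared after dropping one regressor with the given t-statistic.

    Uses the OLS identity that the F-statistic for a single exclusion
    restriction equals t squared, where df_resid is the residual degrees
    of freedom of the full model.
    """
    return r2_full - (1.0 - r2_full) * t_value ** 2 / df_resid


N_STATES = 50
R2_FULL = 0.66   # reported in the E.T.S. study
T_ONEPAR = 0.4   # ASSUMPTION: "about four-tenths", per the post

r2_three = r2_after_dropping(R2_FULL, T_ONEPAR, N_STATES - 4 - 1)
adj_four = adjusted_r2(R2_FULL, N_STATES, 4)
adj_three = adjusted_r2(r2_three, N_STATES, 3)

print(f"four factors:  R2 = {R2_FULL:.4f}, adjusted = {adj_four:.4f}")
print(f"three factors: R2 = {r2_three:.4f}, adjusted = {adj_three:.4f}")
```

The plain r-squared falls slightly when the variable is removed, as it always must, but the adjusted figure rises because the lost fit is smaller than the penalty saved.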

We shouldn’t put very much weight on this regression. Instead of analyzing data on individual students, the report focused on aggregate state data, and averaging suppresses a great deal of the real variation of interest. The 4-factor regression rests on only 50 state data points. There may be other evidence in other studies that children of one-parent families have poorer educational outcomes, but there is not a strong association between the two variables in this particular regression data.
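The effect of averaging can be seen in a small simulation. The Python sketch below is entirely synthetic: the 50 groups, the 100 individuals per group, and every variance are assumptions chosen for illustration, not estimates from the E.T.S. data. It generates a weak individual-level relationship and then compares the r-squared computed on individuals with the r-squared computed on group averages.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def simulate(n_states=50, per_state=100, seed=0):
    """Return (individual-level r2, state-average r2) for synthetic data."""
    rng = random.Random(seed)
    all_x, all_y, state_x, state_y = [], [], [], []
    for _ in range(n_states):
        state_level = rng.gauss(0, 1)  # ASSUMPTION: modest between-state spread
        xs = [state_level + rng.gauss(0, 3) for _ in range(per_state)]
        ys = [x + rng.gauss(0, 5) for x in xs]  # noisy individual outcome
        all_x.extend(xs)
        all_y.extend(ys)
        state_x.append(sum(xs) / per_state)  # averaging removes most noise
        state_y.append(sum(ys) / per_state)
    return r_squared(all_x, all_y), r_squared(state_x, state_y)


individual_r2, state_r2 = simulate()
print(f"individual-level r2: {individual_r2:.2f}")
print(f"state-average r2:    {state_r2:.2f}")
```

Because individual noise largely cancels within each average, the state-level figure comes out far higher than the individual-level one even though the underlying relationship is identical.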

mhw

of course several of these values are self-reported, e.g., whether parents read to their preschool kid

in fact, the use of this metric is probably a proxy for "white" since white parents may feel they are "supposed" to read to preschool kids but of course if you did a study where "white" was an independent variable you'd be tarred as racist

Cael

I'm not sure it's entirely reasonable to assert that those independent variables are exogenous (i.e., out of the control of the school system). If a school systematically assigns more homework, students will, on the margin, watch less TV. If a school's environment is sufficiently torturous, it will raise absenteeism.

Prof. Ayres's point about the ecological fallacy is very important, and it should be noted that the principal consequence of the ecological fallacy is to produce misleadingly high r-squared values.

We shouldn't necessarily conclude from this regression that the single-parent environment has no effect on outcomes. Rather, we could conclude that the effect of the single-parent household is captured by the tendency of children in single-parent homes to watch more TV, attend school more sporadically, and be read to less often. Regression only captures _ceteris_paribus_ effects; the lack of a statistically significant effect for single-parent households only means that for a given level of absenteeism, TV-watching, and being-read-to, there is no (statistically significant) difference between one-parent and two-parent homes.
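A minimal simulated sketch of this point, in Python. Everything here is synthetic and assumed for illustration: the variable names echo the post, but the effect sizes, the sample size, and the causal chain (one-parent share raises TV watching, and only TV watching lowers scores) are invented. Regressing the score on the one-parent variable alone gives a clearly negative slope, yet its coefficient shrinks toward zero once TV watching is held fixed.

```python
import random


def _centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def simple_slope(x, y):
    """OLS slope of y on a single predictor x (with intercept)."""
    xc, yc = _centered(x), _centered(y)
    return _dot(xc, yc) / _dot(xc, xc)


def two_predictor_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 (with intercept), in closed form."""
    a, b, c = _centered(x1), _centered(x2), _centered(y)
    s11, s22, s12 = _dot(a, a), _dot(b, b), _dot(a, b)
    s1y, s2y = _dot(a, c), _dot(b, c)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(1)
n = 2000  # ASSUMPTION: synthetic sample size, not the 50 states
onepar = [rng.gauss(0, 1) for _ in range(n)]
# ASSUMPTION: one-parent share affects scores only through TV watching.
tv = [0.8 * p + rng.gauss(0, 0.6) for p in onepar]
score = [-3.0 * t + rng.gauss(0, 2) for t in tv]

slope_alone = simple_slope(onepar, score)
slope_onepar, slope_tv = two_predictor_slopes(onepar, tv, score)

print(f"onepar alone:         {slope_alone:.2f}")
print(f"onepar, TV held fixed: {slope_onepar:.2f}")
print(f"TV, onepar held fixed: {slope_tv:.2f}")
```

The same pattern would explain how a variable can correlate strongly with scores on its own and still add nothing to a regression that already includes the channels through which it operates.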

Punditus Maximus

Okay, I'm beyond amused that an insignificant but politically convenient variable was shoehorned in.

LL

So if I came from a one-parent family, I would understand this article?

Pat

Can someone please explain THAT to our Board of Education, as when the test scores are published in the newspaper after testing, we as teaching professionals get blamed for the low numbers in the Proficient or Advanced Proficient Range?? Then again, we are doing more of the "raising" of these children, at times, than their own parents.

Patrick

Pablito, I am interested in your train related health plan and wish to subscribe to your newsletter.

Pablito

I could also find an r square that is very high if I wanted to see the effect of ice cream sales on crime rates. I am amazed that such a report came out. I am guessing that if people don't really know anything about statistics you could fool them into thinking that jumping in front of moving trains can make them healthy.

Sloan

#2, I think it's equally unfair to assume that they are definitely not causal. Of course, correlation doesn't imply causation, but I still think the original report (assuming you take out the 4th variable which doesn't appear to hold water given this post) is worthwhile and does tell you something. That "something" is that those three variables should be tested to see if they actually do CAUSE the outcome. Thus, the benefit of the report is we may be moving in the right direction.

Clive Warner

The problem with this is that people, including Mr Ayres, seem to assume the test (whatever test it is - that wasn't specified in the article) is valid.
I've written passages and test questions for the TOEFL and the SAT and I teach students how to pass some of these tests, such as the SAT, the GRE, and the GMAT.
And one thing is *very* obvious to me as the teacher: students arrive with a pathetic score, and after the end of a 16-hour course (plus their homework and so forth), are able to achieve a massively higher test score.
When I qualified to teach these tests, I commented in surprise to my own teacher, "This ISN'T MATH! This is just tricks!" At which point my teacher laughed and said, "Now, finally, you're getting it!"
These tests in no way test the ability of students with either English or math. They merely test the test-taker's ability to learn how to jump through a series of hoops that have ETS stencilled on them. As such, relying on tests like these is simply ludicrous. A math savant might pass most of the math with ease, but would not have the slightest idea of original math, e.g. how to relate a chemical or statistical process to an equation.

If I understand the article correctly, the study focused on aggregated sets of 50 state data points. This means that the opportunity to examine in depth the variability of performance with respect to parameters possibly influencing performance was not taken, and any other factors that correlate well with these parameters might in fact be responsible for any of the observed results, such as they are. With the vast amount of available data, reducing the data to a mere 50 data points reduced the analysis to a trivial numerical exercise.
Science Editor
www.polijam.com
Your Guide to News Around the Web

I have a question. Not being an economist or statistician, I'm assuming the question is stupid.

Let's say you have 5 students in your class. Each takes a nationwide test and scores well, about one standard deviation above the mean. Not statistically significant. But, as a group of five students, probably they are three or more standard deviations from the mean for groups of five, because of reversion to the mean.

How is this any different?
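One way to check the arithmetic in this question is the short Python sketch below. It assumes the five scores are independent draws from the national distribution, which is an assumption (classmates' scores usually are not independent). Under it, the mean of n scores has a standard deviation of the individual SD divided by the square root of n, so a class average one student-level SD above the national mean sits about 2.24 group-level SDs above it.

```python
import math


def group_z(individual_z, n):
    """z-score of a group mean relative to means of n independent scores.

    ASSUMPTION: the n scores are independent with a common standard
    deviation, so the SD of their mean is sigma / sqrt(n).
    """
    return individual_z * math.sqrt(n)


print(round(group_z(1.0, 5), 2))  # 2.24
```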

Pablito

#3 I completely understand what you are saying. My outrage comes from using only the r squared to make such a bold statement. Tests for statistical significance and tests for heteroskedasticity and linear dependence should be run before making a statement as bold as the one those people made.

Mo

This report - along with many others - is an example of bad economics/statistics getting attention by the media because journalists don't have a clue. They just think that because someone said they used a regression it must be "scientific" and therefore correct.

The reality is that academic economists usually don't even bother to read these reports (except to see how bunky they are!). I read one recently (the topic doesn't matter) which got a whole news article on the website of a large cable channel based in Hotlanta which made all these amazing claims. BUT, the standard errors were not even reported! She didn't even give me the opportunity to check significance, which makes me very suspicious. Plus, I suspect the standard errors were not correctly adjusted for clustering.

Unfortunately, the public hears only the news article that "economists" have found...

It is so frustrating that decent, peer reviewed empirical work is so underappreciated and organizations with agendas produce this junk research by folks that are either 1) unethical or 2) grad school dropouts.

Susan

In epidemiology (and perhaps other fields) this is known as an ecologic correlation, meaning that it is not known whether the individuals with the putative risk factors are the same individuals who have the outcome--only the marginal (aggregate) data are known. While such studies may be suggestive, encouraging further (individual level) analyses, no causal conclusions can be drawn.

Outlying Data Point

mhw 12:53 -- many years ago one early Sunday evening on the Hong Kong MTR subway, we saw a Chinese woman who had cleaning supplies at her side. She was sitting with a young Chinese child, maybe five or six. The two were sharing an English language book, and she was guiding the child through it. We'd love to see this kid's test scores.

Ryan

Let's look at this from another perspective.

Let's assume that all races & religions are equally 'smart' genetically (because god forbid should anyone claim otherwise). Let's also assume that the relative differences (teaching/education wise) between a 25-30 year old recent college graduate with a teaching certificate from one state college versus another are minor.

That pretty much puts kids 'smarts' in terms of schools into a few categories based on cause:

a) some districts/states have pathetically poor administrations, policies, teaching materials & methods that somehow overwhelm the teacher's fundamental ability to teach the 3 R's, which haven't radically changed in say, 100 years

b) some teachers & schools 'teach' with the intent to pass the tests with high scores, while others teach what they believe is important or must teach, even if this doesn't help on standardized tests

c) parents are doing a horrible job parenting. their kids don't pay attention in class, don't study at home and have poor atmospheres in which to grow up

From what I've read and seen, 'C' seems to be a big factor. A few school districts in my area consistently rank in the top 2-10% nationwide with 90%+ graduation rates and 80%+ enrollment in secondary education. Other districts nearby lose their state accreditation and have

Ryan

---continued from post #16----

Other districts nearby have lost their state accreditation and have

Nick

I can see why the variable "Percent 1 Parent" doesn't add to a multiple regression model. Professor Ayres also says "there is not a strong association between the two variables in this particular regression data".

If however we look at the "Percent 1 Parent" in isolation against Actual Score, surely there is a strong negative correlation (r squared = 46%), giving us an indication that these two variables are to some extent related.

Sorry, it's been a long time since I've done stats, so I would just be interested to hear any comments about the data and its findings

Thanks
Nick

miriam

This was said before... the oneparent household seems like it would be overwhelmingly associated with the other three variables...
I was surprised that the only variable that did not reach statistical significance was the "oneparent household". The whole read to your child garbage seems to be more of a marker of an involved parent than anything else. Not that there is anything wrong with a marker, I just didn't expect a correlation of a correlation to be robust.