Guest Blog: Vanishing Mailboxes, Underperforming Schools, Global Warming
We are very fortunate to get some incredibly interesting and perceptive mail from readers. Occasionally, we share these queries (like here and here). We also get some hardcore snark, and we sometimes share those too (like this recent one).
An e-mail that showed up the other day was so interesting that I wrote back to ask if I could simply post it as a guest blog. The author approved. His name is Paul Kimmelman, and he describes himself thusly: “I am a technical architect (I design hardware and software); but, my background is actually Science (Psycho-Biology as well as CIS). I am 47. I live in the San Francisco bay area (Alamo, CA).”
Printed below is Paul’s e-mail in full. It would have been nice, I guess, if we could have proffered some answers. But, as Andrew Marvell once lamented: “Had we but world enough, and time …” So here are Paul’s questions and musings, important and engaging at once; I am eager to see what you all make of them.
Hi guys. I know you have railed against confusing correlation with causality, and also discussed various statistical problems, but you do not seem to have addressed one that affects many real-world decisions. I have noticed what appears to be an increase in decisions based on single-variable analysis and averaging, often ignoring the consequences of those decisions and/or the context of the analysis. It seems you ought to address these, as they are right up your alley. Some simple, common examples:
1. The Post Office decided that the labor cost of emptying post-boxes was too high. So, they decided to get rid of a bunch of them. They did a study of average fullness per box in a region and got rid of the X% least filled ones. The problem is that this kind of analysis is inherently flawed. For one, in areas with many boxes nearby, each box will be less full than a box in an ill-served area. So, they will get rid of *all* of them in one area. For example, you may have 5 boxes near each other that are each 20% full and one in a very separated area that is 100% full. By their statistical analysis, each of the 5 is poorly used and so all will be removed! (This is exactly what has happened, by the way.) Of course they should have included at least two more variables (geography and daily population density). But the problem was also the poor use of statistics in making a decision.
2. Another example happening in the SF Bay Area involves schools. The District looks at the average standardized test scores of each school and closes the ones with the lowest averages. They do not consider whether the lower-scoring schools have the poorest kids, with the least parent involvement, the most transience, and the like. But when the school is closed and its students are moved to a much larger school, these low-performing kids continue to be low performers; they just do not pull down the average as much. Worse, these low-performing kids often do worse (at least according to a recent study at Stanford) and/or drop out. Dropping out helps the averages, so you could argue it is a good policy (Texas allegedly uses that to improve their scores), but the consequences are likely dire for society (more crime or welfare or both). The problem is that the poorly performing school may in fact have those particular kids doing much better than they would in another school (due to extra attention, etc.), but decision makers are taught that “statistics do not lie.” This is the opposite of some of your situations (where statistics show “common sense” to be wrong).
3. A 3rd and final one, which covers many cases (including, I would argue, some of the conclusions in your book), involves averaging and the skewed conclusions it can lead to. Global warming is an interesting case. A lot of the early scoffing by politicians was due to the idea of worrying about a 1-degree rise in *average* temperatures. What was not obvious to many non-climatologists was that a 1-degree average rise may mean 10 or more degrees lower in some areas and 10 or more higher in others (and that can lead to destruction of agriculture, flooding, etc.). The statistical number does not tell you very much. This is why economists rightly break up cost-of-living and inflation numbers into components that can cause swings (housing, oil, durable goods vs. consumables, etc.). Even then, many do not consider that some of those numbers impact some sectors of the economy more heavily than others (poor, working class, middle, upper, etc.), as well as some areas of the country more than others. This is why, for example, you got such outrage from the “people” when the Federal government was pointing out that the inflation rate was still reasonable in the summer of 2006, when gasoline/oil prices jumped up (but housing was coming down). Obviously a lot more people felt impacted by a big jump in gasoline prices and did not feel that the drop in housing prices made up for it.
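To make the mailbox arithmetic in item 1 concrete, here is a minimal Python sketch. The five boxes at 20% and the one box at 100% come from Paul's example; the area labels, the box names, and the rule of removing the five least-full boxes are assumptions made for illustration, not the Post Office's actual method.

```python
# Hypothetical data: fullness figures follow the example in item 1;
# box ids and area labels are invented for illustration.
boxes = [
    {"id": "A1", "area": "dense", "fullness_pct": 20},
    {"id": "A2", "area": "dense", "fullness_pct": 20},
    {"id": "A3", "area": "dense", "fullness_pct": 20},
    {"id": "A4", "area": "dense", "fullness_pct": 20},
    {"id": "A5", "area": "dense", "fullness_pct": 20},
    {"id": "B1", "area": "remote", "fullness_pct": 100},
]


def remove_least_full(all_boxes, n):
    """Single-variable rule (assumed): drop the n least-full boxes,
    ignoring where each box is located."""
    ranked = sorted(all_boxes, key=lambda b: b["fullness_pct"])
    return ranked[n:]


def mail_per_area(all_boxes):
    """Second variable: total mail collected in each area,
    measured in 'percent of one box'."""
    totals = {}
    for b in all_boxes:
        totals[b["area"]] = totals.get(b["area"], 0) + b["fullness_pct"]
    return totals


def areas_served(all_boxes):
    """Set of areas that still have at least one box."""
    return {b["area"] for b in all_boxes}


volume_before = mail_per_area(boxes)
kept = remove_least_full(boxes, 5)

print("Mail volume per area before removal:", volume_before)
print("Areas still served after removal:", areas_served(kept))
```

In this sketch both areas produce the same total volume of mail, yet the per-box rule leaves the dense area with no box at all. Grouping by area before ranking is one simple way to bring geography into the decision.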