Guest Blog: Vanishing Mailboxes, Underperforming Schools, Global Warming

We are very fortunate to get some incredibly interesting and perceptive mail from readers. Occasionally, we share these queries (like here and here). We also get some hardcore snark, and we sometimes share those too (like this recent one).

An e-mail that showed up the other day was so interesting that I wrote back to ask if I could simply post it as a guest blog. The author approved. His name is Paul Kimmelman, and he describes himself thusly: “I am a technical architect (I design hardware and software); but, my background is actually Science (Psycho-Biology as well as CIS). I am 47. I live in the San Francisco bay area (Alamo, CA).”

Printed below is Paul’s e-mail in full. It would have been nice, I guess, if we could have proffered some answers. But, as Andrew Marvell once lamented: “Had we but world enough, and time …” So here are Paul’s questions and musings, important and engaging at once; I am eager to see what you all make of them.

Hi guys. I know you guys have railed against confusing correlation with causality, and also discussed various statistical problems, but you do not seem to have addressed one that affects many real-world decisions. I have noticed what appears to be an increase in decisions based on single-variable analysis and averaging, often ignoring the consequences of those decisions and/or the context of the analysis. It seems you ought to address these, as they are right up your alley. Some simple, common examples:

1. The Post Office decided that the labor costs of emptying post-boxes were too high. So, they decided to get rid of a bunch of them. They did a study of average fullness per box in a region and got rid of the X% least-filled ones. The problem is that this kind of analysis is inherently flawed. For one, in areas with many boxes nearby, each box will be less full than one in an ill-served area, so they will get rid of *all* of them in one area. For example, you may have 5 boxes near each other that are each 20% full and one in a very separated area that is 100% full. By their statistical analysis, each of the 5 is poorly used, and so all will be removed! (This is exactly what has happened, by the way.) Of course they should have included at least two more variables (geography and daily population density). But the problem was also the poor use of statistics in making a decision.

2. Another example happening in the SF Bay Area involves schools. The District looks at the average standardized test scores of each school and closes the ones with the lowest average. It does not consider whether the lower-scoring schools have the poorest kids, the least parent involvement, the most transience, and the like. But by closing the school and moving the kids to a much larger school, these low-performing kids continue to be low performers; they just do not pull down the average as much. Worse, these low-performing kids often do worse (at least according to a recent study at Stanford) and/or drop out. Dropping out helps the averages, so you could argue it is a good policy (Texas allegedly uses that to improve its scores), but the consequences are likely dire for society (more crime or welfare or both). The problem is that the poor-performing school may in fact have those particular kids doing much better than they would in another school (due to extra attention, etc.), but decision makers are taught that “statistics do not lie”. This is the opposite of some of your situations (where statistics show “common sense” to be wrong).

3. A third and final one, which covers many cases (including, I would argue, some of the conclusions in your book), involves averaging and the skewed conclusions it can lead to. Global warming is an interesting case. A lot of the early scoffing by politicians was about worrying over a 1-degree rise in *average* temperatures. What was not obvious to many non-climatologists was that a 1-degree average rise may mean 10 or more degrees lower in some areas and 10 or more higher in others (and that can lead to destruction of agriculture, flooding, etc.). The statistical number does not tell you very much. This is why economists rightly break up cost-of-living and inflation numbers into components that can cause swings (housing, oil, durable goods vs. consumables, etc.). Even then, many do not consider that some of those numbers impact some sectors of the economy more heavily than others (poor, working class, middle, upper, etc.), as well as some areas of the country more than others. This is why, for example, you got such outrage from the “people” when the Federal government was pointing out that the inflation rate was still reasonable in the summer of 2006, when gasoline/oil prices jumped up (but housing was coming down). Obviously a lot more people felt impacted by a big jump in gasoline prices and did not feel ameliorated by the drop in housing prices.
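
A minimal Python sketch of the culling rule in point 1, next to one hypothetical way of adding geography to it. The box coordinates, the 0.5 km walking radius, and the coverage rule are illustrative assumptions, not a description of the Post Office's actual method.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    name: str
    x_km: float      # hypothetical map coordinates
    y_km: float
    fullness: float  # average fraction of capacity used at pickup


def cull_by_fullness(boxes, n_remove):
    """Single-variable rule: drop the n_remove least-filled boxes."""
    doomed = {b.name for b in sorted(boxes, key=lambda b: b.fullness)[:n_remove]}
    return [b for b in boxes if b.name not in doomed]


def cull_with_coverage(boxes, n_remove, max_walk_km):
    """Drop least-filled boxes first, but only while another kept box
    remains within max_walk_km of the one being removed."""
    kept = list(boxes)
    removed = 0
    for box in sorted(boxes, key=lambda b: b.fullness):
        if removed == n_remove:
            break
        others = [b for b in kept if b.name != box.name]
        if any(math.dist((box.x_km, box.y_km), (o.x_km, o.y_km)) <= max_walk_km
               for o in others):
            kept = others
            removed += 1
    return kept


# Paul's example: five nearby boxes at 20% and one isolated box at 100%.
cluster = [Box(n, 0.1 * i, 0.0, 0.2) for i, n in enumerate("ABCDE")]
boxes = cluster + [Box("F", 10.0, 0.0, 1.0)]
print([b.name for b in cull_by_fullness(boxes, 5)])         # ['F']
print([b.name for b in cull_with_coverage(boxes, 5, 0.5)])  # ['E', 'F']
```

With the single-variable rule all five clustered boxes go; with the coverage check four go, and the area keeps box E because its only remaining neighbor is the isolated box F.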

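The arithmetic behind point 2 can be checked with a toy district. This sketch, with made-up school names and scores, shows that closing the lowest-average school raises the worst school average without changing a single student's score, and that losing the lowest scorers raises an average further.

```python
from statistics import mean

# Hypothetical test scores; names and numbers are made up for illustration.
schools = {
    "Small": [50, 55, 60],
    "Big": [70, 75, 80, 85, 90],
}


def school_means(schools):
    return {name: mean(scores) for name, scores in schools.items()}


def close_and_merge(schools, closed, into):
    """Move every student from `closed` into `into`; no score changes."""
    merged = {k: list(v) for k, v in schools.items() if k != closed}
    merged[into] += schools[closed]
    return merged


def drop_lowest(scores, k):
    """Model the k lowest scorers dropping out before the test."""
    return sorted(scores)[k:]


merged = close_and_merge(schools, "Small", "Big")
all_students = [s for scores in schools.values() for s in scores]

print(min(school_means(schools).values()))  # 55: the "worst school"
print(min(school_means(merged).values()))   # 70.625: the worst school now looks better
print(mean(all_students))                   # 70.625: district average unchanged
print(mean(drop_lowest(merged["Big"], 2)))  # higher still, yet nobody learned more
```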

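Point 3 can be made concrete with invented numbers (none of these are real climate data or 2006 price figures). A +1 degree mean can hide a 20-degree spread, and the same price changes yield different inflation for households that spend differently. Weighting price changes by spending shares is the basic idea behind a price index, cut down here to three hypothetical categories.

```python
from statistics import mean

# Hypothetical regional temperature changes (degrees) with a global mean of +1.
regional_change = [-9.0, -4.0, 2.0, 5.0, 11.0]
print(mean(regional_change))                        # 1.0
print(max(regional_change) - min(regional_change))  # 20.0: the spread the mean hides

# Hypothetical one-year price changes by spending category.
price_change = {"gasoline": 0.30, "housing": -0.05, "other": 0.02}


def household_inflation(weights, changes):
    """Price changes weighted by one household's spending shares."""
    total = sum(weights.values())
    return sum(w * changes[item] for item, w in weights.items()) / total


long_commute = {"gasoline": 0.20, "housing": 0.30, "other": 0.50}
rarely_drives = {"gasoline": 0.03, "housing": 0.50, "other": 0.47}
print(household_inflation(long_commute, price_change))   # roughly 0.055 (+5.5%)
print(household_inflation(rarely_drives, price_change))  # roughly -0.0066 (-0.7%)
```
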
psteinx

Sorry to be a cynic, but there are so many things in the main post and comments I disagree with:

1) One should make a distinction between statistics designed for widespread/public consumption, and those designed for specialists analyzing a problem. For the former, the metrics have to be simple for most things - the public won't readily digest complexity. But I'd still argue that a single imprecise statistic is better than nothing at all, or bland general statements. Does this mean we should never look past single-number statistics? Of course not, but they're a starting point.

Which school students may have troubled home lives? Start with absenteeism (a single number). Students with high absentee rates are potentially experiencing problems at home. Not all of them are (some may just be experiencing abnormal sicknesses), and not all students attending school are problem-free at home, but you can use such data as a starting point.
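
Used as a screen rather than a verdict, the absenteeism idea is a one-line filter. This sketch assumes a 180-day school year and a 10% absence threshold; both numbers, and the student names, are illustrative choices.

```python
def flag_for_follow_up(absences, school_days, threshold=0.10):
    """Return students whose absence rate meets or exceeds `threshold`.

    A flag means "look closer", not "has a problem at home".
    """
    return sorted(
        student for student, days in absences.items()
        if days / school_days >= threshold
    )


absences = {"student_a": 3, "student_b": 20, "student_c": 19}  # hypothetical
print(flag_for_follow_up(absences, school_days=180))  # ['student_b', 'student_c']
```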

Average temperatures ARE a good way to get a general handle on the global warming future. Yes, there are far more sophisticated explanations and models, but most of the public won't read a 400-page textbook on the subject. There are many items fighting for the public's attention - summarize your issue concisely (but accurately) if you want the public to notice.

To quote Einstein, "Make everything as simple as possible, but not simpler."

The post office box issue is a different problem - the correct solution would have been to print out a map of the area, with boxes shown and color-keyed by utilization rate. Then a human could have rationally culled things. It is quite difficult to create good algorithms for such things, but fairly easy for a human to eyeball them and get reasonably good solutions (this is sort of related to the "traveling salesman" problem).

As for the comment on people overeating "low fat foods" - even if people do overeat them, they may still be valuable. You are only considering one metric yourself (end-result obesity). But there are really two factors at work - the pleasure of eating desserts, and the resulting obesity. If I can now eat two cookies instead of one and end up at the same weight, I would argue that I have benefited.

Finally, schools are much more than a bunch of brick and mortar, and it CAN be quite difficult to change the teachers and whatnot. Perhaps it should be easier to make personnel changes, but I'm not holding my breath for that.

pvanderwaart

I blame Excel. Everyone with Excel on his computer thinks he's an analyst.

Once upon a time, I worked in the management science department of a large consumer products company. Every now and then, we would be asked to help with promotion analysis, and we would explain our methodology which can be summarized as "percent better/worse than forecast, net of control." A few days later, we'd get the question "What if there is no control?"
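
One possible reading of "percent better/worse than forecast, net of control" is to subtract the control market's deviation from forecast from the test market's. This sketch assumes that subtraction (a ratio of the two would be another reasonable choice) and uses hypothetical sales figures; without the control, the whole 15% would be credited to the promotion.

```python
def pct_vs_forecast(actual, forecast):
    """Fraction better (+) or worse (-) than forecast."""
    return (actual - forecast) / forecast


def promo_lift(test_actual, test_forecast, control_actual, control_forecast):
    """Test-market deviation from forecast, net of the control market's deviation."""
    return (pct_vs_forecast(test_actual, test_forecast)
            - pct_vs_forecast(control_actual, control_forecast))


# Hypothetical unit sales: both markets beat forecast, the test market by more.
print(pct_vs_forecast(1150, 1000))         # 0.15: naive credit to the promotion
print(promo_lift(1150, 1000, 1050, 1000))  # roughly 0.10 once the control is netted out
```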

At the advent of the personal computer age, the management science department got conflated with PC support, and subsequently disappeared. (This happened in many similar companies as well.) A lot of the "analysis" is now done on the PCs of MBAs with minimal number sense or knowledge of statistics.

bertrecords

Excellent post. Three great case studies.

egretman

No Child Left Behind. Yes, a true success story. Brings a tear to the eye.

The only regret that you will ever hear a Texan voice is that Bush left Texas to be President before he could perfect our health care system.

mathking

Great post. Very well thought out. I may have my statistics students go to the blog to read it.

Ike Pigott

Agreed, an excellent guest entry.

And I agree on the confusion over "average temperatures," but the analysis doesn't go far enough. Too many assumptions are grounded in a static interpretation of climate. "Things in Nowhereistan are the way they are today, because they have always been that way." Under those assumptions, 10-degree swings seem not only dramatic and drastic, but also unnatural. We still haven't done a good enough job determining how much elasticity there is in the pre-human climate to gauge our real impact.

jonathank

This phenomenon has a name: innumeracy. That name implies something is wrong with people who don't understand numbers, when the reality is that few people do understand them; that understanding, not its absence, should be the labeled condition.

meomaxy

If you invent a metric that you consider to be *the* measure of your success, and then adjust your strategy to maximize the metric, it is easy to see how the outcome you end up with looks good on your metric but not so good in terms of what you actually wanted to achieve.

The post box example is good, because probably the utilization of the remaining boxes did go up, despite the fact that what they supposedly were interested in was balancing their cost to collect from the boxes with minimizing the distance postal customers had to go to reach a box.
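
The post box point can be illustrated with a toy road: removing boxes raises the utilization metric while residents walk farther. This sketch assumes 11 residents spaced 1 km apart, each mailing one letter at the nearest box; all positions are hypothetical.

```python
def walk_to_nearest(resident_km, boxes_km):
    """Distance from a resident to the closest remaining box."""
    return min(abs(resident_km - b) for b in boxes_km)


def summarize(residents_km, boxes_km):
    """Return (letters per box, average walk in km), assuming each resident
    mails one letter at the nearest box."""
    letters_per_box = len(residents_km) / len(boxes_km)
    avg_walk = sum(walk_to_nearest(r, boxes_km) for r in residents_km) / len(residents_km)
    return letters_per_box, avg_walk


residents = list(range(11))  # one resident per km along a hypothetical 10 km road
before = summarize(residents, [0, 5, 10])
after = summarize(residents, [5])
# Letters per box triples (11/3 -> 11) while the average walk
# rises from 12/11 km to 30/11 km.
print(before, after)
```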

Another example: Americans, we hear, are getting too fat. This is mainly because they eat too many calories and don't exercise enough of them off. Somebody notices that a lot of the calories in our diet come from foods that are high in fat. Cut out many foods with a lot of fat, and sort of automatically a large number of calories get cut out too. Avoiding fat becomes an effective strategy for losing weight. But then a whole industry pops up with low-fat versions of foods like toffee chocolate banana muffins with gumdrops on top. Gradually fat calories are replaced by sugar and other non-fat things, and nobody loses weight anymore because it becomes too easy to eat high-calorie junk all day that has hardly any fat in it. Then comes Atkins, and the pendulum swings in the other direction to the point where again you can eat high-calorie junk all day, but this time it has almost no carbohydrates in it.

That's also the problem with testing in schools. If you live and die by average scores, the policies you enact will tend to lift average scores, whether or not this helps the most students.

Couldn't many of these decisions be made better if there was an attempt to incorporate market forces into the decision-making?

In the case of schools, that's one argument in favor of school choice. If happy parents are a better indicator of good schools than average test scores, then giving the parents the choice to pick the school they think is best is likely to reward the good schools and punish the bad ones more effectively than the current policy.

Is there a way to apply these market forces to mail boxes? Here's a thought experiment. What would happen if the post office authorized private companies to put out "yellow boxes" that they would then collect from and then get paid some bounty based on the amount of mail they collected. The private companies would then be free to choose locations and pickup frequencies as they want, anything they think would entice more people to choose yellow over blue.

Is this analogous to when private companies were allowed to put up public pay phones? Was that a success?

pparkmanlg

I read this excellent post as an indictment of reductionism. Reductionism has been the basis of most of the scientific advancements of the last several centuries, but we are reaching the point where far-sighted individuals such as this poster are seeing the limitations of that mindset.
Everything is connected. Now we have to build the analytic tools to understand the bigger picture after hundreds of years of building tools for looking at the details. I predict some amount of obstruction to changing the paradigm.

egretman

...giving the parents the choice to pick the school they think is best is likely to reward the good schools and punish the bad ones...

What does it mean that schools are bad? Or good? Schools are brick and mortar. They have no moral stake in this game.

Just say what you mean. The parents are bad? The students are bad? Or the teachers are bad?

The latter is the easiest to fix. And even mediocre teachers would be delighted to teach sweet young things anxious to absorb the knowledge from the Greeks to Einstein.

The students cannot be fixed. At a certain stage they are lost. A few will reform but the broad demographic will never be engineers or poets.

The Parents would require a change in American culture to fix. That could be generations.

No, the majority of American children will continue to be uneducated in the traditional sense. Maybe some of you economic types could come up with reasons why that's not such a bad thing. After all, isn't the market driving this phenomenon?

Inkling229

Great post.

To add to the confusion of the under-performing school is a trend my mother (who started teaching at one such institution two years ago) has noticed. These under-performing schools serve as catch basins for the others. Thus, the "better" schools expel their failing students and send them onto the "worse" schools. There is a major surge in this practice in the month or so before a high-stakes test. Thus, the under-performing school's average is further lowered by the fact that it is required to test students who arrived at the school a few weeks before. Hardly an indication of instruction quality...
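
One way to keep late transfers from being charged to the receiving school is to report scores by how long students have been enrolled. The roster and the six-week cut-off below are hypothetical; the sketch only shows how much a couple of recent arrivals can move a small school's average.

```python
from statistics import mean

# Hypothetical roster: (score, weeks enrolled before the test).
students = [(70, 30), (72, 35), (68, 40), (74, 32), (40, 3), (45, 2)]


def averages_by_tenure(students, min_weeks):
    """Split a school's average into long-enrolled students and recent arrivals."""
    long_term = [score for score, weeks in students if weeks >= min_weeks]
    recent = [score for score, weeks in students if weeks < min_weeks]
    return mean(long_term), mean(recent)


print(mean(score for score, _ in students))       # 61.5: what the ranking sees
print(averages_by_tenure(students, min_weeks=6))  # (71, 42.5)
```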

meomaxy

Clarifying my point about low fat foods: I was referring to the fact that the food industry came up with ways to create substitute foods that contained drastically less fat while containing virtually the same number of calories.

This point is elaborated here: http://www.pbs.org/wgbh/pages/frontline/shows/diet/themes/lowfat.html

The story is like this: I want to lose weight, so following some advice I read somewhere, I decide to reduce the amount of fat in my diet. I read the label on my brownie and see that a 60g serving has 10g of fat in it, so I say, "Whoa! That's a lot. I'm only going to eat half a brownie." I then find Betty Crocker Low Fat Fudge brownies, read the label, and see that a 60g serving has only 5g of fat. "Great!" I say. "I can eat a whole one of those and still only have the same 5g of fat. Yippee!" The only problem is that both brownies actually have 240 calories in them. Now I'm eating an extra 120 calories by substituting a low-fat brownie for a regular one.
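
The brownie arithmetic, using the label figures from the comment, shows the trap of holding one metric fixed: fat intake is identical in both cases while calories double.

```python
# Label figures from the comment: 60g serving; regular brownie 10g fat,
# low-fat brownie 5g fat; both 240 calories per serving.
REGULAR = {"fat_g": 10, "kcal": 240}
LOW_FAT = {"fat_g": 5, "kcal": 240}


def intake(food, servings):
    """Return (fat in grams, calories) for a number of servings."""
    return servings * food["fat_g"], servings * food["kcal"]


print(intake(REGULAR, 0.5))  # (5.0, 120.0): half a regular brownie
print(intake(LOW_FAT, 1))    # (5, 240): same fat, twice the calories
```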

And my point was not actually about diet but using this as an example for how you can end up worse off by focusing on an oversimplified metric.

Lea

Three things:

1) Great guest post! Good points in it. Well organized thoughts.

2) Comment #8 by pparkmanlg is right on the money. Short comment. Concise. Interesting :)

3) A joke one of my nutty econ profs once told us: Three statisticians went out hunting, and came across a large deer. The first statistician fired, but missed, by a meter to the left. The second statistician fired, but also missed, by a meter to the right. The third statistician didn't fire, but shouted in triumph, "On the average we got it!"
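
The joke is the difference between the mean error and the mean absolute error: signed misses can cancel to zero even when every shot misses by a meter. Encoding left as negative and right as positive is our own convention for this sketch.

```python
from statistics import mean

# Misses in meters: left is negative, right is positive.
misses_m = [-1.0, 1.0]

print(mean(misses_m))                  # 0.0: "On the average we got it!"
print(mean(abs(m) for m in misses_m))  # 1.0: every shot missed by a meter
```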

hhadzimu

Excellent post. It reminds me of the arguments I used to have with my parents when I was younger.
"Teenagers cause x% of all accidents."
"X% of crime happens after midnight."
Great. We'll stop driving and being out at night, but then some other age group and some other time of day will be the worst.
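
Shares of a total always add up to 100%, so removing the "worst" group just promotes another one, which is the point being made here. This sketch uses invented accident and driver counts; a per-driver rate, unlike a share, is not renormalized when a group is taken off the road.

```python
def shares(counts):
    """Each group's fraction of the total; the fractions always sum to 1."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def worst(counts):
    s = shares(counts)
    return max(s, key=s.get)


# Hypothetical accident counts by driver age group.
accidents = {"teens": 45, "adults": 40, "seniors": 15}
print(worst(accidents))  # teens

# Take teenagers off the road: someone else now "causes" most accidents.
without_teens = {g: n for g, n in accidents.items() if g != "teens"}
print(worst(without_teens), shares(without_teens)["adults"])  # adults, about 0.73

# Per-driver rates compare groups without depending on who else is driving.
drivers = {"teens": 150, "adults": 800, "seniors": 50}
print({g: accidents[g] / drivers[g] for g in accidents})
```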

JohnR

Paul Kimmelman's comments remind me of the statistical idiocy that often leads to multi-billion-dollar investments in the petrochemical industry - the industry I provide information services for. Visit my blog - http://www.icis.com/blogs/asian-chemical-connections/
You not only need to understand the basis for a statistic, i.e., in the case of the post boxes, whether location and daily population density have been taken into account, but also the motives behind the statistics supplier. For example, the consultants for numerous projects built before the Asian financial crisis in 1997 were also providing services to banks if IPOs for those projects went ahead (durrrr, go figure).
Plus, supply and demand forecasts, the justification for investments, are invariably wrong, either because they have been compiled on too shoddy a basis or because it is simply impossible to predict the future.
There is also a kind of collective positive hysteria about growth in certain markets, e.g., India and China. The result is that nobody dares challenge collective wisdom, which pushes forecasters into over-bullish, one-sided predictions.
