Another Case of Teacher Cheating, or Is It Just Altruism?

From the results of the high-school “maturity exam” in Poland (courtesy of reader Artur Janc) comes this histogram showing the distribution of scores for the required Polish language test, the only subject that all students are required to take and pass.

Not quite a normal distribution. The dip and spike at around 21 points just happen to coincide with the cut-off score for passing the exam. Poland employs a fairly elaborate system to avoid bias and grade inflation: removing students’ names from the exams, distributing them to thousands of teachers and graders across the country, and employing a well-defined key to determine grades. But by the looks of these results, there’s clearly some sort of bias going on.

Compare that to the results of the “advanced” Polish language exam, taken in addition to the basic-level exam by about 10% of students. It has no influence on whether students pass or fail the exit examination, so there’s no incentive to inflate grades, as evidenced by the clean distribution.


Artur writes:

I’m quite sure there is nothing to be gained for the graders/districts if they pass a student with a borderline score (at the basic level), rather than failing him/her. So my take on this is that graders just didn’t want to fail some kids and seriously hurt their college prospects and/or make them re-take the exam when the score was close to the cutoff.
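Artur’s explanation can be sketched with a toy simulation (all numbers here are hypothetical, not from the actual Polish data): if graders nudge scores that land just below the pass mark up to it, an otherwise smooth distribution develops exactly the dip-and-spike pattern seen in the histogram.

```python
import random

random.seed(0)

CUTOFF = 21       # passing score on the basic exam (from the post)
BUMP_WINDOW = 3   # hypothetical: graders "find marks" within 3 points of passing

# Hypothetical latent scores: roughly normal on a 0-50 scale.
raw = [min(50, max(0, round(random.gauss(30, 8)))) for _ in range(100_000)]

# Sympathetic graders bump borderline failures up to the cutoff.
reported = [CUTOFF if CUTOFF - BUMP_WINDOW <= s < CUTOFF else s for s in raw]

def count(scores, lo, hi):
    """Number of scores in the inclusive range [lo, hi]."""
    return sum(lo <= s <= hi for s in scores)

# The band just below the cutoff empties out (the dip), and its mass
# piles up exactly at the cutoff (the spike).
print("just below cutoff (18-20):", count(raw, 18, 20), "->", count(reported, 18, 20))
print("at cutoff (21):           ", count(raw, 21, 21), "->", count(reported, 21, 21))
```

Under this sketch, no individual grader needs to conspire with anyone; a shared reluctance to fail a near-miss is enough to produce the aggregate anomaly.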

So, is that pure altruism on the part of the teachers? Or do they actually have some bit of rational incentive to see students go on to college? One could probably ask the same questions of school officials in Atlanta.



As someone who marks university tests: the cutoff-score bump is a normal thing for us too, though I suspect it’s usually hidden by small sample sizes. It’s caused by teachers and students “gaming” the course feedback system.

When we mark an anonymous test at my university, the academic’s first KPI is that a minimum fraction of students pass - after all, if we fail most of the class, we must have taught poorly, right? The second KPI is that our mean student “course satisfaction” score, from an end-of-term (pre-finals) survey, is above some minimum.

When we come across a student who has just “not quite made it,” a combination of sympathy for the student and a desire to boost one’s own teaching KPIs encourages nearly all of us to “find that extra mark” for them. If it’s a mid-semester test, the students’ passes and fails will feed into the survey results too - you’d be angry with your teacher if they failed you by one lousy mark, right?

This is an interesting case of a backfiring incentive system, a favourite Freakonomics topic of mine. Nearly all Australian universities now run a survey system in which student satisfaction scores are a KPI for teachers. The students “game” the system by voting down subjects they feel they’ve been harshly marked on, rather than on whether the teaching was any good, because it means their future subjects will get easier as administrators tell teachers, “our fee-paying customers say you’re too harsh on them.” And it works very effectively. The teachers in turn “game” the system by bowing to the dumbing-down of course material and by bumping up their pass rates, finding that extra mark for students. It’s easier to game the system than to teach better when your teaching prep-time hours are capped by the accountants. Thus the feedback system meant to continuously improve education is making it worse, whoops.

So there you go, that funny bump has an interesting story behind it!



There's a simple solution to the problem of grade inflation: put the courses on a curve. The curve could be correlated with the incoming credentials of the class, since small classes, or classes offered at different times of the day or year, can vary in student quality. That way you would tease out the influence that a student's expected grade has on his/her evaluation of the professor.

Or perhaps teaching evaluations can be adjusted for the mean or median grade given by the professor, or the mean/median grade in context of the incoming qualities of the students.

The problem with both approaches is that they discount good teaching: good teaching often engages students more and can make them perform at a higher level, and thus deserve better grades. I like the anonymous grading used with standardized tests (e.g., high-school AP or IB tests, or the Polish system described above), since it doesn't artificially constrain student achievement to a curve and can therefore measure overall increases in student performance.



I wonder how many graders touch each student's test. I think you would see less of this problem if the test were split into many small chunks, so that the graders knew their score wasn't likely to make a big difference in the outcome. It's easier to give a low score when you believe it won't matter much, because the borderline work in front of you could be an anomaly, the worst out of an otherwise decent set of work. It's hardest when you know that all the work was borderline.