SAT Strategy by Gender: Men Guess, Women Leave It Blank

To guess or not to guess? Most students wrestle with this question at least once during their multiple-choice test-taking years. A new paper by Harvard economics grad student Katherine Baldiga examines whether men and women approach the issue differently. From the abstract:

“In this paper, we present the results of an experiment that explores whether women skip more questions than men. The experimental test consists of practice questions from the World History and U.S. History SAT II subject tests; we vary the size of the penalty imposed for a wrong answer and the salience of the evaluative nature of the task. We find that when no penalty is assessed for a wrong answer, all test-takers answer every question. But, when there is a small penalty for wrong answers and the task is explicitly framed as an SAT, women answer significantly fewer questions than men. We see no differences in knowledge of the material or confidence in these test-takers, and differences in risk preferences fail to explain all of the observed gap. Because the gender gap exists only when the task is framed as an SAT, we argue that differences in competitive attitudes may drive the gender differences we observe. Finally, we show that, conditional on their knowledge of the material, test-takers who skip questions do significantly worse on our experimental test, putting women and more risk averse test-takers at a disadvantage.”

Baldiga’s results might help explain why women often do better in college than their SAT scores would have predicted, and they raise an important question: Are multiple-choice test scores the best way to fairly “measure aptitude and forecast future achievement”? Readers, what do you think? Is the SAT gender-biased? Of course, whether such gender differences are innate or cultural is a whole other research question.


  1. Nanno says:

    In my opinion, multiple-choice tests are always less accurate (than, say, oral exams, essays, or just plain open questions) and are definitely not the best way to measure aptitude or future performance, especially at an age when cognitive development is far from finished. However, they are the only way to (1) check the answers of a large volume of both people and questions, (2) practically pose questions to a large group of people, (3) compare answers and results, and (4) (you could see this as part of (3)) make sure that both the questions (unlike in an oral exam) and the answers (unlike with open questions) are the same across participants.

    • nobody.really says:

      To summarize: Yes, multiple-choice test scores are the best way to fairly “measure aptitude and forecast future achievement” efficiently. Even if other systems might produce more accurate measures, most would cost more and lack the appearance of objectivity.

      • Nanno says:

        Fair enough, but you’ve added “fairly” and “efficiently,” and I should have added “an individual’s” between “measure” and “aptitude.” I thought I made that point with my point (1), “large volume of … people.”

  2. Austin says:

    On last year’s AP tests, the College Board removed wrong-answer penalties because “it made no difference in terms of final score one way or another.”

    What if the gender differences are what account for that lack of improvement? If that’s the case, then yes, the test is biased.

  4. bruteostrich says:

    If it is important in business and a career to be a good decision maker with incomplete information, then this kind of test is a good way to measure aptitude.

    This might help explain differences in wages between genders. For a CEO, bad answers or decisions carry penalties, but an answer or decision must still be made.

    • nobody.really says:

      Interesting thought!

      A classic complaint about standardized tests is that they FAIL to measure the most important qualities, such as the ability to make a decision with imperfect information. People with skill at solving games and puzzles – that is, artificially constructed challenges that typically have an unambiguously right answer – are not the same people who have skill at finding optimal strategies where there is no unambiguously right answer. The real world requires the latter aptitude much more often than the former. Yet because standardized tests are supposed to be “objective,” they ask questions that are supposed to have one unambiguously right answer. By their structure, these tests necessarily focus on some types of aptitude at the expense of other, arguably more important, ones.

      But now bruteostrich observes that these tests reward individuals who can make accurate guesses, even when the individual has inadequate information. Perhaps this quality partially offsets the deficit I noted above. I’m going to have to mull this one over….

      • Andrew says:

        Very interesting… But how about the other side of the coin? Anyone who watches the news knows that talking heads love to take a little truth and run with it as far as possible. In the case of the SAT, it seems that men are more inclined to follow this path (“Well, I’m not so sure it is true, but I’m going to say it is”) than women (who, giving them the benefit of the doubt, might be thinking, “Well, I’m not so sure it is true, so I’m not going to say it is true”).

        Certainly both kinds of thinking are helpful at times. I know I often follow the “male model” of thinking, probably to a fault. After all, I think our society could use a few more people willing to admit “I don’t know yet.”

  5. Mike B says:

    The strategy of guessing vs. not guessing is well defined on the SAT, and there is no reason every person taking it should not adopt that strategy. The SAT and tests like it get a lot of flak because they aren’t the best proxy for intelligence or knowledge, but on the other hand, a high-stakes test is a good measure of performing under pressure and, in this case, of risk-taking and strategic thinking.

    The measure of a successful person comes not only from their knowledge but also from their personality. The reason the best and brightest are not always the best compensated is that you need other skills in addition to knowledge or intelligence. People need to perform under pressure, be comfortable with risk, and know how to think strategically. These are all things the SAT seems to test, and we should encourage that. If women tend to do poorly on the SAT, we shouldn’t bemoan the test as biased but instead find ways to help women do better. If women won’t take calculated risks by guessing instead of leaving something blank, then who is to say they won’t similarly fail to take proper risks in life? For example, studies show that part of the wage gap comes from women failing to ask for raises or promotions. If they learned to be more aggressive when studying for the SAT, perhaps this problem could be mitigated.

    • Greg says:

      While I think you raise some valid points in the abstract, it’s not too big a step from your opinions to blaming women themselves for their smaller wages. It’s important for us to create a culture where workers (male or female) are equally rewarded for good performance regardless of whether they aggressively seek those rewards. Similarly, if a little thought will result in a differently presented SAT that more accurately measures aptitude, why not take that into account?

      • Mike B says:

        Don’t mistake the example for a defense of wage inequality; it just illustrates how shrugging something off as “bias” could have real-world consequences later in life. As much as I would love to have a more meritocratic society, unless everyone becomes autistic there is going to be a strong social component in almost everything humans do. I think scholastic aptitude can be viewed as including social components such as performance under pressure and risk-taking. A car can have a huge engine, but it’s not going to go anywhere if it can’t put its power down to the road. You can have a lot of knowledge, but if you can’t ever apply it, you might as well not have it.

  6. Aurorion says:

    I don’t think this should make a big difference in test results. I am not familiar with the SATs (I am not American), but in general, penalty marks for wrong answers in multiple-choice tests are designed so that random guessing has zero expected effect on the final score. For example, if there are 5 options per question and each correct answer carries 1 mark, then a wrong answer would carry a penalty of -0.25.
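    The arithmetic in this example can be checked with a short Python sketch. It assumes the scoring rule just described (+1 for a correct answer, a fixed penalty for a wrong one) and a guess chosen uniformly among all options; `guess_moments` is a hypothetical helper name. The mean of a blind guess comes out to exactly zero, but its variance does not, so a blind guess adds spread to the score without adding expected points.

```python
from fractions import Fraction


def guess_moments(options: int, penalty: Fraction) -> tuple[Fraction, Fraction]:
    """Mean and variance of the score change from one uniformly random guess.

    Assumes +1 for a correct answer and -penalty for a wrong one.
    """
    p_right = Fraction(1, options)
    p_wrong = 1 - p_right
    mean = p_right * 1 + p_wrong * (-penalty)
    second_moment = p_right * 1 + p_wrong * penalty**2
    return mean, second_moment - mean**2


# Five options, 1 mark per right answer, a quarter-mark penalty per wrong one.
mean, var = guess_moments(5, Fraction(1, 4))
print(mean, var)  # 0 1/4
```

    More generally, a penalty of 1/(options - 1) makes the mean zero for any number of options, e.g. a third of a mark with four options.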

    I am male, and I consider myself quite risk-averse. I also consider myself generally good at multiple-choice tests. My strategy when I don’t know the answer to a question is to guess only if I am reasonably sure that at least one of the options is wrong: that way, the expected value of a random guess among the remaining options is greater than zero. Of course, some tests do not disclose their penalty scheme for wrong answers, but unless the test is also designed to screen for risk-averse or risk-seeking traits, the penalty marking would always be designed this way.
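    The elimination rule can be sketched the same way in Python. This assumes the ruled-out options really are wrong, the guess is uniform over what remains, and the five-option, quarter-mark scoring from the example above; `expected_gain` is a hypothetical helper. The expected gain is zero with no eliminations and turns positive as soon as one option is ruled out.

```python
from fractions import Fraction


def expected_gain(options: int, eliminated: int, penalty: Fraction) -> Fraction:
    """Expected score change from guessing uniformly among the options left
    after ruling out `eliminated` choices known to be wrong.

    Assumes +1 for a correct answer and -penalty for a wrong one.
    """
    remaining = options - eliminated
    if remaining < 1:
        raise ValueError("at least one option must remain")
    p_right = Fraction(1, remaining)
    return p_right - (1 - p_right) * penalty


# Five options with a quarter-mark penalty per wrong answer.
for k in range(4):
    print(k, expected_gain(5, k, Fraction(1, 4)))
# 0 0
# 1 1/16
# 2 1/6
# 3 3/8
```

    Leaving the question blank is worth exactly 0, so under these assumptions guessing after even one elimination is the better bet on average.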

    Anyway, I don’t think the apparent difference in the random-guessing patterns of males and females will make much of a difference in multiple-choice tests. Males would not, on average, gain marks by guessing at random more often, and females would not, on average, lose marks by declining to guess at random.

    • Rick says:

      That’s exactly the strategy to pursue: look at the expected value of the guess. If you can eliminate enough answers as obviously wrong, then the expected value of guessing among the remainder is greater than that of leaving the question unanswered. This has always been my strategy.

  7. Eric M. Jones. says:

    Hey, not making a decision is a decision too! So what’s the problem?

    I wonder about selection bias, since the subjects all seemed to be headed for college. What about other groups, such as Arabs or Japanese?

    I like Charlie Brown’s T-F test strategy: the first answer is T to start off optimistically; the second is F to break up the pattern; the third is F to break up the pattern; the fourth is true… etc.

  8. Phil says:

    Suppose there wasn’t a gender effect. But, suppose humans have two personality types: type R, which likes to take risks, and type S, which likes things secure. Suppose type Rs tend to guess, and type Ss leave questions blank. So type Rs do better on the SAT than type Ss.

    Is the SAT unfairly discriminatory against type Ss? What should we do about that?

    • Mike B says:

      If Type R people tend to do much better in life, and by extension in school, it would be rational for a school to select for Type R applicants and even more rational for society to encourage people to adopt Type R behaviors.

      • Clancy says:

        But type Rs do better because our society is structured to reward risky behavior. Then you have to ask whether it is better for society to have type Rs in charge. Or should we do more to reward type Ss and type S behavior? I would argue that the financial sector, at least, would function better if more type S folks were making the decisions.

