SAT Strategy by Gender: Men Guess, Women Leave it Blank


To guess or not to guess?  Most students wrestle with this question at least once during their multiple choice test-taking years. A new paper by Harvard economics grad student Katherine Baldiga examines whether men and women approach the issue differently. From the abstract:

“In this paper, we present the results of an experiment that explores whether women skip more questions than men. The experimental test consists of practice questions from the World History and U.S. History SAT II subject tests; we vary the size of the penalty imposed for a wrong answer and the salience of the evaluative nature of the task. We find that when no penalty is assessed for a wrong answer, all test-takers answer every question. But, when there is a small penalty for wrong answers and the task is explicitly framed as an SAT, women answer significantly fewer questions than men. We see no differences in knowledge of the material or confidence in these test-takers, and differences in risk preferences fail to explain all of the observed gap. Because the gender gap exists only when the task is framed as an SAT, we argue that differences in competitive attitudes may drive the gender differences we observe. Finally, we show that, conditional on their knowledge of the material, test-takers who skip questions do significantly worse on our experimental test, putting women and more risk averse test-takers at a disadvantage.”

Baldiga’s results might help explain why women often do better in college than their SAT scores would have predicted and raise an important question: Are multiple-choice test scores the best way to fairly “measure aptitude and forecast future achievement”?  Readers, what do you think?  Are SAT tests gender-biased?  Of course, whether or not such gender differences are innate or cultural is a whole other research question.


(HT: Market Design)



In my opinion, multiple-choice tests are always less accurate (than, say, oral exams, essays, or just plain open questions) and definitely not the best way to measure aptitude or future performance (especially at an age where cognitive development is far from finished). However, they are the only way to (1) check the answers of a large volume of both people and questions, (2) (practically) pose questions to a large group of people, (3) compare answers and results, and (4) (you could see this as part of (3)) make sure that both the questions (oral exam) and answers (open questions) are the same between different participants.


To summarize: Yes, multiple-choice test scores are the best way to fairly “measure aptitude and forecast future achievement” efficiently. Even if other systems might produce more accurate measures, most would cost more and lack the appearance of objectivity.


Fair enough, but you've added "fairly" and "efficiently" and I should have added "an individual's" between measure and aptitude. I thought I made that point with my point (1), "large volume of ... people".


On last year's AP test, the College Board removed wrong-answer penalties because "it made no difference in terms of final score one way or another".

What if the gender differences are what account for that lack of improvement? If that's the case, then yes, the test is biased.

Corban Saezer

My father always told me that, even if I'm smart, if I don't work hard and take risks then I'll never get anywhere.

I took his lesson to heart.

Therefore, when I see smart people not taking risks, I treat them with the same contempt as I do my past self. So it is for the men; so it is for the women.


That's a pretty low threshold for contempt.


If it is important in business and a career to be a good decision maker with incomplete information, then this kind of test is a good way to measure aptitude.

This might help explain differences in wages between genders. As a CEO, bad answers or decisions have penalties, but an answer or decision must be made.


Interesting thought!

A classic complaint about standardized tests is that they FAIL to measure the most important qualities, such as the ability to make a decision with imperfect information. People with skills at solving games and puzzles – that is, artificially constructed challenges that typically have an unambiguously right answer – are not the same people who have the skills at finding optimal strategies where there is no unambiguously right answer. The real world requires the latter aptitude much more often than the former. Yet, because standardized tests are supposed to be “objective,” they ask questions that are supposed to have one unambiguously right answer. By their structure, these tests necessarily focus on some types of aptitudes at the expense of other, arguably more important, aptitudes.

But now bruteostrich observes that these tests reward individuals who can make accurate guesses, even when the individual has inadequate information. Perhaps this quality partially offsets the deficit I noted above. I’m going to have to mull this one over….



Very interesting... But how about the other side of the coin: Anyone who watches the news knows that the Talking Heads love to take a little truth and run with it as far as possible. In the case of the SAT, it seems that men are more inclined to follow this path ("Well, I'm not so sure it is true, but I'm going to say it is") than women (who- giving the benefit of the doubt- might be thinking, "Well, I'm not so sure it is true, so I'm not going to say it is true")

Certainly both kinds of thinking are helpful at times. I know I often follow the "male model" of thinking, probably to a fault. After all, I think our society could use a few more people to admit "I don't know yet."

Mike B

The strategy of guessing vs not guessing is well defined on the SAT and there is no reason that every person taking it should not adopt that strategy. The SAT and tests like it get a lot of flak because they aren't the best proxy for intelligence or knowledge, but on the other hand a high-stakes test is a good measure of performing under pressure and, in this case, risk taking and strategic thinking.

The measure of a successful person not only comes from their knowledge, but also their personality. The reason that the best and brightest are not always the best compensated stems from the fact that you need other skills in addition to just knowledge or intelligence. People need to perform under pressure, be comfortable with risk and know how to think strategically. These are all things the SAT seems to test and we should encourage it. If women tend to do poorly on the SAT we shouldn't bemoan the fact that the test is biased, but instead find ways to get women to do better. If women won't take calculated risks with guessing vs leaving something blank, then who is to say they won't similarly fail to take proper risks in life? For example, studies show that part of the wage gap comes from women failing to ask for raises or promotions. If they learned to be more aggressive when studying for the SAT, perhaps this problem could be mitigated.



While I think you raise some valid points in abstract, it's not too big a step from your opinions to blaming women themselves for their smaller wages. It's important for us to create a culture where workers (male or female) are equally rewarded for good performance regardless of whether they aggressively seek those rewards. Similarly, if a little thought will result in a differently presented SAT that more accurately measures aptitude, why not take that into account?

Mike B

Don't mistake the example as a defense of wage inequality, just how shrugging off something as "bias" could result in real world consequences later in life. As much as I would love to have a more meritocratic society unless everyone becomes autistic there is going to be a strong social component in almost everything humans do. I think that scholastic aptitude can be viewed to include such social components as performance under pressure and risk taking. A car can have a huge engine, but it's not going to go anywhere if it can't put its power down to the road. You can have a lot of knowledge aptitude, but if you can't ever apply it you might as well not exist.


I don't think this should make a big difference in test results. I am not familiar with the SATs (I am not an American), but in general, penalty marks for wrong answers in multiple choice tests are designed in such a way that random guessing will have zero net effect on the final score. For example, if there are 5 options per question and each correct answer carries 1 mark, then a wrong answer would have a penalty of -0.25.

I am a male, and I consider myself quite risk-averse. I also consider myself generally good in multiple-choice tests. My strategy when I don't know the answer to a question is to guess an answer only if I am reasonably sure that at least one of the options is wrong: this way, the expected value of my gain from the random (among the remaining options) guess is more than zero. Of course, some tests do not disclose what the penalty scheme for wrong answers is: but unless the test is also designed to screen risk-averse or risk-seeking traits, the penalty marking would always be designed this way.

Anyway, I don't think the apparent difference in random-guessing patterns of males and females will make much of a difference in multiple-choice tests. Males would not gain more marks by more random guessing; and females would not lose any marks by not doing random guessing.
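The expected-value reasoning above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `expected_guess_value` and its defaults (1 mark for a right answer, 1/4 mark lost for a wrong one) are assumptions taken from the 5-option example, not from any official scoring rule.

```python
from fractions import Fraction

def expected_guess_value(options_left, reward=Fraction(1), penalty=Fraction(1, 4)):
    """Expected marks from guessing uniformly among the options still in play."""
    if options_left < 1:
        raise ValueError("need at least one option to guess from")
    p_right = Fraction(1, options_left)
    return p_right * reward - (1 - p_right) * penalty

# Hypothetical 5-option question: eliminate 0, 1, 2 or 3 wrong answers first.
for eliminated in range(4):
    left = 5 - eliminated
    print(f"{eliminated} eliminated, {left} left: {expected_guess_value(left)}")
```

Under these assumptions a blind guess among all 5 options is worth exactly zero on average, and every eliminated option pushes the expected value above zero, which is the case for guessing only after ruling something out.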



Exactly the strategy to pursue. Look at the expected value of the guess. If you can eliminate enough answers as obviously wrong then the value of guessing among the remainder is greater than leaving it unanswered. This has always been my strategy.

Eric M. Jones.

Hey, not making a decision is a decision too! So what's the problem?

I wonder about the bias of the subjects since they all seemed to be headed for college. What about other groups like Arabs or Japanese?

I like Charlie Brown's T-F test strategy: The first question is T to start off optimistically, the second is F to break up the pattern. The third is F to break up the pattern, the fourth is true...etc.


Suppose there wasn't a gender effect. But, suppose humans have two personality types: type R, which likes to take risks, and type S, which likes things secure. Suppose type Rs tend to guess, and type Ss leave questions blank. So type Rs do better on the SAT than type Ss.

Is the SAT unfairly discriminatory against type Ss? What should we do about that?

Mike B

If Type R people tend to do much better in life and, by extension, school, it would be rational for a school to select toward Type R applicants and, moreover, more rational for society to encourage people to adopt Type R behaviors.


But type R’s do better because our society is structured to reward risky behavior. Then you have to ask the question is it better for society to have type R’s in charge? Or should we do more to try and reward type S’s and type S behavior? I would argue that the financial sector, at least, would function better if more type S folks were making the decisions.

alex in chicago

Here is a simple contradictory postulate.

#1. Being able to read instructions on the test is another skill.
#2. Instructions contain the information that will tell you the penalty for wrong answers.
#3. Being able to understand that guessing will improve your performance is a simple calculation that even a fourth grader could do.

Therefore, not guessing is evidence of inferior knowledge or otherwise an inability to connect one's simple knowledge to a larger scheme.

Full Disclosure: 35 ACT, Five '5's on AP exams (5 taken), 170 LSAT.


Sure- I can recall being coached on the 'educated guess': if you can eliminate one or more of the answer choices as wrong, it pays to guess, etc. This is most interesting- are those who are failing to recognize that a guessing strategy can improve your score victims of bias, or is the test recognizing and rewarding those who do- or those who use guessing without the knowledge of its advantage, for that matter? Is this a skill we're trying to identify as aptitude, or is it an unfortunate skew from this type of testing?

caleb b

While I understand the necessity of standardized tests for college admission, a major problem with the process is that the preparation for the exam is not standardized.

Some kids, from middle and upper class households, receive tremendous test preparation which can cost hundreds (or even thousands) of dollars. Children from lower class households, generally, cannot afford such a luxury. So Richie Rich goes into the test having prepared for weeks for every type of question that could possibly be asked, while Leon Latchkey takes the test cold. Obviously, Leon isn't going to score as high as Richie.

Richie now receives a full-ride merit scholarship based on his score, while Leon does not. Sure, Leon probably qualifies for a Pell Grant, but a Pell Grant only covers basic tuition. Therefore, Leon, who was already at a disadvantage, continues to be at a disadvantage once in college.

Granted, the issue is more complicated than just that, but current standardized tests ignore this aspect.

Full Disclosure: HS GPA 3.2, ACT 25 No Test Prep, Pell Grant – no other scholarship, College GPA 4.0


Joshua Northey

Yes but those same factors already exist and confound the whole process anyway. If you adjusted test scores for background, then someone with an adjusted test score would need to go through remedial work first because while perhaps naturally more talented, they are at present less prepared. Don't get me wrong we make horrible use of our human resources in this country, but I feel like standardized tests are our friends not our foes.

In college I worked as a tutor at my University's minority resource learning center and we would have extremely bright migrant kids from, say, Sub-Saharan Africa, or bright but disadvantaged kids from the inner-city US. They were not well served by being "adjusted" into college. What most of them needed was to go back to 4th grade. Now granted, since they were bright and older they could probably complete the 4th grade in a couple of weeks, but that was where they should have been. Instead we sort of shuffled them through the easiest set of courses we could find and invested a huge amount of tutoring resources in them. Many dropped out anyway. It seemed like a huge waste for a couple of feel-good stories to me.

Full Disclosure HS GPA: 2.3, ACT: 36 (took 2 years early), No Test Prep, College GPA 3.7, Also had only Pell grants in school.

I came from a pretty crummy background (poor side of town, single parent who was passed out drunk or AWOL most days) and didn't get into any good schools even with the great test scores. Even somewhere like Reed doesn't want perfect test scores and a bad attitude. So I ended up going to the U of MN, which was surprisingly good.


Ian M

Men are more likely to guess = gender bias in testing? How so?

The penalty for guessing is often 1/4 of a mark for a 5 choice question. One should ALWAYS guess if they are confident that at least one of the answers is certainly wrong.

Jeff F

Actually, Ian, you should always guess if you plan on taking the test more than once. If you take it twice, guessing will create a higher standard deviation to your tests, and most schools only look at the highest score you get (at least that is what schools I applied to stated). Yes, you may get a lower score by guessing, but probabilistically the highest score you get will be higher (if you take it more than once).
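One rough way to test this claim is a small simulation. Everything in it is a hypothetical setup, not data from the paper: a test-taker is unsure of 20 questions, each with 5 choices and a 1/4-point penalty, guesses blindly on all of them, and keeps the best result across sittings. The function names and the fixed seed are assumptions for illustration.

```python
import random

def guessed_points(n_unsure, rng, choices=5, penalty=0.25):
    """Net points from blind guesses on n_unsure questions."""
    total = 0.0
    for _ in range(n_unsure):
        # One of the `choices` options is correct; treat index 0 as the right one.
        total += 1.0 if rng.randrange(choices) == 0 else -penalty
    return total

def mean_best_of(attempts, n_unsure=20, trials=10000, seed=0):
    """Average of the best guessing result across several sittings."""
    rng = random.Random(seed)
    best_sum = 0.0
    for _ in range(trials):
        best_sum += max(guessed_points(n_unsure, rng) for _ in range(attempts))
    return best_sum / trials

one_sitting = mean_best_of(1)   # blind guessing in a single sitting
two_sittings = mean_best_of(2)  # the better of two noisy sittings
print(round(one_sitting, 2), round(two_sittings, 2))
```

Skipping those 20 questions would add exactly 0 in every sitting, so under these assumptions the best-of-two guesser comes out ahead on average, even though any single sitting can still come out lower.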


Joshua Northey

Men are less risk averse than women? You don't say...

Coming up at 6 o'clock: "Water is Wet," a riveting exposé!

Joe Dokes

As a male, a teacher, and the father of two girls, I have mixed feelings about this research. On the one hand, the fact that it appears that girls do not guess, and thus cause their scores to suffer, is simply a matter of proper test preparation or simply a change in test making.

As a teacher it is clear that it should be trivial to teach people to guess. The key is when you can eliminate at least one clearly wrong answer, it is in your best interest to guess. Having taken hundreds of multiple choice tests over the years, I can say with certainty that I was always able to eliminate at least one obviously incorrect answer. As a result it is always better to guess. Always.

As a father, it is my responsibility to teach my girls to step up and take risks. It is also my responsibility to ensure that they have the skills necessary to be successful. I also want a fair and level playing field on which they will compete.

As a male who has watched what I sometimes perceive as a systematic attempt to emasculate males in education, I can say that I don't desire to see any further advantages given to women. Currently women make up 58% of college freshmen. It has become clear to me that virtually all changes made to education over the past thirty years have been primarily to the benefit of female students.


Joe Dokes



"It has become clear to me that virtually all changes made to education over the past thirty years have been primarily to the benefit of female students."

That's the nature of change in institutions -- people seek change because the system, as is, disadvantages them somehow. Therefore, change tends to accrue to the benefit of the disadvantaged. Whether that's deserved is a different question, but if you don't believe that women were disadvantaged (in education or generally) thirty years ago, then you weren't paying attention.

Brian Gulino

What an interesting paper Ms. Baldiga wrote. How small a penalty for a wrong guess? If a correct answer is worth 4 points and you are choosing among 3 answers, a 1 point penalty would reward guessing, a 2 point penalty would exactly break even, and anything larger would penalize guessing. Very odd that this is not made explicit in the abstract and that none of the commenters mentioned it.

Since the penalty determines the strategy, I read the paper where I found this incorrect analysis of test taking strategies:

For instance, on the SAT, a long-time staple of college admissions in many countries, answering a multiple-choice question always yields a weakly positive expected value. There are five possible answers; one point is given for a correct answer, 1/4 of a point is lost for an incorrect answer, and no points are awarded for a skipped question.

Ms. Baldiga incorrectly concludes:

Even when he is unable to eliminate any of the possible answers, a risk neutral test-taker maximizes his expected score by answering the question.

Well, no. Random guesses on a hundred-question test with 5 choices per question yield (on average) 20 correct answers for 20 points minus 80 errors at 1/4 point penalty for -20 points, for a net of zero points. Guessing on the SAT is risk neutral.
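The arithmetic in this comment can be written out as a short check. This is a sketch under the scoring rules quoted above (1 point per correct answer, 1/4 point off per error, 5 choices) plus the earlier 4-point example; the helper names `break_even_penalty` and `expected_net` are made up for illustration.

```python
from fractions import Fraction

def break_even_penalty(reward, choices):
    """Penalty at which a uniform random guess among `choices` nets zero on average."""
    return Fraction(reward) / (choices - 1)

def expected_net(questions, choices, reward, penalty):
    """Expected total from guessing at random on every question."""
    right = Fraction(questions, choices)   # expected number correct
    wrong = questions - right              # expected number wrong
    return right * reward - wrong * penalty

print(expected_net(100, 5, 1, Fraction(1, 4)))  # 100 blind guesses, SAT-style scoring
print(break_even_penalty(1, 5))                 # 1-point answer, 5 choices
print(break_even_penalty(4, 3))                 # 4-point answer, 3 choices left
```

Any penalty below the break-even value rewards random guessing on average, and any penalty above it punishes guessing.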

Ms. Baldiga cites various studies of gender-specific behavior on tests. It's remarkable how disinterested the test givers are in optimal guessing strategies, a disinterest seemingly shared by Ms. Baldiga even though she's writing a paper about it. As a test taker, only recently a test giver, I have always been interested in optimal test-taking strategies. For the SAT, my strategy was:

Go through the test answering all questions I am sure of. On the questions I wasn't sure of, lightly mark all the answers I knew were wrong on the answer sheet. Use the remaining time to revisit those questions. In the last minute of the test, guess among all the answers that were not lightly marked.


Jeffrey L.

Yes, they do make that incorrect conclusion in the introduction, but the version of the SAT that they gave the students had only 4 answer choices per question, and there was still only a 1/4 point penalty. So even though the 5-choice SAT is risk neutral, this test was rewarding guessing.

Meanwhile, the words they used to frame it as an SAT said that it would be "scored like an SAT." I took the ACT, and the only things I know about the difference are that the SAT uses bigger numbers to say the score, and that you are penalized for guessing. Were women less likely to go, "Oh, they mean 'similar' but there is a subtle difference that puts guessing in my favor!"

But even though men guessed more, they scored the same as the women. Are men worse at guessing? Or was the test not long enough to make guessing matter? I liked the pilot session where there were the 4 answers, then a 5th that said, "I'm not sure, but I would guess ____." Women used that answer a lot more than the men. Without considering the guesses, men did a bit better, but when the guesses were factored in they scored the same.

Probably need a bigger test, but I want to say that these women are either better guessers, or are just less sure of themselves even when they are right. I think there's a moral somewhere here to take away from this.



If they were to increase the cost of a wrong answer, might we not see female students converging with, or overtaking, the male students again? Perhaps the negative marking at the moment is not negative enough. The cost is enough to deter risk-taking among the women, but not enough to punish risk-taking by the men?


People have already done research showing High School GPA is the best way to “measure aptitude and forecast future achievement”.


I thought the different SAT taking strategies of boys and girls and their willingness to guess was already known. I can remember this "fact" being thrown around in my SAT prep classes in the 90s.


I attended a high school in a foreign country. A significant number of our students have taken the SAT tests and consequently attended US universities throughout the years. Though the school doesn't keep statistical data, from what teachers and counselors have shared, overall women here perform better. I was surprised when I read statistical data which stated that women in the US performed better in the verbal section and men in the math section, because in our case it was usually women performing slightly better on both sections. Anyway, we were all taught to never leave a blank answer, despite the penalty points. Of course, I don't have enough data to claim that, but do you think the differences may have something to do with the type of educational system at hand, teachers' training and advice etc.? Maybe even parents' expectations play a role.

I think multiple-choice tests are not the best indicator of later success but they are fairly accurate and probably the best we have at hand now.


robin marlowe

Woops, upon second thought almost all. As to the issue at hand, I once took a standardized test in "social studies." The thing is, I never took a college course in the subject. Bought a book, made educated guesses. That's what someone suggested that I do. Lo and behold- I did well- without any formal training. So women can be taught to behave otherwise than they would. Does this not mean that men can learn as well? I do have reason to think so.