Challenging the Bing-It-On Challenge

Did you find this blog post through Bing? Probably not — 67% of worldwide searches go through Google and 18% through Bing. But Microsoft has advertised in a substantial TV campaign that — in the cyber analog to blind taste-testing — people prefer Bing “nearly 2:1.” A year ago, when I first saw these ads, the 2:1 claim seemed implausible. I would have thought the search results of these competitors would be largely identical, and that it would be hard for people to distinguish between the two sets of results, much less prefer one set 2:1.

When I looked into the claim a bit more, I was slightly annoyed to learn that the “nearly 2:1” claim is based on a study of just 1,000 participants.  To be sure, I’ve often published studies with similarly small data sets, but it’s a little cheeky for Microsoft to base what might be a multi-million dollar advertising campaign on what I’m guessing is a low-six-figure study. 
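For context, a 1,000-person poll is not unusually small as surveys go. The sketch below computes the standard margin of error for a sample proportion; it is a generic illustration (the 95% confidence level and worst-case p = 0.5 are conventional assumptions, not figures from Microsoft's study):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# How the margin shrinks as the sample grows
for n in (100, 1000, 10000):
    print(f"n = {n}: ±{margin_of_error(n):.1%}")
```

At n = 1,000 the margin is roughly ±3%, which is why national polls often settle on samples around that size; quadrupling the budget to n = 10,000 only shaves the margin to about ±1%.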

To make matters worse, Microsoft has refused to release the results from its comparison website. More than 5 million people have taken the Bing-It-On challenge. You enter a search term and the Bing-It-On site returns two panels with de-identified Bing and Google results (randomly placed on the right or left side of the screen). You tell the site which side’s results you prefer, and after five searches the site reveals whether you prefer Bing or Google (see below).

Microsoft’s ads encourage users to join the millions of people who have taken the challenge, but it will not reveal whether the results of the millions are consistent with the results of the 1,000.


So together with four Yale Law students, I set up a similar-sized experiment using Microsoft’s own site to see which search engine users prefer. We found that, contrary to Microsoft’s claim, 53 percent of subjects preferred Google and 41 percent Bing (6 percent of results were “ties”). This is not even close to the advertised claim that people prefer Bing “nearly two-to-one.” It is misleading to run advertisements saying both that people prefer Bing 2:1 and that viewers should join the millions of people who’ve taken the Bing-It-On challenge if, as in our study, those millions haven’t preferred Bing at anything like a 2:1 rate. Microsoft may have realized this: it has more recently altered its advertising, backing off the original claim to say only that people “prefer” Bing.

We also interjected a bit of randomness into our study to test whether the type of search term impacts the likelihood that Bing is preferred.  We randomly assigned participants to search for one of three kinds of keywords: Bing’s suggested search terms, popular search terms, and self-suggested search terms.  When Bing-suggested search terms were used the two engines statistically tied (47% preferring Bing vs. 48% preferring Google).  But when the subjects in the study suggested their own searches or used the web’s most popular searches, a sizable gap appeared: 55-57% preferred Google while only 35-39% preferred Bing.  These secondary tests indicate that Microsoft selected suggested search words that it knew were more likely to produce Bing-preferring results. You can read our full paper here.
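As a back-of-envelope check on those headline numbers, the sketch below runs a simple two-sided z-test for a proportion. The counts are assumptions inferred from the reported percentages (53% Google, 41% Bing, 6% ties, n ≈ 1,000), not the paper's raw data:

```python
import math

# Assumed counts inferred from the reported percentages (n ≈ 1,000):
# 53% preferred Google, 41% preferred Bing, 6% were ties.
google, bing = 530, 410
decisive = google + bing  # ties carry no preference information

# Two-sided z-test: among decisive choices, does the Google share
# differ from the 50% we'd expect if the engines were truly tied?
p_hat = google / decisive
se = math.sqrt(0.25 / decisive)             # std. error under H0: p = 0.5
z = (p_hat - 0.5) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

print(f"z = {z:.2f}, p = {p_value:.5f}")
```

Under those assumed counts, z comes out near 3.9 — well outside the ±1.96 threshold for a 5% test — so a true 50/50 preference is hard to square with the observed split. The same arithmetic explains why a 1,000-person sample can support a statistically significant claim in either direction.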

The upshot: several of Microsoft’s claims are a little fishy. Or to put the conclusion more formally, we think that Google has a colorable deceptive-advertising claim against Microsoft. It could be worth a lot of money in lost ad revenue if the claims misled people into thinking that a substantial majority of people prefer Bing over Google. Then again, I might be a little overzealous in seeing Lanham Act violations. Back in 2010, I suggested that the movie Date Night might contain a deceptive Kindle ad.


I think it's telling that I used google to search for the bingiton challenge.


I'm not sure but I think the Bing-It-On challenge is the cyber analog to a blind taste test.


A little-known fact, but one which the Microsoft folks clearly didn't research (Google?) before picking a name for their search engine: in northern-English dialect, 'bing' is the accepted name for a waste heap, some of which can reach the scale of small mountains.

j fre

Did you find this blog post through Bing? Probably not —
Wow, you start off sounding like a pompous ass. By the way, most advertising is deceptive.


i prefer duckduckgo at the moment!

David Leppik

...which is, last I heard, powered by Bing.

Plutonar VS

Mainly Bing and Blekko, boosted by special features. Bing in turn has (or had) a deal with Wolfram Alpha.


I have wondered if this claim was misleading, as Google still seemed to be the search engine everywhere. I took the Bing It On challenge and chose Google 5 out of 5 times. In one of the searches I used the name of a relatively young non-profit in NY called It Could Happen To You Inc. In the Google results, the top results linked to the organization's website (although I did not get the general website link, I did get content pages from the website). In Bing, the closest I got to the organization was its listing on a non-profit listing site. Even when I used a celebrity name as my search, Google showed the official site first while Bing showed wiki and photos. The commercials are very misleading, and I wonder if this meets the level of false advertising, which may be why Microsoft refuses to release their findings.


I can't even do the experiment because I'm in Mexico and the "bingiton" page sends me to Bing Mexico, and limits all my searches to Spanish.
I hate Bing and I always will.


You'll always hate Bing no matter what. Wow, that's a lot of hate you have going on there for Bing.


I just tried the Bingiton challenge, which said I prefer Bing, and then I used Bing to pull up the exact same search. The pages didn't even match! For instance, I searched News, which gave me an in-depth news coverage section, which Bing doesn't have. That was a Google feature.

Martin Beeby (Microsoft)

Bing has a news section. It can be found at the top of the home page, or, if the news is recent and relevant to your search, it may appear alongside your search results.


For myself, it depends on what I'm doing. For a general search on a strange computer, I'd use Bing, simply because it doesn't have that horrible flickering autocomplete thing. And Bing maps are better than Google's, at least for finding backcountry dirt roads. But it doesn't have things like Google Scholar, or (at least that I've found) a good advanced search mechanism.

Ken Bellows

Another major problem with the Bing It On site is that it restricts Google's results to the central pane of the page. Some of the most interesting results (related products on eBay, Amazon, etc., related results from Google Maps, ...) show up in a side pane that is completely cut out of the Bing It On version of the page, while Bing's fancy extras are included in the same page. This falsely implies that Bing gives more useful results when it doesn't.

For an example, try "lawn mower" on the Bing It On site and then on Google directly. Then try "Yosemite National Park".


Bing shows additional result items both in the main page and in the right pane, depending on your type of search.
The BingItOn page strips the side panes from both Bing and Google, and also the ads that are above the search results.
But if you search on BingItOn for a celeb, for instance, you might still see media results (images/video) in the Bing as well as the Google results, as those are often shown mixed within the results pane.


Also, how well can you tell which search engine you prefer based solely on the results page itself, without investigating, one by one, how well the content of the individual URLs suggested in the results page satisfies your query?


A clever and helpful experiment! I've done the bingiton test a few times, and generally come out preferring Google, although not as much now as earlier.

Microsoft Bing's nearly matching Google's results in preference is actually quite an achievement, if only they had stuck with that claim.

In the real world, web advertisers pay based on clicks and conversions, not on Microsoft's statements. There's no real economic effect from Microsoft overstating results.

Joyce Hanson

Hi Freakonomics Peeps,
Great website. However, don't you mean to say "we think that Microsoft has a colorable deceptive advertising claim" rather than "we think that Google has a colorable deceptive advertising claim"?
Best, Joyce H

Mike M.

No. They are saying Google may have a deceptive advertising claim against Microsoft.

Julien Couvreur

You raise a good question about the consistency of the challenge with the smaller study.
That said, the small study ("study of just 1,000 participants") doesn't seem inappropriately small. That is a common size for many polls, to achieve a reasonable error margin.
In some ways, you'd expect the smaller study to have better data quality than a test on the web, where some people might take the test more than once.

Damien Castagnozzi

I love Bing for the photos they have around the world. But if I want to search for something, it's Google. :-)

Michael R

Brilliant. Microsoft got thousands of people to tell them what search results they prefer. They did this with existing infrastructure and without having to pay focus group participants or facilitators. They now have a very reliable data set to use in improving their search engine rankings.

Good on whoever thought this up.


I tried the bingiton challenge and chose Google 6/6 times. Bing just sucks and my brain knows it.


Google won when I did the Bing It On challenge. I like how it asks me if I want a rematch. Why do I need a rematch? I didn't lose. LOL

Christopher Landry

This use of fuzzy data to make ad claims is so prevalent in business these days. Disgusting to me... smh


Personally, I find a significant difference between google and bing when searching for technical information. Google's results are much more relevant in that specific area. I think this is possibly the reason google retains such a large advantage in market share since many people rely on the advice of their more technically inclined friends and those folks probably find google superior. For current events and such I don't see a great difference between the two engines.

Matt Wallaert

(Note: I work at Bing, so take all this from that viewpoint.)

A couple of notes are important before I talk about the claims made. There are two separate claims that have been used with the Bing It On challenge. The first is "People chose Bing web search results over Google nearly 2:1 in blind comparison tests". We blogged about the method and it was used back when we kicked off the Bing It On campaign in September 2012. In 2013, we updated the claim to "People prefer Bing over Google for the web's top searches". Now, on to your issues and my explanations.

First, you're “annoyed” by the sample size, saying that 1,000 people is too few to obtain a representative sample on which to base a claim. Interestingly, you then link to a paper you put together with some students, in which you also use a sample size of 1,000 people. You then subdivide the sample into thirds with different conditions and still manage to meet conventional statistical tests using this sample.

So I'm confused: if you got significance, is it hard to understand that we might? A sample of 1,000 people doing the same task has more statistical power than a sample of 300 people doing the same task. Which is why statistics are so important; they help us understand whether the data we see is an aberration or a representation. A 1,000 person, truly representative sample is actually fairly large. As a comparison, the Gallup poll on presidential approval is around 1,500 people.

Next, you are bothered that we don't release the data from the Bing It On site on how many times people choose Bing over Google. The answer here is pretty simple: we don't release it because we don't track it. Microsoft takes a pretty strong stance on privacy, and unlike in an experiment, where people give informed consent to having their results tracked and used, people who come to the Bing It On site are not agreeing to participate in research; they're coming for a fun challenge. It isn't conducted in a controlled environment, people are free to try and game it one way or another, and it has Bing branding all over it.

So we simply don't track their results, because the tracking itself would be incredibly unethical. And we aren't basing the claim on the results of a wildly uncontrolled website, because that would also be incredibly unethical (and entirely unscientific). And on a personal side note: I'm assuming that the people in your study were fully debriefed and their participation solicited under informed consent, as they were in our commissioned research.

Your final issue is the fact that the Bing It On site suggests queries you can use to take the challenge. You contend that these queries inappropriately bias visitors towards queries that are likely to result in Bing favorability.

First, I think it is important to note: I have no idea if you're right. Because as noted in the previous answer, we don't track the results from the Bing It On challenge. So I have no idea how using suggested queries versus self-generated queries affects the outcome, despite your suggestion that we knowingly manipulated the query suggestions, which seems to be pure supposition.

Here is what I can tell you. We have the suggested queries because a blank search box, when you're not actually trying to use it to find something, can be quite hard to fill. If you've ever watched anyone do the Bing It On challenge at a Seahawks game, there is a noted pause as people try to figure out what to search for. So we give them suggestions, which we source from topics that are trending now, on the assumption that trending topics are things that people are likely to have heard of and be able to evaluate results about.

Which means that if you're right and those topics are in fact biasing the results, it may be because we provide better results for current news topics than Google does. This is supported somewhat by the study done to arrive at the second claim: "the web's top queries" were pulled from Google's 2012 Zeitgeist report, which reflects a great deal of timely news that occurred throughout that year.

To make it clear, in the actual controlled studies used to determine what claims we made, we used two different approaches to suggesting queries. For the first claim (nearly 2:1), participants self-generated their own queries with no suggestions from us. In the second claim (web's top queries), we suggested five queries of which they could select one. These five queries were randomly drawn from a list of roughly 500 from the Google 2012 Zeitgeist, and participants could easily get additional sets of five if they didn’t want to use the queries in the first set.

There is just one more clarifying point worth making: you noted that only 18% of the world’s searches go through Bing. This is actually untrue; because Bing powers search for Facebook, Siri, Yahoo, and other partners, almost 30% of the world’s searches go through Bing. And that number is higher now than it was a year ago. So despite your assertions, I'm happy to stand by Bing It On, both the site and the sentiment.

(For those who have them, I'm always open to questions about the studies we conducted - feel free to shoot me a note or leave a comment. This invitation is also open to Ian and his students.)




Your comment shows that you either did not thoroughly read this article, or did not understand it.

"First, you’re “annoyed” by the sample size, saying that 1,000 people is too few to obtain a representative sample on which to base a claim. Interestingly, you then link to a paper you put together with some students, in which you also use a sample size of 1,000 people. "

Ian clearly outlines that his team specifically took the SAME sample size that bing used, to see if he would get the same results. If you are going to try to validate any claim, you must use the same sample size. Ian did not say his test was more credible, only that he was looking for the same sample size to see if there was any statistical difference (which, as it turns out, there was).

The point Ian's team was proving here was NOT that the 1,000-person sample size was too small. They were pointing out that, using the SAME sample size, they got statistically different results. By Bing's claims, Ian's test SHOULD have seen 500 people choose Bing, 250 choose Google, and 250 call it a tie (or some numbers close to that). What the team found was almost the complete opposite, which shows that Bing obviously did something fishy with their test to get a "2:1" ratio. This is why his team then interjected random search query suggestions into the test - to see what kind of suggestions Bing must have given to get their 2:1 number. Your complaint here shows you missed the whole point of the study.

As far as your claim that Bing doesn't track the results of this test, I have no proof either way, but I HIGHLY, highly doubt that Bing would put up a face-off website without SOME kind of page tracking. There is no privacy being invaded by simply reporting the percentages of wins. The only way this would invade privacy would be if they reported who voted for which answer. This seems like an attempt to stir up the pot about the recent privacy issues. I almost rolled my eyes as I read this part of your comment because it was so silly.

I will say, I have hated how Bing frames this entire study. Go do a search on Google, then compare it with the results the bingiton challenge shows for Google. Bing cuts out nearly half of the information Google gives you (like image results, shopping results, and the knowledge graph), while at the same time displaying that same kind of visual information on its Bing side. This is not a true A vs. B test. This is the equivalent of testing a fully upgraded Porsche against a Corvette that has had everything taken out except the motor. It's not a true test, which led to this counter-study.

Thanks for the lengthy response, but this was a Bing defense piece from you, not a true examination of whether or not your company gave a fair test.


Matt Wallaert

Actually, you can be sure I read both the post and the attached paper closely. Note that what I take issue with is not his using 1,000 people, but then giving three different treatments within that sample, reducing the effective size for each treatment to 333. Thus, he didn't do what you suggested: make an apples-to-apples comparison to see if he got the same results. Hence my rebuttal.

As for claiming that we do track: we don't. Don't know what to tell you. It would be illegal for me to say that we didn't and then to actually do it. You could sue to find that out in court, I suppose, but you can bet the lawyers wouldn't have let me say it if it wasn't true.

Also, on a personal note: you've just called into doubt my personal credibility. You have, in fact, called me a liar. I don't know about you, but I tend to take these sorts of things fairly seriously. I'm an easy guy to find online; feel free to send me a non-anonymous note and we can discuss, otherwise I'll be forced to conclude that you're kind of a jerk.

Lastly, why did we strip out the sidebar? Because both sides have things the other doesn't. We've got the social sidebar, snapshot, all sorts of different features, and ditto for Google. Part of good science is sometimes limiting the conditions so that you can study a fairly specific thing: in this case, purely web results.

I appreciate that you feel as though you know my motives. Hopefully, you can one day come up to me at a conference and ask about them in person.


Adriel Michaud