Challenging the Bing-It-On Challenge

Did you find this blog post through Bing? Probably not: 67% of worldwide searches go through Google, 18% through Bing. But Microsoft has advertised in a substantial TV campaign that, in the cyber analog to blind taste-testing, people prefer Bing “nearly 2:1.” A year ago, when I first saw these ads, the 2:1 claim seemed implausible. I would have thought the search results of these competitors would be largely identical, and that it would be hard for people to distinguish between the two sets of results, much less prefer one kind 2:1.

When I looked into the claim a bit more, I was slightly annoyed to learn that the “nearly 2:1” claim is based on a study of just 1,000 participants.  To be sure, I’ve often published studies with similarly small data sets, but it’s a little cheeky for Microsoft to base what might be a multi-million dollar advertising campaign on what I’m guessing is a low-six-figure study. 
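For a sense of scale, here is a rough sketch of how much sampling noise a 1,000-person study carries. It uses the textbook normal approximation for a proportion and assumes a simple random sample; the function name `margin_of_error` and the other sample sizes are illustrative choices, not anything taken from Microsoft's study.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion.

    Assumes a simple random sample of size n; p=0.5 is the worst case.
    """
    return z * math.sqrt(p * (1 - p) / n)

# 1,000 is the study size reported above; 250 and 4,000 are hypothetical comparisons
for n in (250, 1000, 4000):
    print(f"n={n}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

Under these assumptions a sample of 1,000 pins a proportion down to roughly plus or minus 3 percentage points, and quadrupling the sample only halves that margin. Sample size alone says nothing about whether the participants were representative.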

To make matters worse, Microsoft has refused to release the results of its comparison website, BingItOn.com. More than 5 million people have taken the Bing-It-On challenge, the cyber analog to a blind taste test. You enter a search term and the Bing-It-On site returns two panels with de-identified Bing and Google results (randomly placed on the right or left side of the screen). You tell the site which side’s results you prefer, and after five searches the site reveals whether you prefer Bing or Google (see below).

Microsoft’s ads encourage users to join the millions of people who have taken the challenge, but the company will not reveal whether the results of the millions are consistent with the results of the 1,000.

[Image: bing]

So together with four Yale Law students, I set up a similar-sized experiment using Microsoft’s own BingItOn.com site to see which search engine users prefer. We found that, contrary to Microsoft’s claim, 53 percent of subjects preferred Google and 41 percent Bing (6 percent of results were “ties”). This is not even close to the advertised claim that people prefer Bing “nearly two-to-one.” It is misleading to have advertisements that say people prefer Bing 2:1 and also say join the millions of people who’ve taken the Bing-It-On challenge, if, as in our study, the millions of people haven’t preferred Bing at nearly a 2:1 rate. Microsoft might have realized this, and has more recently altered its advertising to back off its original claim and say only that people “prefer” Bing.
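To see how far a 41/53/6 split sits from a 2:1 preference, here is a minimal sketch of a one-sample z-test. It rests on several assumptions that are not in the text above: exactly 1,000 subjects, percentages converted to whole counts, ties dropped, and “nearly 2:1” read as a Bing share of two-thirds among subjects who expressed a preference. The function name is hypothetical and this is not the analysis from the paper.

```python
import math

def z_test_against_share(successes, trials, claimed_share):
    """One-sample z-test: is the observed share consistent with claimed_share?"""
    observed = successes / trials
    # Standard error of the share if the claimed value were true
    se = math.sqrt(claimed_share * (1 - claimed_share) / trials)
    z = (observed - claimed_share) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return observed, z, p_value

# Hypothetical counts: 1,000 subjects split 41% / 53% / 6% as reported above
bing, google, ties = 410, 530, 60
observed, z, p = z_test_against_share(bing, bing + google, claimed_share=2 / 3)
print(f"Bing share among non-ties: {observed:.3f}, z = {z:.1f}, p = {p:.2g}")
```

With these illustrative counts the observed Bing share is many standard errors below two-thirds, so the gap is far too large to blame on sampling noise in a study of this size.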

We also interjected a bit of randomness into our study to test whether the type of search term impacts the likelihood that Bing is preferred.  We randomly assigned participants to search for one of three kinds of keywords: Bing’s suggested search terms, popular search terms, and self-suggested search terms.  When Bing-suggested search terms were used the two engines statistically tied (47% preferring Bing vs. 48% preferring Google).  But when the subjects in the study suggested their own searches or used the web’s most popular searches, a sizable gap appeared: 55-57% preferred Google while only 35-39% preferred Bing.  These secondary tests indicate that Microsoft selected suggested search words that it knew were more likely to produce Bing-preferring results. You can read our full paper here.
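The “statistical tie” versus “sizable gap” distinction can be checked with a rough sign-test sketch. It assumes about 333 subjects per arm (1,000 split three ways), which is a guess rather than the paper's actual group sizes. Because the text gives only ranges for the other two arms, the sketch tries the narrowest and widest gaps those ranges allow instead of guessing which figures pair up.

```python
import math

def preference_z(pct_bing, pct_google, n_subjects):
    """z-score for 'Bing and Google are equally preferred' among non-tie subjects."""
    bing = round(pct_bing / 100 * n_subjects)
    google = round(pct_google / 100 * n_subjects)
    # Normal approximation to a sign test: under the null each non-tie is a coin flip
    return (bing - google) / math.sqrt(bing + google)

N_PER_ARM = 333  # assumption: roughly 1,000 subjects split evenly across three arms

arms = {
    "Bing-suggested terms": (47, 48),
    "other terms, narrowest reported gap": (39, 55),
    "other terms, widest reported gap": (35, 57),
}
for label, (pct_bing, pct_google) in arms.items():
    z = preference_z(pct_bing, pct_google, N_PER_ARM)
    verdict = "distinguishable" if abs(z) > 1.96 else "statistical tie"
    print(f"{label}: z = {z:+.2f} ({verdict})")
```

Under these assumptions the Bing-suggested arm is well inside the noise, while even the narrowest gap in the other arms clears the conventional 5% threshold.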

The upshot: several of Microsoft’s claims are a little fishy. Or to put the conclusion more formally, we think that Google has a colorable deceptive advertising claim against Microsoft. It could be worth a lot of money in lost ad revenue if the claims misled people into thinking that a substantial majority of people prefer Bing over Google. Then again, I might be a little overzealous in seeing Lanham Act violations. Back in 2010, I suggested that the movie Date Night might contain a deceptive Kindle ad.


COMMENTS: 68

  1. Christopher Landry says:

    This use of fuzzy data to make ad claims is so prevalent in business these days. Disgusting to me… smh

  2. RogC says:

    Personally, I find a significant difference between google and bing when searching for technical information. Google’s results are much more relevant in that specific area. I think this is possibly the reason google retains such a large advantage in market share since many people rely on the advice of their more technically inclined friends and those folks probably find google superior. For current events and such I don’t see a great difference between the two engines.

  3. Matt Wallaert says:

    (Note: I work at Bing, so take all this from that viewpoint.)

    A couple of notes are important before I talk about the claims made. There are two separate claims that have been used with the Bing It On challenge. The first is “People chose Bing web search results over Google nearly 2:1 in blind comparison tests”. We blogged about the method and it was used back when we kicked off the Bing It On campaign in September 2012. In 2013, we updated the claim to “People prefer Bing over Google for the web’s top searches”. Now, on to your issues and my explanations.

    First, you’re “annoyed” by the sample size, saying that 1,000 people is too few to obtain a representative sample on which to base a claim. Interestingly, you then link to a paper you put together with some students, in which you also use a sample size of 1,000 people. You then subdivide the sample into thirds with different conditions and still manage to meet conventional statistical tests using this sample.

    So I’m confused: if you got significance, is it hard to understand that we might? A sample of 1,000 people doing the same task has more statistical power than a sample of 300 people doing the same task. Which is why statistics are so important; they help us understand whether the data we see is an aberration or a representation. A 1,000 person, truly representative sample is actually fairly large. As a comparison, the Gallup poll on presidential approval is around 1,500 people.

    Next, you are bothered that we don’t release the data from the Bing It On site on how many times people choose Bing over Google. The answer here is pretty simple: we don’t release it because we don’t track it. Microsoft takes a pretty strong stance on privacy and unlike in an experiment, where people give informed consent to having their results tracked and used, people who come to BingItOn.com are not agreeing to participate in research; they’re coming for a fun challenge. It isn’t conducted in a controlled environment, people are free to try and game it one way or another, and it has Bing branding all over it.

    So we simply don’t track their results, because the tracking itself would be incredibly unethical. And we aren’t basing the claim on the results of a wildly uncontrolled website, because that would also be incredibly unethical (and entirely unscientific). And on a personal side note: I’m assuming that the people in your study were fully debriefed and their participation solicited under informed consent, as they were in our commissioned research.

    Your final issue is the fact that the Bing It On site suggests queries you can use to take the challenge. You contend that these queries inappropriately bias visitors towards queries that are likely to result in Bing favorability.

    First, I think it is important to note: I have no idea if you’re right. Because as noted in the previous answer, we don’t track the results from the Bing It On challenge. So I have no idea how using suggested queries versus self-generated queries affects the outcome, despite your suggestion that we knowingly manipulated the query suggestions, which seems to be pure supposition.

    Here is what I can tell you. We have the suggested queries because a blank search box, when you’re not actually trying to use it to find something, can be quite hard to fill. If you’ve ever watched anyone do the Bing It On challenge at a Seahawks game, there is a noted pause as people try to figure out what to search for. So we give them suggestions, which we source from topics that are trending now, on the assumption that trending topics are things that people are likely to have heard of and be able to evaluate results about.

    Which means that if you’re right and those topics are in fact biasing the results, it may be because we provide better results for current news topics than Google does. This is supported somewhat by the study done to arrive at the second claim; “the web’s top queries” were pulled from Google’s 2012 Zeitgeist report, which reflects a great deal of timely news that occurred throughout that year.

    To make it clear, in the actual controlled studies used to determine what claims we made, we used two different approaches to suggesting queries. For the first claim (nearly 2:1), participants self-generated their own queries with no suggestions from us. In the second claim (web’s top queries), we suggested five queries of which they could select one. These five queries were randomly drawn from a list of roughly 500 from the Google 2012 Zeitgeist, and participants could easily get additional sets of five if they didn’t want to use the queries in the first set.

    There is just one more clarifying point worth making: you noted that only 18% of the world’s searches go through Bing. This is actually untrue; because Bing powers search for Facebook, Siri, Yahoo, and other partners, almost 30% of the world’s searches go through Bing. And that number is higher now than it was a year ago. So despite your assertions, I’m happy to stand by Bing It On, both the site and the sentiment.

    (For those who have them, I’m always open to questions about the studies we conducted – feel free to shoot me a note or leave a comment. This invitation is also open to Ian and his students.)

    • Chris says:

      Matt,

      Your comment shows that you either did not thoroughly read this article, or did not understand it.

      “First, you’re “annoyed” by the sample size, saying that 1,000 people is too few to obtain a representative sample on which to base a claim. Interestingly, you then link to a paper you put together with some students, in which you also use a sample size of 1,000 people. ”

      Ian clearly outlines that his team specifically took the SAME sample size that bing used, to see if he would get the same results. If you are going to try to validate any claim, you must use the same sample size. Ian did not say his test was more credible, only that he was looking for the same sample size to see if there was any statistical difference (which, as it turns out, there was).

      The point Ian’s team was proving here was NOT that the 1,000 person sample size was too small. They were pointing out that using the SAME sample size, they got statistically different results. By Bing’s claims, Ian’s test SHOULD have seen 500 people choose Bing, 250 choose Google, and 250 call it a tie (or some numbers close to that). What the team found was almost the complete opposite, which shows that Bing obviously did something fishy with their test to get a “2:1” ratio. This is why his team then interjected random search query suggestions into the test: to see what kind of suggestions bing must have given to get their 2:1 number. Your complaint here shows you missed the whole point of the study.

      As far as your claim that Bing doesn’t track the results of this test, I have no proof either way, but I HIGHLY, highly doubt that Bing would put up a face-off website without SOME kind of page tracking. There is no privacy being invaded by simply reporting the percentages of wins. The only way this would invade privacy would be if they reported who voted for which answer. This seems like an attempt to stir up the pot about the recent privacy issues. I almost rolled my eyes as I read this part of your comment because it was so silly.

      I will say, I have hated how Bing frames this entire study. Go do a search on Google, then compare it with the results the bingiton challenge shows for Google. Bing cuts out nearly half of the information Google gives you (like image results, shopping results, and the knowledge graph), while at the same time displaying that same kind of visual information on its bing side. This is not a true A vs B test. This is the equivalent of testing a fully upgraded Porsche against a corvette that has had everything taken out except the motor. It’s not a true test, which led to this counter-study.

      Thanks for the lengthy response, but this was a Bing defense piece from you, not a true examination of whether or not your company gave a fair test.

      • Matt Wallaert says:

        Actually, you can be sure I read both the post and attached paper closely. Note that what I take issue with in my comment is not his using 1,000 people, but his then giving three different treatments within that sample, reducing the effective size for each treatment to 333. Thus, he didn’t do what you suggested: try to make an apples-to-apples comparison to see if he got the same results. Hence my rebuttal.

        As for claiming that we do track: we don’t. Don’t know what to tell you. It would be illegal for me to say that we didn’t and then to actually do it. You could sue to find that out in court, I suppose, but you can bet the lawyers wouldn’t have let me say it if it wasn’t true.

        Also, on a personal note: you’ve just called into doubt my personal credibility. You have, in fact, called me a liar. I don’t know about you, but I tend to take these sorts of things fairly seriously. I’m an easy guy to find online; feel free to send me a non-anonymous note and we can discuss, otherwise I’ll be forced to conclude that you’re kind of a jerk.

        Lastly, why did we strip out sidebar? Because both sides have things the other doesn’t. We’ve got social sidebar, snapshot, all sorts of different features, and ditto for Google. Part of good science is sometimes limiting the conditions so that you can study a fairly specific thing: in this case, purely web results.

        I appreciate that you feel as though you know my motives. Hopefully, you can one day come up to me at a conference and ask about them in person.

      • David says:

        I’m assuming that you’re tracking initial page views in order to know how many people have taken the test (you only specify in your post that you don’t ‘track results’ rather than guaranteeing that there’s no analytics being used whatsoever!) Surely collecting anonymous data about what percentage of people voted for which search engine would not be a greater invasion of privacy?

        I appreciate your point about data collected in an uncontrolled environment being less scientific and if a bunch of Microsoft haters wanted to skew the results then that would not be Bing’s fault. I can also imagine a marketing department being cautious of tracking the results because if it did show that Google was significantly better then it would leave them with egg on their face.

        Internally tracking which search terms Google outperforms Bing on could really help Microsoft improve Bing. Whilst I am not accusing you of lying, I am surprised that Microsoft decided to not do any tracking of results and am left wondering why Microsoft didn’t use such an opportunity to improve their product.

  4. Adriel Michaud says:

    Ouch.

  5. Karl says:

    This assumes that Bing’s and Google’s search quality has remained relatively constant since the campaign launched. Competition between Bing and Google has pushed both to feverishly focus on improving their search quality.

    Of course, this is not always a great thing. Since the campaign launched, Google’s algorithms seem to have been tweaked to be more like Bing — generating better results for common queries at the expense of quality in the “long tail” of uncommon queries.

  6. Clive Portman says:

    Do you think to some extent the reason people like Google’s responses to their self-suggested search terms more is because they’ve been trained to use Google with their everyday use of it?

    I’ve been trying Bing recently, but haven’t been finding the results as good as Google. Yet, I’ve been using Google for years and know exactly how to phrase my Google queries to get the information I’m after.

    That learned behaviour is going to be a problem for Bing isn’t it? On the surface Bing isn’t producing the best results for me, but then I’m searching with Bing using skills appropriate to Google. Why should I have to re-learn the way I search?

    • Paul T. Lambert says:

      Relearning how to do things is necessary in our fast-changing tech world; otherwise you’ll be stuck in the past and merely getting by instead of maximizing what you can do. I always experiment to see if I can do things better and faster with different services.

      Unfortunately, websearch is turning into an adversarial process, especially with so many advertisers doing SEO and companies using various secret preferences to include and rank results, perhaps with some “input” from the NSA and other government or private enterprises. There’s no such thing as an objective websearch anymore; you have to study results and infer whatever heuristics and random behaviors are in place on each particular day. I suppose any search engine will quickly tell you what Justin Bieber had for lunch or where Honey Boo-Boo just took a dump, but I usually have to search for very specific and atypical items of information. I guess I am not the typical user targeted by Bingoogle, so none of this really matters all that much in the grand scheme of things.

  7. Beslin says:

    I need to summarize my thought with a small example: I was working on SEO for a website that was in the first position on the first page of Bing for about a year, but I received hardly 5 inquiries. Once my website started being listed on Google’s first page (not in first position), within a month I got more than 50 inquiries.
    After that I guessed Google is getting around 90% of search queries and others are getting 10% (might be wrong, but that is what I believe).

  8. SEO says:

    as an SEO i’d have to say i think google’s search results are much better than bing’s or ddg’s 90% of the time. i hate the way google has lumped everything into (not provided) now though, they’re a pretty deplorable bunch.
