Is the "Bing It On" Challenge Lightweight?

Microsoft has now responded, with a blog post and a letter, to my post about an experimental study that I coauthored with Yale Law School students Emad Atiq, Sheng Li, Michelle Lu, Christine Tsang, and Tom Maher.  Our paper calls into question the validity of claims that people prefer Bing nearly two to one.

In response to several commenters: I do not work for and do not have any consulting relationship with Google.

Microsoft claims that our study is flawed because it relied on their own blind-comparison website.  They now say that “Bing It On” is meant to be a “lightweight way to challenge people’s assumptions about which search engine actually provides the best results.”  To be sure, companies often use fantastical or humorous scenarios in their advertising.  But Microsoft’s television commercials present the site as a credible way for people to learn whether they prefer Google or Bing, showing challenge-takers who discover that they really prefer Bing.  Microsoft can’t have it both ways: the challenge site it created either provides insight into consumer preferences or it doesn’t.  If it is a sufficient tool to “challenge people’s assumptions,” then it is sufficient to provide some evidence about whether the assumed preference for Google is accurate. 

What’s more, the ads have conveyed the clear impression that a substantial majority of challenge takers prefer Bing.  The “nearly 2-to-1” claim, combined with videotaped examples in their TV commercials where virtually 100 percent of the challenge-takers learn that they prefer Bing, suggests that the experience of Bing It On users confirms the results of Microsoft’s 1,000-person study. 

After spending years and (presumably) millions of dollars trying to convince consumers that the Bing It On challenge de-biases the “Google Habit,” it seems incongruous for Microsoft to turn around and claim the whole exercise was “lightweight” and did not warrant tracking and analysis.

This brings us to the second point, which concerns Microsoft’s (literally) bold declaration that “we don’t track the results from the Bing It On challenge” because doing so would be an unethical invasion of privacy.

This dog won’t hunt.  

First off, there must be some tracking for Microsoft to count and then advertise that 5 million people have taken the challenge (the number is over 25 million as of May 2013). Second, Microsoft still has not explained how it came up with its list of suggested search items.  Our study suggests that the list had been systematically chosen to favor terms that are more likely to produce a Bing preference.  How exactly did Microsoft learn that these terms were Bing-friendly if it hadn’t been tracking?  We’re still waiting for Microsoft to explain this anomaly.

More important, tracking search results is an essential part of Bing’s business model. All search engine companies operate by analyzing search data to improve user experiences. The Bing It On website is slightly different from a search engine in that it asks users which set of search results they prefer.  However, it is unclear why anonymous and aggregated side-by-side search preferences trigger greater privacy concerns than information (also aggregated and anonymous) on what terms users search for, which results they click on, and the myriad pieces of user data that feed Bing’s algorithm. 
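To make the privacy comparison concrete, here is a minimal sketch (hypothetical, not Microsoft's actual code) of how side-by-side preferences could be tallied anonymously. It stores nothing but aggregate counts, the same kind of signal search engines already derive from queries and clicks:

```python
from collections import Counter

# Hypothetical aggregate tally: no user identity, no queries, no timestamps --
# just how often each engine's (unlabeled) result set was preferred.
preference_counts = Counter()

def record_choice(chosen_engine):
    """Increment an anonymous, aggregated counter for one challenge round."""
    preference_counts[chosen_engine] += 1

# Simulated challenge outcomes from a handful of anonymous takers.
for choice in ["google", "google", "bing", "google", "draw"]:
    record_choice(choice)

print(dict(preference_counts))  # e.g. {'google': 3, 'bing': 1, 'draw': 1}
```

A tally like this would be enough both to advertise "5 million people have taken the challenge" and to know which engine those people actually preferred, without raising any privacy concern beyond what ordinary search logging already entails.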

One way to think of our study is that we simply started to track the results of the challenge that Microsoft has chosen not to track.  Our subjects went to Microsoft’s own site and told us what happened.  Now Microsoft has good reason to believe that most people who take the challenge do not prefer Bing.  Given this knowledge, it would be misleading to place a banner asking consumers to “join the 5 million people who’ve visited the challenge” next to advertised claims of Bing’s superiority over Google in “blind comparison tests,” when the “challenge” and the “blind comparison tests” are in fact different creatures.  We take it as progress that Microsoft has become more circumspect in its claims.  The Bing It On website, for example, no longer includes the “5 million visitors” figure (as it previously did).

What we are concerned about is companies playing fast and loose with numbers to create misleading advertisements.  Advertisers are typically (and rightly) given a lot of flexibility to be lightweight with the truth, but when statistics and studies are introduced in a scientific manner to support a claim, that flexibility should end. Microsoft should ensure that the results of the millions of Bing It On challenge-takers are not inconsistent with its advertised claims.

Ben Silver

The biggest problem I had when I took the test (and still got Google every time) is that Microsoft strips everything that isn't a text result.
Sure, if I saw the actual window, I would know whether it was Google or Bing, but the link results aren't the only thing I get in a Google search.
They basically stripped away a bunch of the cool features that Google has (images, maps, wiki summaries, translations, etc.) and said that people preferred Bing.
I would liken it to taking the sugar and chocolate out of someone else's recipe for brownies and saying that people think yours tastes better.


Tastes differ, though. For me, the stripping out of those "cool features" - AKA "garbage" or "noise" - is one of the nicer things about using Bing.


They are not removing them from Bing though, which still has plenty of "garbage". They just remove it from the comparison.


If you watch the web traffic on the bingiton website, they are passing data back to their servers every time you make a selection. So they do know exactly how many times one option was chosen.

That data doesn't appear to include the search query, but it includes information about the event happening. Can't immediately determine whether it includes the result (aka whether the person chose Bing or Google), but it would be trivial for them to track which option was chosen anyway.


It's easy to game the BingItOn website because it's easy to tell which results are which. I just searched "Cisco Systems", and the results on one side had a subsection for "cisco systems near Redmond, WA". Did Bing's original "blind" study blind the participants to obvious clues like this? (I don't know.) On the other hand, Ayres's point that the comparison website is flawed is a good one.


Replying to my own comment... I assumed that the result for "near Redmond" was Bing. Turns out it was Google. Oops.


They are marketers. They lie for a living. Shocking!

Thom Denick

Marketers lying isn't shocking, but Microsoft constantly invokes the word "Science" in their blog posts. Specifically, "Good Science" (Microsoft) vs. "Bad Science" (Freakonomics). While they certainly have a point about the Mechanical Turk selection sample, I don't think they are in any better position, considering how little they've disclosed about their own research firm's selection sample. Both studies, I would argue, are "Bad Science," which is to say they prove nothing.

Microsoft also poisons this comparison by blowing out their image galleries for almost any search. This is obviously going to push respondents in the direction they want. Even with the push, they still appear to be losing...

...The fact that they're obviously collecting this data, and are hiding it seems to indicate that MS is well-aware of Bing's shortcomings (and their need to bribe users by paying them per search to use it.)



Search engine companies run tests where they have users compare results, like the Bing Challenge. These focus groups are expensive and you don't get a lot of data. However, if you can come up with a way to get thousands of people to compare your results to your competition's for free, you have achieved two things. First, you have a collection of queries where you are judged worse than your competition, so you can focus your efforts. Second, you got a few of them to switch.

Microsoft doesn't need to win the Bing Challenge today. Their only hope of eventually winning it depends on the data they collect and their ability to overcome the weaknesses identified by the free focus group. Having a misleading ad is also a nice bonus, but it doesn't pay off in the long term like the data does.

Thom Denick

That's a really good point. They can use the data (which they insist they are not collecting, pffft), to find out exactly what types of terms they are losing in. I've certainly seen their searches evolve significantly in the past two years.

I use them because they pay me to via their Bing Rewards program. I end up getting about $5/month in Xbox rewards with normal search use. Of course, when I need a power search, I use Google, but for normal web-navigation (when was the last time you typed in an entire URL?), I use Bing.


I'm not sure if anyone else has mentioned this, but if they have, it deserves to be a bigger issue. Hasn't anyone noticed that the two sets of search results have slightly different colour palettes?

I've run through the test several times now, picking solely based on which page appears the clearest and most vibrant, and that side is almost always Bing. Initially I did a quick test with a colour picker to establish that there was a difference, then ran through the test several times, always picking the page whose text felt brighter and more vivid. Only in a few instances did I get it wrong.

It's really tactics like that which I think are underhanded: an attempt to elicit a positive or negative reaction with a barely perceptible difference that affects the user's "attraction" to an option. If you can't tell the difference, just try running through the test a few times without reading the results; just pick the one that feels nicer. The proportion of images can sometimes throw it off, but most of the time you can tell which one you are supposed to like.
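The colour-picker check described above is easy to reproduce in code. This is a hypothetical sketch: the hex values are made-up stand-ins, not actual samples from the challenge site, and perceived brightness is approximated with the standard Rec. 709 luma weights:

```python
def relative_luminance(hex_color):
    """Approximate perceived brightness of an sRGB hex colour
    using Rec. 709 luma weights (0.2126 R + 0.7152 G + 0.0722 B)."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

# Made-up example values: two near-identical link colours sampled
# (hypothetically) from the left and right result panes.
left_text = "#1a0dab"
right_text = "#2200cc"

brighter = "left" if relative_luminance(left_text) > relative_luminance(right_text) else "right"
print(brighter)
```

Even a difference too small to name consciously shows up immediately in a comparison like this, which is why a "blind" test whose sides render text at measurably different brightness isn't fully blind.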

I'd love to hear back if anyone else noticed this and what they think.