# Everything Is Correlated

How do people who love salty snacks like their toilet paper to hang? Are fans of carbonated beverages more likely to enjoy horror movies? A new website, www.correlated.org, has the answers to such pressing questions. Founded by Shaun Gallagher, the brains behind last year’s UnofficialCensus.org, it aims to uncover one surprising correlation a day. The site asks visitors a new question every day and “[a]t the end of the day, the results of the survey are compared with the results of all previous surveys, and the two outcomes with the strongest link are highlighted.” Please remember: correlation does not equal causation. Still, we predict this site will launch a thousand graduate theses.

#### Mark

One word. Response bias.

#### Zachary

Correlation: People who say 'one word' just can't help themselves (see: Mark)

#### Ian

Wow, talk about the speed of light. This blurb is already being used as a plug on the site!

#### Jennie

That is a very small sample size to be using the phrase "in general..."

#### Alan Zarky

These are not correlations. These are associations. Correlation involves variables with multiple values across multiple data points, not nominal variables such as these. If you measured how much people like carbonated beverages and how much they like horror movies, then if the more they liked one, the more (or less) they liked the other, you'd have correlation. But with yes/no or similar nominal variables, my understanding is that the term "correlation" does not apply.
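For yes/no answers like these, a common summary of association is the phi coefficient of the 2×2 table. It is numerically the same as Pearson's r computed on the answers coded as 0 and 1, which may be why the two terms get used interchangeably. The sketch below illustrates that equivalence in Python; the counts and the variable names (`soda`, `horror`) are invented for illustration and are not data from the site.

```python
import math

def phi_from_table(a, b, c, d):
    """Phi coefficient for a 2x2 table.

    a = (yes, yes), b = (yes, no), c = (no, yes), d = (no, no).
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

def pearson_r(xs, ys):
    """Plain Pearson correlation for two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical counts (assumption, not real survey results):
# 30 like both, 10 like soda only, 10 like horror only, 30 like neither.
a, b, c, d = 30, 10, 10, 30

# The same respondents, expanded into 0/1 answer lists.
soda = [1] * (a + b) + [0] * (c + d)
horror = [1] * a + [0] * b + [1] * c + [0] * d

phi = phi_from_table(a, b, c, d)
r = pearson_r(soda, horror)
print(phi, r)
```

With these made-up counts both values come out at 0.5, up to floating-point rounding.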

#### Shaun G

Hi Alan,

I'm the creator of Correlated.

Strictly speaking, you're right. Stats textbooks define correlation the way you describe.

Here's a great primer on the definition of correlation by CoolData's Kevin MacDonell:

http://cooldata.wordpress.com/2011/03/14/correlation-and-you

I reached out to MacDonell and asked him to give me his impression of Correlated, in light of the textbook definition:

http://coding.pressbin.com/90/CoolDatas-MacDonell-gives-Correlated-the-OK

(By the way, you'll notice that at the bottom of the post, I mentioned I'd try to get the Freakonomics guys on board, too. And lo and behold ...)

#### Alan Zarky

Sorry, I didn't mean to sound too pedantic (but just pedantic enough). Just as long as we're having fun with non-rigorous data, I just wanted to have fun by being overly serious. And I agree that "associated.org" just doesn't have the same ring to it.

#### David

I had this idea in 2006. I called it "The Correlation Project" and wrote about it at http://www.ironicsans.com/2006/05/idea_the_correlation_project.html

It was mentioned also last year in this Slate article: http://www.slate.com/id/2277517/

Now, I'm not implying a correlation here, but I do wonder if there's a causation.

So, obviously, I love this idea.

#### Nosybear

I just wish they would publish a confidence interval, quite apart from the fact that the respondents self-select, which introduces selection bias in the responses. Also, the sample size doesn't suggest a whole lot of statistical power. In short, I'm only slightly less skeptical of these results than of, say, political polling or a Faux News Instapoll....
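As a rough idea of what such an interval could look like, here is a minimal Python sketch of the Wilson score interval for a single yes/no proportion. The function name `wilson_interval` and the respondent counts are assumptions made for illustration, not figures from the site. An interval like this only reflects sampling noise; it does nothing about self-selection.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Hypothetical polls (assumption): 60% "yes" with 50 vs. 1000 respondents.
small = wilson_interval(30, 50)
large = wilson_interval(600, 1000)
print(small)
print(large)
```

The same observed share gives a much wider interval with 50 respondents than with 1000, which is the small-sample concern in numbers.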

#### j.quasimodo

Two issues: (1) The sample is self-selected, so the correct statement might be, "Among people who volunteer such answers and understand HTML, the preferred toilet paper orientation is..." (2) Of course the sample size is much too small; that problem may correct itself if the site goes viral.

#### Eric M. Jones

I like the fact that high cancer rates correlate with high milk-drinking rates. (Because people who drink milk live longer lives and cancer is much more common in old people.)

But I think you would be doing the world a great service to craft some catchy phrase that would explain this logical golden rule to the "information challenged electorate". "Correlation is not causation" is a great phrase for the college educated, but howsabout something for the Bush-Palin voters?

#### David

Went to correlated.org yesterday and today. The latest polls have hundreds more people answering than the previous ones. Freakonomics might have just tripled their traffic.

#### Greg

Obviously everyone has picked up on response bias and sample sizes, but I think the biggest issue is that with so many polls to possibly inter-correlate (or associate if you will), we should expect some strong-looking correlations to show up randomly.

Today's xkcd illustrates the problem perfectly: http://xkcd.com/882/
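The effect is easy to reproduce with a small simulation. The Python sketch below generates polls whose answers are independent coin flips, so no real link exists, and then reports the strongest pairwise association it finds anyway. It assumes, as a simplification, that the same respondents answer every poll; the function names, the poll count, and the respondent count are all made up for illustration.

```python
import itertools
import math
import random

def phi(xs, ys):
    """Phi coefficient for two equal-length lists of 0/1 answers."""
    a = sum(1 for x, y in zip(xs, ys) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(xs, ys) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(xs, ys) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(xs, ys) if x == 0 and y == 0)
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    # A poll where everyone gave the same answer carries no association.
    return 0.0 if denom == 0 else (a * d - b * c) / denom

def strongest_spurious_link(n_polls=50, n_respondents=30, seed=0):
    """Return the pair of random polls with the largest |phi| and its value."""
    rng = random.Random(seed)
    # Every answer is an independent fair coin flip: no true relationships.
    polls = [[rng.randint(0, 1) for _ in range(n_respondents)]
             for _ in range(n_polls)]
    pairs = itertools.combinations(range(n_polls), 2)
    best = max(pairs, key=lambda p: abs(phi(polls[p[0]], polls[p[1]])))
    return best, phi(polls[best[0]], polls[best[1]])

pair, value = strongest_spurious_link()
print(f"polls {pair}: phi = {value:.2f}")
```

With 50 polls there are 1,225 pairs to compare, so picking the single strongest link almost guarantees something that looks impressive even though every answer is noise.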