Google’s New Correlation Mining Tool: It Works!

You may have heard of Google Trends. It’s a cool tool which will show you the ups-and-downs of the public’s interest in a particular topic—at least as revealed in how often we search for it. And you may have even heard of the first really important use of this tool: Google Flu Trends, which uses search data to try to predict flu activity. Now Google has released an amazing way to reverse engineer the process: Google Correlate. Just feed in your favorite weekly time series (or cross-state comparisons), and it will tell you which search terms are most closely correlated with your data.

So I tried it out.  And it works! Amazingly well.

I fed in the weekly numbers on initial unemployment claims—one of the most important weekly economic time series we have.  The search term that is most closely correlated? Crikey, it’s “filing for unemployment.”  Indeed, the correlation is an astounding 0.91.
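For the curious, here is a minimal sketch of what "most closely correlated" means in practice. It is not Google's implementation: it computes a plain Pearson correlation between one uploaded weekly series and a handful of candidate search-term series, then ranks them. The function names (`pearson`, `rank_terms`) and all of the numbers are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series; treat as 0
    return cov / sqrt(var_x * var_y)


def rank_terms(target, candidates):
    """Return (term, correlation) pairs, most strongly correlated first."""
    scored = [(term, pearson(target, series)) for term, series in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up illustrative numbers, NOT real claims or search data.
weekly_claims = [410, 425, 440, 430, 415, 400]
candidate_terms = {
    "hypothetical term A": [0.80, 0.84, 0.90, 0.86, 0.81, 0.78],
    "hypothetical term B": [0.50, 0.40, 0.60, 0.30, 0.70, 0.20],
}

for term, r in rank_terms(weekly_claims, candidate_terms):
    print(f"{term}: r = {r:.2f}")
```

A real tool would have to search across millions of terms rather than two, so treat this only as a picture of the statistic being reported.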

 

Given the latest Google Trends on “filing for unemployment,” I’ll forecast that initial unemployment claims will tick down in the next couple of weeks.

With an eye to earning a quick trading fortune, I also uploaded data on weekly returns on the S&P 500. But Google failed to find anything significantly correlated. Score one for the random walk hypothesis.

Interested in more? Here’s Google Correlate; here’s a comic introduction; and here’s their white paper. Also, here’s Hyunyoung Choi and Hal Varian’s research on “Predicting the Present with Google Trends,” which shows that retail, auto, home and travel sales are also now-castable with search data (blog summary here); they’ve also previously shown the value of Google Trends in predicting initial unemployment claims. And here’s Albert Saiz and Uri Simonsohn on “Downloading Wisdom from Online Crowds.”

(Hat tip: Bo Cowgill)

COMMENTS: 13

  1. Jareau says:

    Very cool!

  2. scott cunningham says:

    I wish it could handle a panel. I think it’s either cross section or time series though.

  3. DC says:

    “facebook” and “tapeworm in humans” are 0.8721 correlated.

  4. Shane says:

    I’m a huge fan of the Trends services, and Correlate looks like lots of fun too. Though I’d like to see them show results for the actual number of searches instead of just the proportion of all searches. As it is, it’s hard to spot longer-term trends.

    For example the search term “terrorism” seems to have declined strongly since 2004:
    http://www.google.com/insights/search/#q=terrorism&cmpt=q

    However we don’t know if this is because fewer people are searching for terrorism, or if more people are searching for other terms. The arrival of new demographic groups (older people, say, or poorer people) to the internet could distort relative results.
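The share-versus-count point can be sketched with a little arithmetic. The counts below are entirely hypothetical (Google does not publish absolute volumes), and `search_share` is just an illustrative helper: the number of searches for a term rises by half, yet its share of all searches falls because total search volume grows faster.

```python
def search_share(term_searches, total_searches):
    """Fraction of all searches that were for the given term."""
    if total_searches <= 0:
        raise ValueError("total_searches must be positive")
    return term_searches / total_searches


# Hypothetical counts chosen only to illustrate the effect.
volumes = {
    2004: {"term": 1_000_000, "total": 100_000_000},
    2010: {"term": 1_500_000, "total": 400_000_000},
}

for year, v in volumes.items():
    share = search_share(v["term"], v["total"])
    print(f"{year}: {v['term']:,} searches for the term, share = {share:.4%}")
```

A falling line on a share-based chart is therefore consistent with either less interest in the term or faster growth everywhere else.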

    Anyway, Trends is still fascinating. I used it twice to predict the results of elections on which I placed small bets – successfully! But it needs to be used very carefully. Fox News ran a story last year arguing that Pakistani people are unusually likely to search for pornographic terms. Really, Trends does not give us enough information to make that claim. I’ve explored the subject here, should anyone be interested:
    http://shaneleavy.blogspot.com/2010/08/just-how-kinky-is-pakistan.html

    • Matt says:

      There would be similar problems with using absolute numbers as well, though. The first one that springs to mind is what happens if Google’s market share changes drastically.

      • Shane says:

        Absolutely, Matt, that makes sense. I presume, though, that with both kinds of information the user would be better informed than with just the one.

  5. Robbie says:

    Have you checked out what is correlated with Superfreakonomics? What do Windows 7 Clean Install and Jeff Dunham show have to do with Superfreakonomics?

    http://correlate.googlelabs.com/search?e=Superfreakonomics&t=weekly#

  6. Kevin says:

    Very nice. Can this be used to handicap the Republican presidential primary candidates in Vegas?

  7. Jason says:

    As far as predicting the stock market, this paper might be interesting to some:

    http://arxiv.org/PS_cache/arxiv/pdf/1010/1010.3003v1.pdf

    They adapted another tool that Google created, but doesn’t make available to the public, which is called GPOMS: the Google-Profile of Mood States. Here is the last sentence of the abstract:

    “We find an accuracy of 87.6% in predicting the daily up and down changes in the closing values of the DJIA and a reduction of the Mean Average Percentage Error by more than 6%.”

    The reduction of the error refers to the inclusion of the GPOMS after some standard variables in a forecast.

    The most important predictor turned out to be a calm mood, as measured by the GPOMS.

    Now if only GPOMS were publicly available, one could do some interesting tests with Google Trends.

    The paper relies on regression, but also on something called a self-organizing fuzzy neural network (SOFNN). It takes some reading to get a handle on, but my understanding is that a SOFNN is a merging of artificial intelligence and fuzzy logic. Fuzzy logic is derived from the fuzzy set, which is a set whose elements can be partly inside the set. I guess it’s math’s concept of “sort of.” Apparently, the SOFNN is helpful when inputs are linguistic.
