Google’s New Correlation Mining Tool: It Works!

You may have heard of Google Trends. It’s a cool tool which will show you the ups-and-downs of the public’s interest in a particular topic—at least as revealed in how often we search for it. And you may have even heard of the first really important use of this tool: Google Flu Trends, which uses search data to try to predict flu activity. Now Google has released an amazing way to reverse engineer the process: Google Correlate. Just feed in your favorite weekly time series (or cross-state comparisons), and it will tell you which search terms are most closely correlated with your data.

So I tried it out.  And it works! Amazingly well.

I fed in the weekly numbers on initial unemployment claims—one of the most important weekly economic time series we have.  The search term that is most closely correlated? Crikey, it’s “filing for unemployment.”  Indeed, the correlation is an astounding 0.91.
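For intuition, here is a minimal sketch of the kind of ranking described above: score each candidate search series by its Pearson correlation with the uploaded series, then sort. This is not Google's actual implementation. The function names, query labels and numbers are all made up for illustration.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def rank_by_correlation(target, candidates):
    """Return (name, correlation) pairs, most correlated first."""
    scores = {name: pearson(target, series) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical weekly data: a target series and two candidate search-volume series.
target = [410, 420, 450, 480, 470, 440, 430, 415]
candidates = {
    "query_a": [40, 43, 47, 52, 50, 45, 44, 41],  # rises and falls with the target
    "query_b": [9, 3, 7, 1, 8, 2, 6, 4],          # unrelated pattern
}

ranking = rank_by_correlation(target, candidates)
for name, r in ranking:
    print(f"{name}: {r:.2f}")
```

A real system would have to search an enormous number of candidate queries, so a brute-force loop like this is only meant to show what "most closely correlated" means.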

 

Given the latest Google Trends on “filing for unemployment,” I’ll forecast that initial unemployment claims will tick down in the next couple of weeks.

With an eye to earning a quick trading fortune, I also uploaded data on weekly returns on the S&P 500. But Google failed to find anything significantly correlated. Score one for the random walk hypothesis.

Interested in more?  Here’s Google Correlate; here’s a comic introduction; and here’s their white paper. Also, here’s Hyunyoung Choi and Hal Varian’s research on “Predicting the Present with Google Trends,” which shows that retail, auto, home and travel sales are also now-castable with search data (blog summary here); they’ve also previously shown the value of Google Trends in predicting initial unemployment claims.  And here’s Albert Saiz and Uri Simonsohn on “Downloading Wisdom from Online Crowds.”

(Hat tip: Bo Cowgill)

COMMENTS: 13

  1. James says:

    As a converse point, and a great example of how correlation does not equal causation, there’s a .83 correlation between electricity prices in Ireland and searches for ‘stanford webmail’, and a .82 correlation between the prices and ‘copper theft’ (which might have some weak link due to the commodities boom and bust of 2008-9).

    More generally, it seems those correlations are calculated from absolute levels, not from returns (daily % changes). Correlating levels is known to give spuriously high correlations.
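The levels-versus-changes point can be illustrated with a small sketch. The two series below are constructed, not real data: both grow about 1% per week on average, but their week-to-week wiggles follow deliberately unrelated patterns. The names and numbers are assumptions chosen only to make the effect visible.

```python
import numpy as np

def pct_change(levels):
    """Period-over-period percentage change of a series of levels."""
    levels = np.asarray(levels, dtype=float)
    return levels[1:] / levels[:-1] - 1.0

def corr(x, y):
    """Pearson correlation via NumPy's correlation matrix."""
    return float(np.corrcoef(x, y)[0, 1])

weeks = 52
# Weekly growth rates: same 1% average, but the deviations from that
# average follow two patterns that are uncorrelated with each other.
r_x = 0.01 + 0.005 * np.tile([1, -1, 1, -1], weeks // 4)
r_y = 0.01 + 0.005 * np.tile([1, 1, -1, -1], weeks // 4)

# Build the levels by compounding the growth rates from a starting value.
x = 100.0 * np.concatenate(([1.0], np.cumprod(1.0 + r_x)))
y = 50.0 * np.concatenate(([1.0], np.cumprod(1.0 + r_y)))

level_corr = corr(x, y)                           # dominated by the shared trend
change_corr = corr(pct_change(x), pct_change(y))  # reflects the wiggles only

print(f"correlation of levels:    {level_corr:.3f}")
print(f"correlation of % changes: {change_corr:.3f}")
```

The levels are almost perfectly correlated because both series trend upward, while the percentage changes are essentially uncorrelated. A shared trend alone is enough to produce a high correlation in levels.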

  2. Anoush says:

    We are in a hugely fascinating moment with regard to real-time data indeed! I work at UN Global Pulse, an innovations initiative in the UN, which is looking precisely at this potential: are there signals in new data which can serve as early indicators of stress in a society/community?

    http://www.unglobalpulse.org/blog/digital-smoke-signals

    In a world of increasing global crises and shocks, we need better real-time data to understand when populations are vulnerable – and be able to respond with more agility. Information seeking/online search behavior is a very important type of new data that we are exploring at Global Pulse. We are delving into research projects/experiments to explore what indicators could be most telling. Our approach is collaborative and we welcome any comments/insights (Twitter: @UNGlobalPulse) as we ideate!

  3. Renato P. dos Santos says:

    I thought you would like to know that I cited this blog post in the Web Search Database I am building from Google Correlate: http://www.searchcorrelations.com/initial-unemployment-claims.html

    You may find it interesting how the most highly correlated search term has changed since then, and how your forecast “that initial unemployment claims will tick down” in the next couple of weeks after the end of May 2011 seems to be confirmed by an updated graph.
