Our Daily Bleg: What to Do About Too Much Data?

A reader named Evan Schumacher wrote in with an interesting bleg. (Read about blegs here and send your own here.)

Tucked inside his bleg is the part that tickled me the most: a website Evan created to tell him whether it’s worth it to watch a basketball game he’d recorded. Anyway, I’ll give my answer below, after his bleg.

I was wondering whether too much data is ever a bad thing. I ask because one of the rules of life I thought I’d learned is that it’s best to have as much data as possible.

Whether it is hard numbers or smart people around you, at least when you are starting out, you want as much information as you can get. The smart guys are the ones who know how to analyze it.

However, in my personal life I was having a problem with too much data. I watch all of the Warriors basketball games on DVR, and nothing is worse than watching for 1.5 hours only to see your team get blown out. But I never want to see the score before I watch, because that ruins the game. To solve the problem I created a little website (www.shouldiwatch.com) that warns me if a game is bad but won’t tell me anything about the outcome (who won or the score) if the game was relatively close. Trust me, as a Warriors fan this is a huge time saver. It’s a stupid little example, but it makes me wonder whether there are other cases in research when you need to turn away from some information.

Anyway, it’s a little bit backwards to think of, but I thought it might be interesting to explore.
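The filter Evan describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how www.shouldiwatch.com actually works: the bleg gives no cutoff for a "bad" game, so the 15-point `BLOWOUT_MARGIN`, the `GameResult` type, and the `should_i_watch` function are all hypothetical. The sketch also assumes the warning is based on the absolute margin, so that neither message reveals who won or what the score was.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the bleg never says what margin counts as a blowout.
BLOWOUT_MARGIN = 15


@dataclass(frozen=True)
class GameResult:
    """Final score of a recorded game (illustrative type, not from the site)."""
    team_points: int
    opponent_points: int


def should_i_watch(result: GameResult, blowout_margin: int = BLOWOUT_MARGIN) -> str:
    """Return a verdict that never mentions the winner or the score."""
    # Use the absolute margin so the verdict does not depend on who won.
    margin = abs(result.team_points - result.opponent_points)
    if margin >= blowout_margin:
        return "Skip it: this one was a blowout."
    return "Worth watching: this one was relatively close."


if __name__ == "__main__":
    print(should_i_watch(GameResult(team_points=98, opponent_points=125)))
    print(should_i_watch(GameResult(team_points=110, opponent_points=107)))
```

The design choice worth noting is that the filter deliberately throws information away: it collapses the full box score into a single yes-or-no answer, which is exactly the "turning away from some information" Evan is asking about.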

His question sounds as if it is directed more at quants than writers, but as a writer I’ll say that I face this dilemma daily. Right now, in the middle of our writing SuperFreakonomics, I’m facing a number of short sections that require a bunch of historical reading and research. But the key thing is that these sections remain short — they are not the donut in this case, but the donut hole, and if they start getting swollen they will turn the book into a flabby monster.

The problem is that the reading and research are so much fun that it is really hard to limit yourself. Especially in this age of Google (and Google Books) and Amazon and even Wikipedia (yes, I was an early detractor but have come around on certain subjects), I am constantly trying to take a little sip from a firehose, and it’s nearly impossible. Reading too much inevitably turns into wanting to write too much; in this case, shorter will be better, but it takes a lot of effort and a long time to get the right three paragraphs (as opposed to a much easier but, to my mind, less effective 12 paragraphs).

The problem is that the more I’ve read — and the more data I’ve consumed, to get back to Evan’s question — the better those three paragraphs will be in the end. It reminds me of making maple syrup, which we did every winter as kids. You’d run around collecting all this sap, gallons and gallons of it from the trees you’d tapped, and then stay up all night boiling it down on an open fire — all to produce one little jar of syrup.

Was it worth the effort? Some people would say yes, others no. But in any case, it sure tasted good.

COMMENTS: 23

  1. DK1 says:

    I think that website is a great idea, but I wonder if it would work better if fans (who already saw the game) could give their vote (either a thumbs up/down or an Amazon-esque star rating) which would capture more subjective criteria than just the final score.

    Your team may have lost by only 8 points, but maybe there were some meaningless points in garbage time that made the score look closer than it really was. Conversely, maybe the final score wasn’t compelling, but if a star player puts on a show (or if a player attacks someone in the stands) you still might want to watch.

  2. charles says:

    Mostly what you are collecting is noise. In Steve’s case his role is different. He’s actually the filter and I thank him for it. He is also working in a somewhat bounded domain. So, in that capacity, data is good, more is better and a little more is nice although probably not worth the effort. If you are investing, or otherwise an expert in an area where there are no experts, then the extra data will lead you down a fool’s path.

    Indeed the marginal utility of the next bit of data can be negative, leading to health problems (stress) and an increase in confidence (exposing you to large errors). This seems to be the domain of the original post…life in general.

    Like food, some is good, more is better, more than that is bad, more than that is worse, more than that and you’re dead.

    I’ve told a number of folks that I believe the next set of great advances should come in the form of data aggregation and filtering.

  3. Nuclear Mom says:

    This is a fascinating problem. In the past (20 years ago) I was not concerned about the government accumulating data on us. Who would have time to pore over all the useless conversations, the useless credit card charges, the useless trips, the endless printouts and factoids?

    But thanks to truly impressive algorithm development and processing speed, faces can be picked out of crowds, words (bomb, terror) can be picked out of conversations, allowing the data pile to be reduced to a more manageable and targeted level.

    Beyond the Big Brother implications, there are two competing goals when reducing or filtering information overload:

    1. Can you define your algorithm clearly to produce the desired output? As a non-sports fan, I was intrigued by the problem posed in this bleg. You want to know whether a game is worth watching, but you don’t want to know the winner or the score. What criteria are you using to decide if a game is worth watching?

    2. How much important information gets discarded by the above algorithm? Stephen addresses this problem — he is more informed and a better writer (and maybe person) for having consumed reams of information, even though it may not be reflected in his 3 distilled paragraphs. Would you inadvertently throw out a game that was a blowout loss but had some striking features — a record set by a member of the opposite team that it would have been neat to see?

    A fascinating balancing act.

  4. Marc Resnick says:

    There is another consequence of too much information. As long as information stays within manageable levels, it usually increases the quality of our decision making. But if it hits that threshold, we are more likely to ignore it and switch to a more intuitive and subjective decision making process.

    So you can decrease someone’s decision making quality by facing them with an intimidating mass of information.

  5. Jake says:

    I love the concept: Reviews for sporting events. I would love this. I usually just want to watch the games though. Example: I am a Vikings fan. If they lose terribly, I might still want to watch the game, so that I know where they need to improve. If they win big, I’d also want to watch it. If it’s close I definitely want to watch it. And with DVR, I can cut about 1.5 hours out of a 3 hour game. That’s good enough for me.

  6. Colin Gray says:

    Keeping it brief is not a new problem:

    “I am sorry to write such a long letter. I didn’t have time to write a short one.”

    Variously attributed to Mark Twain, Voltaire, Proust, Pliny the Younger, T.S. Eliot, Abraham Lincoln…

  7. -Roy Blount, Jr. expounds beautifully says:

    The issue is that you can ONLY get a jar of syrup from all those sap-collecting saps but, by creating a “short section” limit to your fascinating research, you constrain yourself as a writer. Let an editor decide or perhaps your book should be two volumes.

  8. misterb says:

    @charles(#2),

    I’m in the business you describe (finding faces in crowds, so to speak). On the one hand, we still can’t process all the information we gather; on the other hand, your personal privacy has never come under greater threat. Without question, a modern-day Stalin could be far more effective at wiping out dissent and independent thought. We can’t turn back Moore’s law, but we can demand that our privacy be protected.

    The basis of the marginal utility equation for information has to be related to the cost of supplying it. Just as a tip about yesterday’s stock market is useless, information has a time value. Just like electrical current, it has a cost of transmission. Clearly, if the cost of transmission exceeds the time value, the information should not only be ignored, but never transmitted. That’s why Evan Schumacher’s solution is so clever.
