# Algorithm Needed; \$25,000 Reward

It’s not quite the Netflix Prize — which paid \$1 million to whoever could improve that company’s Cinematch recommendation algorithm by 10 percent — but there’s a new competition designed to predict magazine sales at newsstands. It’s being run by Hearst and the Direct Marketing Association. Rules are here; the top prize is \$25,000.

#### Ian Kemmish

\$25,000 sounds pretty paltry. I might consider it in return for the combined salaries of all the experienced old guys with centuries of collective experience whom I'd be putting out of work.

#### Max

Yeah, especially since Hearst can learn from all the applicants. A clever framework for solving a problem can be more valuable than a winning solution, and Hearst gets to keep all the applicants' submissions.

#### Nate C.

If you're not allowed to include any external data sources, aren't many contestants going to end up with the same answer?

Additionally, they're holding data back? That's like saying "tell me the season batting averages of these players" while only providing data from games against the Pittsburgh Pirates, then asking about games against the Yankees and wondering why the model wasn't accurate.

#### Ryan

Nate, the question is whether they can predict future sales. In order to do this, they have to test the models against data. They can't simply test against the same data that was used to build the model, because that would be akin to already knowing the future results, making the model itself unnecessary.

Also, presumably, the data provided will be a random sample of the full data. Your example assumes that they are going to hold back specific types of transactions whereas that would make little sense from a modeling perspective.

It is common practice to build models from a sample and test against another random sample. This is how you validate your model.
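That holdout practice can be sketched in a few lines of Python (the function and the stand-in sales data here are purely illustrative, not part of the contest):

```python
import random

def holdout_split(records, test_fraction=0.2, seed=42):
    """Randomly partition records into a training sample and a held-out test sample."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Stand-in for newsstand sales records.
sales = list(range(100))

# Build the model on `train`, then score it only on `held_out`,
# which the model never saw -- mirroring what the contest judges do.
train, held_out = holdout_split(sales)
print(len(train), len(held_out))  # 80 20
```

Because the split is random rather than by transaction type, the test sample should look statistically like the training sample, which is exactly Ryan's point.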

#### Matt

It took a 19 hour plane trip to India to convince me to buy a physical magazine (mainly because I cannot read or play on my Droid during take-off or landing). I'd suggest starting the algorithm with:

```c
if (owns(smartPhone)) {
    return false;
}
```

#### David

Nate (#3), it's about using only the sources you're given, so that the solution can be replicated by the companies themselves. Additionally, data is withheld because the withheld data is used to check the algorithm's predictive accuracy. The same was done with the Netflix Prize.

#### MarkD

\$25,000? Really?

Netflix paid a ton of cash because the algorithm was *worth* that amount to them (not to mention the publicity value). I'm pretty sure Hearst and the DMA could scrape together double this from their couch cushions and lunch martini budget alone.

I hope they get a grand total of one applicant from a struggling grad student who scribbled some idea on a napkin and sent it in.

#### infopractical

The pie may be so small because this particular pie making business is a relic of an institution.

#### Meg

Online contests are very popular in the scientific data-mining community, because they provide the most objective and fair comparison of different algorithms. Many such competitions are hosted on TunedIT.org - http://tunedit.org/challenge - for instance, a recent contest funded by TomTom was designed to predict traffic jams for GPS navigation.