Why the Netflix Prize Is a Good Start to Personalized Recommendations
In 2006, Netflix announced a $1 million prize for the first team to develop an algorithm capable of predicting user ratings of films at least 10 percent more accurately than Netflix’s internally developed Cinematch algorithm. Three years and 43,000 entries later, from more than 5,100 teams in 185 countries, a winner appears to have emerged. The race has come down to two teams, and the winner will be officially announced this month.
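Accuracy, for purposes of the contest, was measured by root mean squared error (RMSE) on a held-out set of ratings, so a “10 percent improvement” means a 10 percent reduction in RMSE relative to Cinematch on the same data. Here is a minimal sketch of the scoring arithmetic; the specific numbers are illustrative, not official results:

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Illustrative numbers: an entry beats Cinematch "by 10 percent" when its
# RMSE on the held-out ratings is at least 10 percent lower than Cinematch's.
cinematch_rmse = 0.9525   # roughly Cinematch's error on the contest's test set
entry_rmse = 0.8567       # a hypothetical entry's error on the same set
improvement = 100 * (cinematch_rmse - entry_rmse) / cinematch_rmse
print(f"improvement over Cinematch: {improvement:.2f}%")  # about 10.06%
```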
Staking their claims
The data set provided to entrants was simple: it consisted of 100 million movie ratings given by 480,000 users to 18,000 movies over a seven-year period. Some users rated hundreds or even thousands of films; others rated just a handful.
For the movies, Netflix provided only the title and year of release. There was no information about stars, type of film (sci-fi, romance, comedy, etc.), director or Motion Picture Association of America rating.
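Stripped of title and year, each training record reduces to a (user, movie, rating) triple, plus the date the rating was given. A sketch of how a contestant might load such triples into a sparse matrix follows; the library choice and toy values are mine, not part of the contest materials:

```python
from scipy.sparse import csr_matrix

# Each record is a (user_id, movie_id, rating) triple; the rating date,
# which was also provided, is omitted here for brevity.
triples = [
    (0, 0, 5.0),   # user 0 gave movie 0 five stars
    (0, 2, 3.0),
    (1, 1, 4.0),
    (2, 0, 1.0),
]

users, movies, ratings = zip(*triples)
# Sparse storage matters: 100 million ratings spread over a
# 480,000-by-18,000 grid fill barely 1 percent of the cells.
R = csr_matrix((ratings, (users, movies)))
print(R.toarray())
```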
Nonetheless, the large volume of data, and the fact that it came from a real live consumer-facing business, was incredibly exciting to computer science researchers. And the chance of winning $1 million probably didn’t hurt, either.
Thousands of teams entered to stake their claims to the prize. Some were teams of machine-learning Ph.D.s from prestigious institutions, while others were hobbyists working out of their basements. Press and blog coverage of the competition elevated some of the contestants into geek superstars.
The wrong direction
Much of the coverage, however, failed to grasp how constrained the competition was. Consider how it compares to commerce in the offline world. If you walk into your local video store and ask a clerk for a recommendation, the clerk would most likely ask you what kind of films you like. If you like romantic comedies, he might recommend Hugh Grant’s latest opus. If you like action, “Transformers” would be the recommendation for you.
Netflix, however, didn’t provide contestants with the kind of information customers normally give a video store clerk. Initially, contestants had to construct recommendations, or “mine” them, from the raw rating data alone. The teams employed a variety of mathematical techniques to identify sets of films that appeared to be similar.
If an algorithm could determine that films A, B, C, D and E are very similar, for example, then someone who rated A, C and D highly would probably also like B and E. Soon, entrants began pulling data from other sources, like the IMDb website, to help identify other attributes of the films that could prove useful.
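One concrete version of that idea is item-item collaborative filtering: measure how similar two movies are by the cosine of the angle between their rating vectors, then predict a user’s rating of an unseen movie as a similarity-weighted average of that user’s other ratings. The strongest teams blended many far more sophisticated models; the sketch below shows only the basic mechanism and is not any particular team’s method:

```python
import numpy as np

def item_similarities(R):
    """Cosine similarity between the columns (movies) of a dense
    user-by-movie rating matrix R, where 0 means 'unrated'."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0                    # avoid dividing by zero
    return (R.T @ R) / np.outer(norms, norms)

def predict(R, S, user, movie):
    """Similarity-weighted average of the user's existing ratings."""
    rated = np.nonzero(R[user])[0]             # movies this user has rated
    weights = S[movie, rated]
    if weights.sum() == 0:
        return R[R > 0].mean()                 # fall back to the global mean
    return float(weights @ R[user, rated] / weights.sum())

# Tiny example: rows are users, columns are movies A-E, 0 = unrated.
R = np.array([[5, 0, 4, 5, 0],
              [4, 2, 5, 4, 0],
              [1, 5, 0, 2, 4]], dtype=float)
S = item_similarities(R)
print(predict(R, S, user=0, movie=1))          # estimated rating for movie B
```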
On June 26, a team of researchers calling themselves BellKor’s Pragmatic Chaos made the first claim of winning the $1 million prize with a 10.05 percent improvement over Cinematch. This triggered a 30-day window in which other teams could take a final shot at the prize. On July 25, just as the window was about to close, a team called The Ensemble submitted an entry with a 10.09 percent improvement. The contest is now closed, and judges are doing their final evaluations of the two potential winners.
In search of relevant recommendations
The entrants, particularly the stronger teams, have made tremendous advances in their ability to mine very subtle relationships out of the data. However, their advances are still insufficient to deliver the holy grail: relevant, high-quality recommendations.
Most retailers have far more data about their products in their catalogs than what Netflix gave contestants. This data — the top sellers in each category, color and size of a shirt, or a customer’s historical purchase behavior — can be used to make even more accurate recommendations. Beyond catalog data, continuous feedback about how people react to recommended products (ignore, click or buy) can be invaluable in improving the quality of recommendations.
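To illustrate the feedback point, a recommender can fold each outcome back into an item’s score as it happens. Everything in this sketch, the event names and the weights alike, is hypothetical:

```python
# Hypothetical feedback weights: a purchase is strong evidence that a
# recommendation was relevant, a click is weak evidence, and an ignore
# counts mildly against it.
FEEDBACK_WEIGHTS = {"buy": 1.0, "click": 0.3, "ignore": -0.05}

def update_score(score, event, learning_rate=0.1):
    """Nudge an item's recommendation score toward the observed feedback."""
    return score + learning_rate * FEEDBACK_WEIGHTS[event]

score = 0.5
for event in ["ignore", "click", "click", "buy"]:
    score = update_score(score, event)
print(round(score, 3))   # 0.5 - 0.005 + 0.03 + 0.03 + 0.1 = 0.655
```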
Real-time context is also critical. If a user who normally prefers historical biopics is browsing the action and adventure category, that bit of context can make a tremendous difference in the quality of the recommendations at that moment.
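One simple way to use that context is to re-rank candidate recommendations so that items matching the category the user is browsing right now rise to the top. The categories, scores, and boost factor below are invented for the example:

```python
def rerank(candidates, browsing_category, boost=1.5):
    """Boost the base score of items in the currently browsed category,
    then return the candidates sorted best-first.

    candidates: list of (title, category, base_score) tuples.
    """
    def contextual_score(item):
        _title, category, base = item
        return base * boost if category == browsing_category else base
    return sorted(candidates, key=contextual_score, reverse=True)

candidates = [
    ("Lincoln biopic", "biography", 0.9),   # best long-term fit on ratings
    ("Heist thriller", "action", 0.7),      # weaker overall, but...
    ("Costume drama", "drama", 0.6),
]
# ...the user is browsing action and adventure at this moment.
print(rerank(candidates, browsing_category="action"))
```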
The Netflix Prize winners, whoever they may be, will be remembered as valuable contributors to the field of recommendations. But achieving what customers really want, the kind of relevant recommendations they’d get from a clerk or trusted friend, requires deeper analysis of the right contextual inputs and real-time feedback loops.
Darren Erik Vengroff is the chief scientist at richrelevance, a San Francisco-based provider of personalization and product recommendation tools for enterprise-class e-commerce sites. Reach Darren at firstname.lastname@example.org.