In this week's New York Times Magazine, Clive Thompson writes about people trying to win the Netflix Prize.
Early in 2007, Netflix, the video-rental-by-mail service, made (anonymized) data on its users preferences available, and is offering $1,000,000 to the first person or team that creates an algorithm for recommending movies that makes a certain substantial improvement upon the existing algorithms. More precisely, Netflix attempts to predict the star rating, on a one-to-five scale, that users will give to a movie; the root mean square error in their old algorithm was about 0.95 stars, and they'll pay the $1,000,000 for improvement of this by 10%. (They also pay $50,000 for improvements of 1%.)
Why are they doing this? Well, Netflix recommends movies to lots of people, and they want their recommendations to be good. And it sounds like they're getting a lot more than a million dollars worth of time from the various competitors working on this; if they actually had these people on their payroll it would cost more.
It's interesting that there are certain movies that are very difficult to predict; Napoleon Dynamite is one of the examples the article gives, which won't surprise anybody who's seen it; poking around the forums on the Netflixprize.com site I found a list of movies like that, although it's a year and a half old so there's been progress since then. (Hmm, I'm not sure if I like Napoleon Dynamite -- sometimes I do and sometimes I don't.)
Apparently one of the major mathematical tools that has emerged as being useful here is singular value decomposition. I'm not surprised that some Big Fancy Linear Algebra tool was necessary, as a lot of the algorithms essentially seem to work by identifying various dimensions along which movies can vary. (Look, it's a hidden metric space!) Hmm, I wonder if the space of movies has some nontrivial topology... no, I will not get sucked into thinking about this!