Thursday, December 13, 2007

BellKor ensemble for recommendations

The BellKor team won the Progress Prize in the Netflix recommender contest. They have published a few papers ([1] [2] [3]) on the ensemble approach that won the prize.

The first of those papers is particularly interesting for the quick feel it gives of the do-whatever-it-takes approach they took. Their solution consisted of tuning and "blending 107 individual results ... [using] linear regression."
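To make that blending step concrete, here is a minimal sketch of what linear-regression blending can look like, assuming each individual predictor has already produced ratings for a held-out probe set. The function names and tiny data here are mine for illustration, not BellKor's actual pipeline:

```python
import numpy as np

def fit_blend(predictions, actual):
    # predictions: (n_ratings, n_predictors) matrix, one column per predictor
    # actual: (n_ratings,) vector of true ratings from the probe set
    # Solve for the weights that minimize the squared error of the blend.
    weights, _, _, _ = np.linalg.lstsq(predictions, actual, rcond=None)
    return weights

def blend(predictions, weights):
    # The final prediction is a weighted sum of the individual predictors.
    return predictions @ weights

# Example: blend three hypothetical predictors on a tiny probe set.
probe_preds = np.array([[3.1, 2.9, 3.4],
                        [4.2, 4.5, 3.9],
                        [1.8, 2.2, 2.0]])
probe_actual = np.array([3.0, 4.0, 2.0])
w = fit_blend(probe_preds, probe_actual)
```

The appeal is that the regression does the work of deciding how much to trust each predictor. The downside, as below, is that the learned weights are tied tightly to the particular training sample.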

This work is impressive and BellKor deserves kudos for winning the prize, but I have to say that I feel a little queasy reading this paper. It strikes me that this type of ensemble method is difficult to explain, makes it hard to understand why the predictions work, and likely will be subject to overfitting.

I suspect it will be difficult to know how to apply the results of this work to other recommendation problems. Worse, merely swapping in a different sample of the Netflix rating data for the training set might require redoing most of the tuning effort the team has put in so far. That seems unsatisfying to me.

It probably is unsatisfying to Netflix as well. Participants may be overfitting to the strict letter of this contest. Netflix may find that the winning algorithm actually is quite poor at the task at hand -- recommending movies to Netflix customers -- because it is over-optimized for this particular contest data and for the contest's particular success metric.
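For reference, that success metric is root mean squared error (RMSE) on a held-out test set. The metric itself is simple; a quick sketch:

```python
import numpy as np

def rmse(predicted, actual):
    # Root mean squared error, the Netflix Prize's evaluation metric.
    return np.sqrt(np.mean((np.asarray(predicted) - np.asarray(actual)) ** 2))
```

Small differences in this single number are what the contest rewards, whether or not they translate into noticeably better recommendations for customers.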

In any case, you have to admire the BellKor team's tenacity. The papers are worth a read, both for seeing what they did and for their thoughts on all the different techniques they tried.

1 comment:

Anonymous said...

With all due respect to the winners, your post raises a related question. The differences in RMSE are small among the top submissions. One wonders whether their rank order would hold up across repeated test sets from the same distribution.