
Additional information/help on the collaborative filtering problem



Several people had questions about the collaborative filtering formula
(largely because of the unfortunate fact that the Christmas-toy
example seems to wind up hitting every possible special extreme
case, because of its sparse data).


I have a few pointers and clarifications.


0. I would strongly suggest doing problems 3 and 4 before doing
problem 2.

1. (Detail) In the formula for computing ratings, the denominator is a
normalization factor that adds up all the weights. The idea is to
normalize so that the weights all add up to 1.  Because the weights can
be negative, this normalization factor should be written in terms of
the sum of the _absolute values_ of the weights (rather than the raw
signed values). The numerator still takes the raw signed values.  (To
see that this change makes sense, consider a scenario where you have
only two neighbors, both of whom are fully negatively correlated with
you (w = -1 for both), and both of whom rated a particular item at +5
(with a ratings mean of 0). You want your formula to say that your
rating should be -5. Dividing by the signed sum of the weights (-2)
would flip the sign and give +5 instead.)

(Unfortunately, the paper also uses the wrong formula.)
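
To make the two-neighbor scenario concrete, here is a minimal Python
sketch of the corrected formula next to the signed-sum version. The
function names, the tuple layout for neighbors, the fallback when all
weights are zero, and the choice of 0 for your own mean rating are
assumptions made for illustration; they are not taken from the problem
set or the paper.

```python
def predict_rating(own_mean, neighbors):
    """Predict a rating from neighbors given as (weight, rating, mean) tuples.

    Numerator: raw signed weights times each neighbor's deviation from
    their own mean. Denominator: sum of the absolute values of the weights.
    """
    numerator = sum(w * (rating - mean) for w, rating, mean in neighbors)
    denominator = sum(abs(w) for w, _, _ in neighbors)
    if denominator == 0:
        # Assumption: with no usable neighbors, fall back to your own mean.
        return own_mean
    return own_mean + numerator / denominator


def predict_rating_signed(own_mean, neighbors):
    """Same formula, but normalized by the signed sum (shown for contrast)."""
    numerator = sum(w * (rating - mean) for w, rating, mean in neighbors)
    denominator = sum(w for w, _, _ in neighbors)
    if denominator == 0:
        return own_mean
    return own_mean + numerator / denominator


# Two neighbors, both with w = -1, both rating the item +5, both with mean 0.
# Your own mean is assumed to be 0 here.
neighbors = [(-1.0, 5.0, 0.0), (-1.0, 5.0, 0.0)]
print(predict_rating(0.0, neighbors))         # -5.0
print(predict_rating_signed(0.0, neighbors))  # 5.0
```

With absolute values in the denominator the prediction is -5, as
desired; with the signed sum the two negative signs cancel and the
prediction comes out as +5.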


2. I linked in a much better paper that discusses collaborative filtering in more detail. Check it out:

http://citeseer.ist.psu.edu/breese98empirical.html


regards


Rao