[Kc] meeting Tuesday Oct 10 2006
Matthew Wilson
matthew at veradox.com
Sat Oct 7 10:28:54 PDT 2006
> I too would like to have done a more "people with similar tastes"
> style ratings projection, but the data set of ~100 million ratings
> from 480,000 unique customer IDs... doesn't lend itself very well to
> that approach.
>
> That and I somehow doubt that the people at Netflix are incompetent.
> I.e., they say that there are a lot of approaches that they aren't
> using. But that doesn't mean that they haven't explored and rejected
> those routes.
>
> Myself, I've dumped the training set data into a database and am
> trying to figure out the right questions to ask...
>
> I do think someone will get the $50,000 progress prize. But I also
> doubt that it'll be me. I expect a team strong in mathematics and
> statistics will find a way to finesse small improvements on Cinematch.
> Enough to justify the progress bounty.
>
> I wish there were more documentation on the approach Cinematch takes
> and the alternative approaches. The progress bounty might be had by
> anyone who stumbles upon a better balance of approaches, but I doubt
> anyone will manage the million dollar prize without a novel approach.
>
> cheers,
>
> Garrett
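For anyone taking the same database route, here is a minimal sketch of two starter questions, using Python's built-in sqlite3. The `ratings(customer_id, movie_id, rating)` table, the toy rows, and the helper names `co_rater_count` and `ratings_per_movie` are all hypothetical, not the layout of the actual Netflix files. The self-join counts customers who rated both of two movies, which is the raw material for any pairwise comparison of movies.

```python
import sqlite3

# Hypothetical schema: one row per (customer, movie) rating, 1-5 stars.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings ("
    " customer_id INTEGER, movie_id INTEGER, rating INTEGER,"
    " PRIMARY KEY (customer_id, movie_id))"
)
# Toy rows standing in for the real ~100 million ratings.
rows = [
    (1, 10, 5), (1, 20, 4), (1, 30, 1),
    (2, 10, 4), (2, 20, 5),
    (3, 10, 1), (3, 30, 5),
]
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)
# Index on movie_id so each movie's raters can be found without a full scan.
conn.execute("CREATE INDEX idx_movie ON ratings (movie_id)")


def co_rater_count(conn, movie_a, movie_b):
    """Number of customers who rated both movie_a and movie_b."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM ratings a JOIN ratings b"
        " ON a.customer_id = b.customer_id"
        " WHERE a.movie_id = ? AND b.movie_id = ?",
        (movie_a, movie_b),
    ).fetchone()
    return count


def ratings_per_movie(conn):
    """(movie_id, number of ratings, mean rating) for every movie."""
    return conn.execute(
        "SELECT movie_id, COUNT(*), AVG(rating) FROM ratings"
        " GROUP BY movie_id ORDER BY movie_id"
    ).fetchall()
```

On the toy rows, movies 10 and 20 share two raters (customers 1 and 2), while movies 20 and 30 share only one.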
On Cinematch's approach:
*Technology Review:* Before building a better recommendation system, it
would be useful to understand your current approach. How does Cinematch
work?
*Jim Bennett:* First, you collect 100 million user ratings for about
18,000 movies. Take any two movies and find the people who have rated
both of them. Then look to see if the people who rate one of the movies
highly rate the other one highly, if they liked one and not the other,
or if they didn't like either movie. Based on their ratings, Cinematch
sees whether there's a correlation between those people. Now, do this
for all possible pairs of 65,000 movies.
(Source: http://www.technologyreview.com/read_article.aspx?id=17587&ch=biztech)
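Bennett's description amounts to item-to-item correlation: for each pair of movies, compare the ratings of the people who rated both. The interview does not say which correlation measure Cinematch uses, so the sketch below assumes plain Pearson correlation. The names `pearson` and `item_similarities`, the `min_common` cutoff, and the toy data are hypothetical illustrations, not Netflix's code.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length rating lists; 0.0 if undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # someone gave every co-rated movie the same score
    return cov / sqrt(vx * vy)


def item_similarities(ratings, min_common=2):
    """ratings: dict movie_id -> {customer_id: rating}.

    Returns {(movie_a, movie_b): correlation}, computed only over customers
    who rated both movies; pairs with fewer than min_common co-raters
    (a hypothetical cutoff) are skipped.
    """
    sims = {}
    for a, b in combinations(sorted(ratings), 2):
        common = sorted(ratings[a].keys() & ratings[b].keys())
        if len(common) < min_common:
            continue
        xs = [ratings[a][c] for c in common]
        ys = [ratings[b][c] for c in common]
        sims[(a, b)] = pearson(xs, ys)
    return sims


# Toy data: customers 1-3 rate A and B alike and C the opposite way;
# D has a single rater, so every pair involving D is skipped.
ratings = {
    "A": {1: 5, 2: 4, 3: 1},
    "B": {1: 5, 2: 4, 3: 2},
    "C": {1: 1, 2: 2, 3: 5},
    "D": {1: 3},
}
sims = item_similarities(ratings)
```

Here A and B come out strongly positive and A and C at -1. The pairwise loop is the expensive part. With n movies there are n(n-1)/2 pairs, roughly 162 million for 18,000 movies, so a real run needs the co-rater lists precomputed rather than rebuilt per pair.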
I have a bundle of about 30 PDFs that thoroughly describe all publicly
known alternative approaches. It took me about 10 hours with Google
Scholar to collect them. Ask me off-list if you'd like a .zip of
them.