[Kc] meeting Tuesday Oct 10 2006

Matthew Wilson matthew at veradox.com
Sat Oct 7 10:28:54 PDT 2006


> I too would like to have done a more "people with similar tastes" 
> style ratings projection, but the data set of ~100 million ratings 
> across 480,000 unique customer IDs doesn't lend itself very well to 
> that approach.
>
> That and I somehow doubt that the people at Netflix are incompetent. 
> I.e., they say that there are a lot of approaches that they aren't 
> using. But that doesn't mean that they haven't explored and rejected 
> those routes.
>
> Myself, I've dumped the training set data into a database and am 
> trying to figure out the right questions to ask...
>
> I do think someone will get the $50,000 progress prize. But I also 
> doubt that it'll be me. I expect a team strong in mathematics and 
> statistics will find a way to finesse small improvements on Cinematch. 
> Enough to justify the progress bounty.
>
> I wish there was more documentation on the approach Cinematch takes 
> and the alternative approaches. The progress bounty might be had by 
> anyone who stumbles upon a better balance of approaches, but I doubt 
> anyone will manage the million dollar prize without a novel approach.
>
> cheers,
>
> Garrett

On Cinematch's approach:

*Technology Review:* Before building a better recommendation system, it 
would be useful to understand your current approach. How does Cinematch 
work?

*Jim Bennett:* First, you collect 100 million user ratings for about 
18,000 movies. Take any two movies and find the people who have rated 
both of them. Then look to see if the people who rate one of the movies 
highly rate the other one highly, if they liked one and not the other, 
or if they didn't like either movie. Based on their ratings, Cinematch 
sees whether there's a correlation between those people. Now, do this 
for all possible pairs of 65,000 movies.

from http://www.technologyreview.com/read_article.aspx?id=17587&ch=biztech
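
For anyone who wants to poke at the pairwise step Bennett describes, here 
is a minimal Python sketch. It is not Cinematch's code (that isn't 
public); it assumes "correlation" means a plain Pearson correlation over 
the customers who rated both movies. The toy `ratings` data, the function 
names, and the `min_common` cutoff are all made up for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: ratings[movie][customer_id] = stars (1 to 5).
ratings = {
    "A": {1: 5, 2: 4, 3: 1, 4: 2},
    "B": {1: 4, 2: 5, 3: 2, 4: 1},
    "C": {1: 1, 2: 2, 3: 5, 5: 3},
}

def movie_correlation(a, b, min_common=2):
    """Pearson correlation of movies a and b over customers who rated both.

    Returns None when there are too few common raters, or when one of the
    movies got the same rating from all of them (correlation undefined).
    """
    common = ratings[a].keys() & ratings[b].keys()
    n = len(common)
    if n < min_common:
        return None
    xs = [ratings[a][u] for u in common]
    ys = [ratings[b][u] for u in common]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)

def all_pair_correlations():
    """Correlation for every unordered pair of movies in the toy data."""
    return {
        (a, b): movie_correlation(a, b)
        for a, b in combinations(sorted(ratings), 2)
    }

if __name__ == "__main__":
    for pair, r in all_pair_correlations().items():
        print(pair, r)
```

The number of unordered pairs grows as n(n-1)/2, so about 18,000 titles 
already means roughly 162 million pairs. A brute-force loop like this one 
is only good for experimenting on a small slice of the data.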

I have a bundle of 30ish PDFs that thoroughly describe all publicly 
known alternative approaches.  It took me about 10 hours with Google 
Scholar to collect them all.  Ask me off-list if you'd like a .zip of 
them.