<div dir="ltr"><div>First, I really enjoyed last night. I learned a lot of really cool things. If you think what you have to say is of no interest, think again :)</div><div><br></div>Now, here is a more sophisticated method for determining the similarity between any two given documents. In this script, I am comparing a sample of eBay item titles. It is taken directly out of Section 5.7 of Practical Text Mining With Perl. I just cleaned it up and modified it for my purposes.<div><br></div><div>The result is a square matrix (MxM, given M documents) that relates every "document" to every other. Each entry is a measure of similarity ranging from 1 (exact match) to 0.</div><div><br></div><div><a href="https://github.com/estrabd/lightning-talks/tree/master/houston-pm-13-nov-2014-text-mining">https://github.com/estrabd/lightning-talks/tree/master/houston-pm-13-nov-2014-text-mining</a><br></div><div><br></div><div>I forgot to mention last night that the method uses what is called a "bag of words" model, meaning that word order doesn't matter. Word order may be taken into account by using "n-grams" (strings of ordered words), and I imagine the same method may apply. It just greatly increases the number of entries in each document vector.</div><div><br></div><div>There's a lot to this book, so maybe I'll have something interesting the next time we do another round of these talks.</div><div><br></div><div>Brett</div><div><br></div></div>
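<div>For anyone who wants to see the idea in code, here is a minimal sketch of the bag-of-words similarity matrix described above. It is written in Python rather than Perl and is not the script from the repository. It assumes cosine similarity over raw word counts, which may differ from the weighting used in the book, and the function names and sample titles are made up for illustration.</div>

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count the words in a text; word order is discarded (assumed tokenizer: whitespace split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (assumed similarity measure)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_matrix(documents):
    """Build the MxM matrix relating every document to every other."""
    bags = [bag_of_words(d) for d in documents]
    return [[cosine_similarity(x, y) for y in bags] for x in bags]


# Hypothetical item titles, invented for this example.
titles = [
    "vintage canon film camera",
    "canon film camera vintage",
    "leather office chair",
]

matrix = similarity_matrix(titles)
for row in matrix:
    print(" ".join(f"{value:.2f}" for value in row))
```

<div>Because word order is ignored, the first two titles score as identical (a similarity of 1, up to floating-point rounding), while the third shares no words with them and scores 0. Switching to n-grams would only mean changing <code>bag_of_words</code> to count ordered word sequences instead of single words.</div>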