[LA.pm] HTML page word count/density module?

Peter Benjamin pete at peterbenjamin.com
Fri Feb 20 11:25:05 CST 2004


At 11:25 PM 2/19/2004, Kevin Scaldeferri wrote:
>>I was hoping to compare the output and methods used in
>>any sample code with mine, and make sure mine did not
>>miss any tricks.
>
>You might look at Nutch (www.nutch.org), which is an attempt to build an open-source web search engine.  I don't know if their indexing or scoring actually does anything like this, though.  Also, it's in Java.

I'm going to find out.  Even in Java, it will let me compare algorithms.
Know one language and you know them all, except that APL and Ada are so
different.  I wish they were the standard languages of choice.  The world
would have error-free computers that even talk.  IMHO.

>Other than that, I expect that anything that exists along these lines is likely to be proprietary.

That is what I have found over the last few months:
no open source projects even remotely
hinting at doing this, until you gave me one.
Maybe mine will be the first on CPAN, if I am
not too embarrassed by the code.

Thanks Kevin.



