SPUG: removing common words

Andrew Sweger andy at n2h2.com
Wed May 3 22:05:16 CDT 2000


On May 3, 2000 @ 7:55pm, Christopher Cavnor wrote:

> Does anyone know of a module that can extract common words (aka "stop
> words") from a text file or scalar? Specifically, I want to parse
> something like:
> 
> "The foo that foo's it's foo is likely to foo time and time again"  
> to something like this -> "foo foo's foo likely foo time time again"
> 
> I searched CPAN, and was amazed not to find such a simple module. Yes, I
> can write it myself, but it might take more time than I want to invest
> to come up with a good range of stop words.

Take a look at the Lingua:: family of modules on CPAN. You may need to
adapt one of them to suit your needs. I'm not really sure I understand
the purpose of what you're trying to do. Any chance you're trying to
compress text for a pager?
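If nothing suitable turns up, the core of the job is small: split the
text into words, drop any word found in a stop list, and join the rest
back together. Below is a minimal sketch, assuming whitespace-separated
words and case-insensitive matching. The stop list and the name
strip_stop_words are hypothetical, chosen only so the list reproduces
your example; a real list would be much longer. Punctuation attached to
a word (e.g. "the,") is not stripped, so such words would slip through.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, deliberately tiny stop list for illustration only;
# a real application would load a much larger list from a file.
my %STOP = map { $_ => 1 }
    qw(a an and are as at be by for in is it it's of on or that the this to was);

# Remove stop words from a string. Words are split on whitespace and
# compared in lowercase; surviving words keep their original case and order.
sub strip_stop_words {
    my ($text) = @_;
    my @kept = grep { !$STOP{ lc $_ } } split ' ', $text;
    return join ' ', @kept;
}

my $in = "The foo that foo's it's foo is likely to foo time and time again";
print strip_stop_words($in), "\n";
```

Run on your sample sentence, this prints "foo foo's foo likely foo time
time again". Using a hash for the stop list keeps each lookup constant
time, so swapping in a list of several hundred words costs nothing extra
per word.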

-- 
 Andrew Sweger <andy at n2h2.com>   |  N2H2, Incorporated
 Systems Architect               |  900 Fourth Avenue, Suite 3400
 Advanced Technologies Division  |  Seattle WA 98164-1059
 v=206.336.2947  f=206.336.1541  |  http://www.n2h2.com/


 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
 Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/
 SUBSCRIBE/UNSUBSCRIBE: Replace "action" below by subscribe or unsubscribe
           Email to majordomo at pm.org: "action" spug-list your_address