SPUG: removing common words
Andrew Sweger
andy at n2h2.com
Wed May 3 22:05:16 CDT 2000
On May 3, 2000 @ 7:55pm, Christopher Cavnor wrote:
> Does anyone know of a module that can extract common words (aka "stop
> words") from a text file or scalar? Specifically, I want to parse
> something like:
>
> "The foo that foo's it's foo is likely to foo time and time again"
> to something like this -> "foo foo's foo likely foo time time again"
>
> I searched CPAN, and was amazed not to find such a simple mod. Yes, I
> can write it myself - but it might take me more time than I want to
> invest to work out a good range of stop words.
Take a look at the Lingua:: family. You may need to adapt something to
suit your needs. I'm not really sure I understand the purpose of what
you're trying to do. Any chance you're trying to compress text for a
pager?
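
For what it's worth, the filtering step itself is small; the real work is
the word list. Below is a minimal sketch, written in Python purely for
illustration. The function name remove_stop_words and the six-word
STOP_WORDS set are assumptions made up for this example, not taken from
any Lingua:: module, and a usable list would be far longer.

```python
# Hypothetical, deliberately tiny stop-word list chosen to cover the
# sample sentence from the question; a real list would be much longer.
STOP_WORDS = {"the", "that", "it's", "is", "to", "and"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return text with every whitespace-separated word found in
    stop_words removed. Matching is case-insensitive."""
    kept = [word for word in text.split() if word.lower() not in stop_words]
    return " ".join(kept)


sentence = "The foo that foo's it's foo is likely to foo time and time again"
print(remove_stop_words(sentence))
# prints: foo foo's foo likely foo time time again
```

This splits on whitespace only, so a stop word with punctuation attached
("the," or "and.") would slip through unless the punctuation is stripped
before the lookup.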
--
Andrew Sweger <andy at n2h2.com> | N2H2, Incorporated
Systems Architect | 900 Fourth Avenue, Suite 3400
Advanced Technologies Division | Seattle WA 98164-1059
v=206.336.2947 f=206.336.1541 | http://www.n2h2.com/