SPUG: removing common words
Dean Hudson
dean at ero.com
Thu May 4 00:00:38 CDT 2000
On Wed, 3 May 2000, Christopher Cavnor wrote:
> Does anyone know of a module that can extract common words (aka "stop
> words") from a text file or scalar? Specifically, I want to parse
> something like:
>
> "The foo that foo's it's foo is likely to foo time and time again"
> to something like this -> "foo foo's foo likely foo time time again"
Here are a couple of lists I found by searching Google for "stop words"
and "stop word lists":
http://www.library.csustan.edu/catalog/doc/oclc5.htm
http://www.access.gpo.gov/su_docs/dpos/stopword.html
http://www.cqs.washington.edu/crisp/lit/stop.html
The lists seem surprisingly short, so you could probably whip something up
that has basic functionality pretty quickly...
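
Something along these lines might do as a starting point. It is only a
rough sketch: the %stop hash holds a tiny made-up list (a real one would
be read in from one of the lists above), strip_stop_words is just a name
picked for illustration, and it assumes words are separated by whitespace
with no punctuation stuck to them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, deliberately tiny stop list for illustration only;
# a real list would be loaded from a file.
my %stop = map { $_ => 1 } qw(the that it's is to and a an of in);

# Return $text with stop words removed. Matching is case-insensitive,
# and the surviving words are rejoined with single spaces.
sub strip_stop_words {
    my ($text) = @_;
    return join ' ', grep { !$stop{ lc $_ } } split ' ', $text;
}

print strip_stop_words(
    "The foo that foo's it's foo is likely to foo time and time again"
), "\n";
```

A hash lookup keeps each word test cheap even if the list grows. Words
with trailing punctuation ("the," or "and.") would slip through as
written, so stripping punctuation before the lookup is the obvious next
step.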
dean.
--
my $email = qr{ dean(h)?@(?(1)verio\.net # @ work if h
| ero\.com) }x; # other
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
POST TO: spug-list at pm.org PROBLEMS: owner-spug-list at pm.org
Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/
SUBSCRIBE/UNSUBSCRIBE: Replace "action" below by subscribe or unsubscribe
Email to majordomo at pm.org: "action" spug-list your_address