[Chicago-talk] Regular expression discussion.

Wed Feb 2 06:11:44 PST 2011

Hi Rich,

A soundex search is commonly used to match words that could be
misspelled. However, it targets phonetic spelling errors rather than
OCR errors.

Google 'perl regex fuzzy search' and check out these two modules:

String::Approx
Regex::Approx
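
To give a feel for what "percent alike" can mean, here is a minimal
sketch in core Perl (no CPAN modules). It assumes similarity is defined
as 1 minus the Levenshtein edit distance divided by the longer string's
length, which is only one possible definition. The names edit_distance
and similarity are made up for this illustration and are not part of
either module above.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Levenshtein edit distance: the minimum number of single-character
# insertions, deletions, and substitutions that turn $s into $t.
# (Hypothetical helper, written for this illustration only.)
sub edit_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # Row 0: turning the empty prefix of $s into each prefix of $t.
    my @prev = (0 .. scalar(@t));

    for my $i (1 .. scalar(@s)) {
        my @cur = ($i);    # deleting the first $i characters of $s
        for my $j (1 .. scalar(@t)) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution or match
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Similarity in the range 0..1, assuming the definition given above.
sub similarity {
    my ($s, $t) = @_;
    my $longest = max(length($s), length($t));
    return 1 if $longest == 0;    # two empty strings are identical
    return 1 - edit_distance($s, $t) / $longest;
}

my $want = "This first document is a contract between";
my $got  = "This tirst document is a coniract betweeo";

printf "%.0f%% alike\n", 100 * similarity($want, $got);    # prints "93% alike"
print "close enough\n" if similarity($want, $got) >= 0.90;
```

This compares two whole strings. To find a phrase somewhere inside a
larger file you would still have to slide it across the text, which is
the kind of work an approximate-matching module is meant to handle.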

Cheers,
Warren

On Wed, Feb 2, 2011 at 7:55 AM, Richard Reina <richard at rushlogistics.com> wrote:
> Tired of shoveling snow? Well, sit right down and let's have a regex discussion. I have a Perl script that at the moment just uses grep to look through text files that have been converted by pdf2text to see what sort of documents they are. What I am finding, however, is that a lot of searches fail by just a few characters.
> For example, if I am looking for "This first document is a contract between" the text string in the file might look like this
> "This tirst document is a coniract betweeo" and the grep search fails. However, as you can see, these two statements are 93% alike. Is there a way with Perl regular expressions to match strings that are, say, 90, 95, or 98% alike?
>
> Any ideas would be greatly appreciated.
>
> Stay Warm!
> --
> Richard Reina
> Rush Logistics, Inc.
> Watch our 3 minute movie:
> http://www.rushlogistics.com/movie