[Princeton-pm] regex stuff
Jeff 'japhy' Pinyan
japhy at perlmonk.org
Fri Sep 16 11:36:03 PDT 2005
On Sep 16, uber spaced said:
> Yeah, so yesterday (I think), Jeff wrote this regex on freenode #perl which
> took somebody several messages to decompose and translate to English, with
> some coaxing.
The regex in particular is one I wrote for someone on PerlMonks.org; given
a hash %dict with lowercase words in it that are words you recognize (such
as those from /usr/dict/words), the following regex takes a string of
run-together words (like "canyoudecipherwhatiswrittenhere") and produces
the split-apart words that form it:
    $bunchofwords =~ m{
      ^                          # anchor to beginning of string
      (?{ [ ] })                 # set $^R to []
      (?:                        # match this chunk <<
        (\w{2,})                 # capture 2+ word chars into $1
        (?(?{ $dict{lc $1} })    # if lc($1) is in the %dict hash:
          (?{ [ @{$^R}, $1 ] })  # add $1 to the list of words found
        |                        # otherwise:
          (?!)                   # fail (causing \w{2,} to backtrack)
        )                        # end of 'if-else' assertion
      )+                         # >> one or more times
      $                          # anchor to the end of the string
      (?{ print "(@{$^R})\n" })  # print the words with spaces in between
      (?!)                       # fail (causing (?:...)+ to backtrack)
    }x;
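To try the regex without a real word list, here is a minimal sketch that wraps it in a sub and collects each split instead of printing it. The split_words name, the @splits array, and the tiny %dict are assumptions made for illustration; they are not part of the original regex.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real word list such as /usr/dict/words.
our %dict = map { $_ => 1 } qw(now here nowhere);

# Each complete split is pushed here (assumed name, for illustration).
our @splits;

sub split_words {
    my ($bunchofwords) = @_;
    @splits = ();
    $bunchofwords =~ m{
        ^
        (?{ [ ] })                         # start with an empty word list
        (?:
            (\w{2,})
            (?(?{ $dict{lc $1} })          # known word?
                (?{ [ @{$^R}, $1 ] })      # yes: append it to the list
              |
                (?!)                       # no: fail and backtrack
            )
        )+
        $
        (?{ push @splits, "@{$^R}" })      # record this split
        (?!)                               # fail, to look for other splits
    }x;
    return @splits;
}

print "($_)\n" for split_words("nowhere");
```

With this dictionary, "nowhere" should come out both as a single word and as "now here", because the final (?!) keeps forcing the engine to try alternatives until none are left.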
This regex uses almost every esoteric assertion Perl can handle:
* (?{ CODE })
    executes arbitrary code at a given point in a regex; the return value is
    automatically stored in $^R
* (?(COND)true-pat|false-pat)
    if 'COND' is true, match /true-pat/, else match /false-pat/
* (?!pattern)
    fail if /pattern/ can match at the current location in the string; with
    an empty pattern, (?!) always fails, which forces backtracking
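As a smaller illustration of a (?{ CODE }) block serving as the COND of a conditional, with (?!) as the failing branch, here is a sketch that accepts only whole numbers up to 255. The is_small name, the limit, and the sample inputs are made up for the example.

```perl
use strict;
use warnings;

# Returns true if the string is a whole number no greater than 255.
# The code block is the condition; (?!) is the "else" branch and always fails.
sub is_small {
    my ($str) = @_;
    return $str =~ m{
        ^ (\d+)
        (?(?{ $1 <= 255 })
            $          # condition true: require end of string
          |
            (?!)       # condition false: fail
        )
    }x ? 1 : 0;
}

my @small = grep { is_small($_) } qw(7 200 256 999 12a);
print "@small\n";
```

When the condition is false, the engine backtracks into \d+ and retries with fewer digits, but the end-of-string anchor then rejects the shorter capture, so over-large numbers are refused outright.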
The magic $^R variable is auto-localized during the regex. When an
assertion that MODIFIES $^R is backtracked over, $^R's value is rewound!
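The rewinding is easy to observe with a counter kept in $^R. In the sketch below (the $count variable and the sample string are assumptions for illustration), the greedy group first consumes all three a's, then has to give one back so that the literal "ab" can match.

```perl
use strict;
use warnings;

our $count;   # hypothetical variable that receives the final value of $^R

"aaab" =~ m{
    ^
    (?{ 0 })                    # start the counter at zero
    (?: a (?{ $^R + 1 }) )*     # add one for every "a" the group consumes
    ab                          # forces the group to give back one "a"
    (?{ $count = $^R })         # copy the counter out of the regex
}x;

print "count = $count\n";
```

If $^R were not rewound, the counter would still read 3 after the group gave back its last "a"; because the increment is undone on backtracking, the value copied out should be 2.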
I wrote an article on these esoteric regex assertions, which appeared in
last summer's The Perl Journal. It's available here:
http://japhy.perlmonk.org/articles/tpj/2004-summer.pod
http://japhy.perlmonk.org/articles/tpj/2004-summer.html
--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
http://www.perlmonks.org/ % have long ago been overpaid?
http://princeton.pm.org/ % -- Meister Eckhart