[Princeton-pm] regex stuff

Jeff 'japhy' Pinyan japhy at perlmonk.org
Fri Sep 16 11:36:03 PDT 2005


On Sep 16, uber spaced said:

> Yeah, so yesterday (I think), Jeff wrote this regex on freenode #perl which
> took somebody several messages to decompose and translate to English, with
> some coaxing.

The regex in question is one I wrote for someone on PerlMonks.org.  Given 
a hash %dict whose keys are lowercase words you recognize (such as those 
from /usr/dict/words), the following regex takes a string of run-together 
words (like "canyoudecipherwhatiswrittenhere") and prints every way of 
splitting it into words from %dict:

   $bunchofwords =~ m{
     ^                          # anchor to beginning of string
     (?{ [ ] })                 # set $^R to []
     (?:                        # match this chunk <<
       (\w{2,})                   # capture 2+ word chars into $1
       (?(?{ $dict{lc $1} })      # if lc($1) is in the %dict hash:
         (?{ [ @{$^R}, $1 ] })      # add $1 to the list of words found
         |                        # otherwise:
         (?!)                       # fail (causing \w{2,} to backtrack)
       )                          # end of 'if-else' assertion
     )+                         # >> one or more times
     $                          # anchor to the end of the string
     (?{ print "(@{$^R})\n" })  # print the words with spaces in between
     (?!)                       # fail (causing (?:...)+ to backtrack)
   }x;
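
To try this out, here is a minimal self-contained sketch.  The tiny %dict, 
the split_words() wrapper, and the @found array are made up for 
illustration (a real run would load a word list such as /usr/dict/words), 
and the splits are collected in @found instead of being printed from 
inside the regex.  Package variables are used because older perls had 
trouble with 'my' variables inside (?{ ... }) blocks.

```perl
use strict;
use warnings;

# Hypothetical mini-dictionary; keys are lowercase words.
our %dict = map { $_ => 1 } qw(no not able table notable);
our @found;    # one entry per successful split

sub split_words {
    my ($bunchofwords) = @_;
    @found = ();
    $bunchofwords =~ m{
      ^                             # anchor to beginning of string
      (?{ [ ] })                    # start with an empty word list
      (?:
        (\w{2,})                    # candidate word, 2+ chars
        (?(?{ $dict{lc $1} })       # is the candidate in the dictionary?
          (?{ [ @{$^R}, $1 ] })     #   yes: append it to the list
          |
          (?!)                      #   no: fail, try a shorter candidate
        )
      )+
      $                             # whole string must be consumed
      (?{ push @found, "@{$^R}" })  # record this split
      (?!)                          # fail on purpose to find the next one
    }x;
    return @found;
}

print "($_)\n" for split_words("notable");
```

With that dictionary, "notable" should come back as three splits: 
(notable), (not able), and (no table), longest first word first, because 
\w{2,} is greedy.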

This regex uses almost every esoteric assertion Perl can handle:

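   * (?{ CODE })
     executes arbitrary code at a given point in a regex; the return value
     is automatically stored in $^R (except when the block is used as the
     'COND' of the next construct)
   * (?(COND)true-pat|false-pat)
     if 'COND' is true, match /true-pat/, else match /false-pat/
   * (?!pattern)
     fail if /pattern/ can match at the current location in the string;
     with an empty pattern, (?!) always fails, which forces a backtrack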
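
Each of these constructs can also be poked at on its own.  A small sketch 
follows; the sub names and the even-number check are invented for this 
example and are not part of the regex above.

```perl
use strict;
use warnings;

our $seen;    # package variable, filled in from inside the regex

# (?{ CODE }): the block's value goes into $^R, where a later block
# can read it.
sub code_block_value {
    my ($str) = @_;
    $seen = undef;
    $str =~ m{ a (?{ 42 }) b (?{ $seen = $^R }) }x;
    return $seen;
}

# (?(COND)true-pat|false-pat) with a code block as COND:
# accept a run of digits only if the number is even.
sub is_even_number {
    my ($str) = @_;
    return $str =~ m{
        ^ (\d+)
        (?(?{ $1 % 2 == 0 })
            $        # even: just require end of string
          |
            (?!)     # odd: fail
        )
    }x ? 1 : 0;
}

# (?!) with an empty pattern can never match.
sub empty_lookahead_matches {
    my ($str) = @_;
    return $str =~ /(?!)/ ? 1 : 0;
}

print code_block_value("ab"), "\n";
print is_even_number("42"), is_even_number("43"), "\n";
print empty_lookahead_matches("anything"), "\n";
```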

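The magic $^R variable is auto-localized during the regex.  When an 
assertion that MODIFIES $^R is backtracked over, $^R's value is rewound 
to what it was before that assertion ran!  That is why the list in $^R 
only ever holds the words on the path currently being tried.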
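
The rewinding is easy to see with a counter.  This is only a sketch (the 
sub name and the @seen array are hypothetical): the loop bumps $^R once 
per 'a', and the trailing "ab" forces the loop to give one 'a' back.

```perl
use strict;
use warnings;

our @seen;    # what the counter looked like when the match completed

sub counter_after_backtrack {
    my ($str) = @_;
    @seen = ();
    $str =~ m{
        ^ (?{ 0 })                  # counter starts at zero
        (?: a (?{ $^R + 1 }) )*     # add one per 'a' consumed by the loop
        ab                          # forces the loop to give back one 'a'
        (?{ push @seen, $^R })      # record the counter as seen here
    }x;
    return @seen;
}

print counter_after_backtrack("aab"), "\n";
```

On "aab" the loop first eats both a's (counter 2), then has to back off 
one so that "ab" can match, and the value recorded should be 1, not 2.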

I wrote an article on these esoteric regex assertions, which appeared in 
last summer's The Perl Journal.  It's available here:

   http://japhy.perlmonk.org/articles/tpj/2004-summer.pod
   http://japhy.perlmonk.org/articles/tpj/2004-summer.html

-- 
Jeff "japhy" Pinyan        %  How can we ever be the sold short or
RPI Acacia Brother #734    %  the cheated, we who for every service
http://www.perlmonks.org/  %  have long ago been overpaid?
http://princeton.pm.org/   %    -- Meister Eckhart

