[sf-perl] RE oddity

Joseph Brenner doom at kzsu.stanford.edu
Fri Feb 17 12:39:06 PST 2006


Rich Morin <rdm at cfcl.com> wrote:

> I'm a big fan of extended regular expressions, but I just
> wrote one that didn't work as I expected.  This code:
> 
>   $line =~ s|[\000-\010                   # nul-bs
>               \012-\037                   # nl-us
>               \177-\377']                 # del-... and '
>             ||gx;                         # Punt weird characters.
> 
> produced the nastygram:
> 
>   Invalid [] range "l-b" in regex;
>   marked by <-- HERE in m/[\000-\010              # nul-b <-- HERE s
>                       \012-\037                   # nl-us
>                       \177-\377']                 # del-...
>                     / at /home/rdm/bin/log_load.pl line 206.
> 
> but this code:
> 
>   $line =~ s|[\000-\010\012-\037\177-\377']||g;
> 
> sails right through.  Is this a bug or a (mis-)feature?

man perlre:

    The "/x" modifier itself needs a little more explanation.  It tells the
    regular expression parser to ignore whitespace that is neither back-
    slashed nor within a character class.  

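It's documented.  It's a feature.  Inside the brackets, the whitespace
and the "# nul-bs" comment are literal members of the class, so the
parser reads the "l-b" in "nul-bs" as a range, and a backwards one at
that.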
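For what it's worth, here is a minimal sketch of two ways to keep the comments without tripping over this.  The sample string and the variable names ($line, $clean1, $clean2) are made up for illustration; the character ranges are the ones from Rich's code.

```perl
use strict;
use warnings;

# Made-up sample: tab (kept), ^A, a single quote, DEL and newline (all removed).
my $line = "a\tb\x{01}c'd\x{7F}e\n";

# Option 1: keep the whole class on one line and put the comment
# outside the brackets, where /x really does ignore it.
(my $clean1 = $line) =~ s{
    [\000-\010\012-\037\177-\377']   # nul-bs, nl-us, del-... and '
}{}gx;

# Option 2: one small class per line, joined by alternation, so each
# range gets its own comment.
(my $clean2 = $line) =~ s{
      [\000-\010]     # nul-bs
    | [\012-\037]     # nl-us
    | [\177-\377']    # del-... and '
}{}gx;

print "ok\n" if $clean1 eq "a\tbcde" and $clean2 eq "a\tbcde";
```

Both versions strip the same characters as the one-line, non-/x form.  The second one costs an alternation but keeps the per-range comments.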

The gotcha I usually get stung on is assuming that /x does something to 
the right hand side of a s///x:

   s{ ^ (.*?)    # capture first word to $1
         \s      # separated by a space
        (.*?) $  # capture second word to $2
     }{$1 $2}x

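That'll remove a space from between two items and then put it right back
again, because /x only applies to the pattern: the space in the
replacement is literal.  (Of course, if you were trying to convert tabs
to spaces, then this could be useful, since \s matches a tab too).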
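A small sketch of that behavior, in case it helps.  The input string and the variable names ($text, $spaced, $joined) are invented for the example, and it uses \S+ and \s+ instead of the (.*?) above just to keep the matching obvious.

```perl
use strict;
use warnings;

my $text = "foo\tbar";    # two words separated by a tab

# /x affects the pattern only; the space in the replacement is literal.
(my $spaced = $text) =~ s{ ^ (\S+) \s+ (\S+) $ }{$1 $2}x;

# To actually join the words, leave the space out of the replacement.
(my $joined = $text) =~ s{ ^ (\S+) \s+ (\S+) $ }{$1$2}x;

print "spaced: [$spaced]\n";
print "joined: [$joined]\n";
```

The first substitution turns the tab into a single space; the second drops the separator altogether.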


