Subset regular expression substitution

Keary Suska hierophant at pcisys.net
Thu Feb 14 12:19:24 CST 2002


on 2/14/02 10:32 AM, ningersoll at cso.atmel.com purportedly said:

> Looks typically cryptic and probably called something odd.  Can you point
> me to where I might learn more about it?  I have most of the Perl books and
> this is the first time I've ever seen it.   I feel so naive!

The regex section in the Camel book covers the use of {} and the /e option.
"Mastering Regular Expressions" might be useful, but I don't recall off the
top of my head.

It's pretty straightforward. I'll break it down for you:

s|^(.{10})|do{ $str = $1; $str =~ s/[()]/ /g; $str }|e

First off, '|' is used as the expression delimiter so Perl doesn't
get confused over the '/' used in the substitution expression.

^(.{10})

The {} characters are a quantifier: they say how many times the preceding
expression must match, as opposed to, say, '+', which means one or more.
{10} means match exactly 10 consecutive occurrences of the previous
expression. I used '.', which matches any character (except a newline,
unless you add the /s option), so the whole match expression means any 10
characters, anchored at the beginning ('^'): in other words, the first ten
characters. I enclosed the expression in () because I want to capture the
match, so that the substitution expression can refer to it as $1.
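
To see the match half on its own, here is a minimal sketch. The sample
string in $line is made up for illustration and is not from the original
question:

```perl
use strict;
use warnings;

# $line is a hypothetical sample string, not from the original thread.
my $line = "(303) 555-0100 Front desk";

my $first_ten = '';
if ($line =~ /^(.{10})/) {
    # The parenthesized group is available as $1 after a successful match.
    $first_ten = $1;
}
print "[$first_ten]\n";
```

If the string is shorter than 10 characters the pattern simply fails to
match, and $first_ten stays empty.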

do{ $str = $1; $str =~ s/[()]/ /g; $str }|e

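The 'e' option tells Perl to interpret the substitution string as a Perl
expression. You can think of this as causing Perl to eval() whatever appears
there and use the result value as the substitution string. (Unlike a string
eval, the code is compiled once along with the rest of your program, so
syntax errors show up at compile time.)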
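
A small sketch of what /e changes, using a made-up string (the variable
names here are only for illustration). Without /e the replacement is an
ordinary interpolated string; with /e it is run as code:

```perl
use strict;
use warnings;

# Hypothetical input: double every integer in the string.
my $text = "3 apples and 12 pears";

# Without /e: $1 is interpolated and " * 2" is copied literally.
(my $literal = $text) =~ s/(\d+)/$1 * 2/g;

# With /e: "$1 * 2" is evaluated as Perl and its value is inserted.
(my $doubled = $text) =~ s/(\d+)/$1 * 2/ge;

print "$literal\n";
print "$doubled\n";
```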

do{
    $str = $1;
    $str =~ s/[()]/ /g;
    $str
}

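The "do" is probably not necessary, since with /e Perl already treats the
replacement as a block of code, but it does no harm and makes the intent
explicit. All it does is allow a block of statements to be treated as a
single expression. The Camel book discusses this in the "Control Structures"
section, IIRC. In this block I assign the captured match ($1, which is the
first 10 characters of the string) to a variable, because $1 itself can't be
modified and the inner s/// would clobber it anyway. Then I replace every
paren in the copy with a space. The trailing "$str" simply ensures that the
result value of the block expression will be $str, much as the last
expression evaluated in a subroutine becomes its return value. (An actual
"return $str" would not do the same thing here: it would try to return from
the enclosing subroutine, not from the block.)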

Perl then substitutes the first 10 characters of the string with the value
of $str, which is the first 10 characters of the string with parens
converted to spaces.
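
Putting it together, a runnable sketch under "use strict" (hence the added
"my"). The sample string and the names $line and $fixed are made up for
illustration; the substitution itself is the one discussed above:

```perl
use strict;
use warnings;

# $line is a hypothetical sample string with parens both inside and
# outside the first ten characters.
my $line = "(303) 555-(0100) (main)";

# Copy $line into $fixed, then run the substitution on the copy.
(my $fixed = $line) =~ s|^(.{10})|do { my $str = $1; $str =~ s/[()]/ /g; $str }|e;

print "[$line]\n";
print "[$fixed]\n";
```

Only the parens in the first ten characters are changed; the ones later in
the string are left alone.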

For performance, you may want to consider using tr/// instead of the inner
s///g in the do{} if you will be executing this expression many times. The
/o option won't buy you anything here: it only matters when the pattern
interpolates variables, and this one doesn't.
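
A sketch of the tr/// variant, again with a made-up sample string. The
second form is a different technique that I'm swapping in, not what was
posted above: it skips the regex entirely by using substr() as an lvalue.

```perl
use strict;
use warnings;

# $line is a hypothetical sample string.
my $line = "(303) 555-(0100) (main)";

# tr/// in place of the inner s///g, still inside the /e replacement.
(my $via_do = $line) =~ s|^(.{10})|do { my $str = $1; $str =~ tr/()/  /; $str }|e;

# Alternative: translate the first ten characters directly.
my $via_substr = $line;
substr($via_substr, 0, 10) =~ tr/()/  /;

print "[$via_do]\n";
print "[$via_substr]\n";
```

One difference to be aware of: on a string shorter than 10 characters the
regex version does nothing, while the substr() version translates whatever
is there.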

Keary Suska
Esoteritech, Inc.
"Leveraging Open Source for a better Internet"



