Subset regular expression substitution
Keary Suska
hierophant at pcisys.net
Thu Feb 14 12:19:24 CST 2002
on 2/14/02 10:32 AM, ningersoll at cso.atmel.com purportedly said:
> Looks typically cryptic and probably called something odd. Can you point
> me to where I might learn more about it? I have most of the Perl books and
> this is the first time I've ever seen it. I feel so naive!
The regex section in the Camel book covers the use of {} and the /e option.
"Mastering Regular Expressions" might be useful, but I don't recall off the
top of my head.
It's pretty straightforward. I'll break it down for you:
s|^(.{10})|do{ $str = $1; $str =~ s/[()]/ /g; $str }|e
First off, '|' is used as the expression delimiter so Perl doesn't
get confused over the '/' used in the substitution expression.
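To illustrate the delimiter point, here is a small sketch. The path string and variable names are made up for the example; both substitutions do the same thing, but the second avoids escaping every '/'.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my $path = "/usr/local/bin";

# With '/' as the delimiter, every literal slash must be escaped.
(my $escaped = $path) =~ s/\/usr\/local/\/opt/;

# With '|' as the delimiter, the slashes can be written as-is.
(my $piped = $path) =~ s|/usr/local|/opt|;

print "$escaped\n";   # /opt/bin
print "$piped\n";     # /opt/bin
```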
^(.{10})
The {} characters give an explicit repetition count for the preceding
expression, as opposed to, say, '+', which means one or more. {10} means
match exactly 10 consecutive occurrences of the previous expression. I used
'.', which matches any character (except a newline, unless the /s option is
given), so the whole match expression means any 10 characters, anchored at
the beginning ('^'), i.e. the first ten characters. I enclose the expression
in () so that the matched text is captured in $1 for use in the substitution
expression.
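A quick way to see what the match part does by itself is to run it as a plain match and print the captured text. The sample string below is invented for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical sample string, 16 characters long.
my $line = "(ab)(cd)(ef)(gh)";

# ^ anchors at the start, .{10} matches exactly ten characters,
# and the parentheses capture them into $1.
my $captured = '';
if ($line =~ /^(.{10})/) {
    $captured = $1;
    print "captured: $captured\n";   # captured: (ab)(cd)(e
}

# A string shorter than ten characters does not match at all.
my $short_matches = ("short" =~ /^(.{10})/) ? 1 : 0;
print "short string matches: $short_matches\n";   # 0
```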
do{ $str = $1; $str =~ s/[()]/ /g; $str }|e
The 'e' option tells Perl to interpret the substitution string as a Perl
expression. You can think of this as causing Perl to eval() whatever appears
there and use the resulting value as the substitution string.
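The difference /e makes is easiest to see side by side. This is only a sketch with a made-up input string: the same replacement text is used twice, once as an ordinary string and once as code.

```perl
use strict;
use warnings;

# Hypothetical input, for illustration only.
my $text = "width=4";

# Without /e the replacement is an interpolated string: "$1*2" becomes "4*2".
(my $plain = $text) =~ s/(\d+)/$1*2/;

# With /e the replacement is evaluated as Perl code: 4*2 becomes 8.
(my $evaled = $text) =~ s/(\d+)/$1*2/e;

print "$plain\n";    # width=4*2
print "$evaled\n";   # width=8
```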
do{
    $str = $1;
    $str =~ s/[()]/ /g;
    $str
}
The "do" may not be necessary, I don't recall. All it does is allow a block
of statements to be treated as a single expression. The Camel book discusses
this in the "Control Structures" section, IIRC. In this block I assign the
captured text ($1, which is the first 10 characters of the string) to a
variable, then substitute every occurrence of a paren with a space. The
trailing "$str" simply ensures that the result value of the block expression
will be $str, because a do block evaluates to its last statement, much like
the implicit return value of a subroutine. Without it, the block's value
would be the result of the inner s///, which is the number of substitutions
made rather than the modified string.
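Here is a minimal sketch of how a do block yields a value, outside of any substitution. The variable names are arbitrary; the second block shows why the trailing variable matters when the last real statement is an s///.

```perl
use strict;
use warnings;

# The block's value is the value of its last statement.
my $value = do { my $n = 2; $n *= 21; $n };
print "$value\n";   # 42

# If the last statement is the s/// itself, the block yields the
# substitution count, not the modified string.
my $count = do { my $s = "(a)"; $s =~ s/[()]/ /g };
print "$count\n";   # 2
```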
Perl then substitutes the first 10 characters of the string with the value
of $str, which is the first 10 characters of the string with parens
converted to spaces.
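Putting the pieces together, the whole thing can be tried on a sample string. The input below is invented; I also declare $str with "my" inside the block, which the original one-liner does not do, so that it runs under "use strict".

```perl
use strict;
use warnings;

# Hypothetical input: parens both inside and beyond the first ten characters.
my $line = "(555)123-4567 (home)";

# Replace parens with spaces, but only within the first ten characters.
$line =~ s|^(.{10})|do { my $str = $1; $str =~ s/[()]/ /g; $str }|e;

print "[$line]\n";   # [ 555 123-4567 (home)]
```

Note that the parens around "home" are left alone, since they fall after the tenth character.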
For performance, you may want to consider using tr/// instead in the do{} if
you will be executing this expression many times. The /o option would not
help here: it only affects patterns that interpolate variables, and this
pattern has none.
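A sketch of the tr/// variant follows, using a made-up input string. The second form relies on the /r modifier of tr///, which needs Perl 5.14 or later and so is an assumption about your Perl version; it returns the translated copy without modifying $1, which makes the do{} and the temporary variable unnecessary.

```perl
use strict;
use warnings;

# Hypothetical input, for illustration only.
my $line = "(555)123-4567 (home)";

# tr/// inside the do block: both '(' and ')' map to a space.
(my $with_do = $line) =~
    s|^(.{10})|do { my $str = $1; $str =~ tr/()/ /; $str }|e;

# Assumes Perl 5.14+: tr///r returns the translated copy directly.
(my $with_r = $line) =~ s|^(.{10})|$1 =~ tr/()/ /r|e;

print "[$with_do]\n";   # [ 555 123-4567 (home)]
print "[$with_r]\n";    # [ 555 123-4567 (home)]
```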
Keary Suska
Esoteritech, Inc.
"Leveraging Open Source for a better Internet"