[Pdx-pm] regexp and semi-greedy match
Eric Wilhelm
scratchcomputing at gmail.com
Sun Oct 14 23:48:55 PDT 2007
# from Keith Lofstrom
# on Sunday 14 October 2007 22:03:
>"Greedy" regexp is just a tiny bit too greedy. If I use a pattern
> match like:
>
> if( /(a-z0-9_.-)-(\d*)\.(raw)$/i ) { # this does NOT work
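I don't think it is a greedy bug. The first group is (mostly) literal:
without square brackets, (a-z0-9_.-) matches the text "a-z0-9_", then
any one character (the unescaped dot), then a hyphen, in that order.
Are you trying for a character class (which needs square brackets, plus
a + to match more than one character)? If so, why restrict the
characters at all?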
m/^(.*)-(\d+)\.raw$/
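To make the difference concrete, here is a small sketch that runs the
original pattern, a bracketed character-class version, and the pattern
above against one input. The file name is a hypothetical example, not
Keith's actual data.

```perl
use strict;
use warnings;

# Hypothetical file name of the form "base-NUMBER.raw".
my $name = 'my_file-v2-0042.raw';

# Original pattern: without brackets, (a-z0-9_.-) needs the literal
# text "a-z0-9_" somewhere in the string, so this fails.
my $broken = ($name =~ /(a-z0-9_.-)-(\d*)\.(raw)$/i) ? 1 : 0;

# Character-class version: square brackets plus a + quantifier.
my ($base_cc, $num_cc) = $name =~ /^([a-z0-9_.-]+)-(\d+)\.raw$/i;

# Take everything up to the final "-digits.raw" as the base.
my ($base, $num) = $name =~ m/^(.*)-(\d+)\.raw$/;

print "original matched: $broken\n";       # original matched: 0
print "class: base=$base_cc num=$num_cc\n"; # class: base=my_file-v2 num=0042
print "any:   base=$base num=$num\n";       # any:   base=my_file-v2 num=0042
```

Both corrected patterns agree here; the character class only matters if
you really want to reject base names containing other characters.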
I'm also not sure why you capture "raw", since it is a constant.
(Perhaps it is going to change, and you want to capture everything
that is not a dot up to the end: qr/-(\d+)\.([^.]+)$/ .)
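A quick sketch of how that qr// behaves with a few hypothetical names.
Because [^.]+ must run to the end of the string, only a single final
extension is captured, and a multi-dot suffix such as ".tar.gz" makes
the whole match fail.

```perl
use strict;
use warnings;

my $pat = qr/-(\d+)\.([^.]+)$/;

# Hypothetical inputs.
my %result;
for my $name ('frame-0042.raw', 'frame-0042.tiff', 'frame-0042.tar.gz') {
    if (my ($num, $ext) = $name =~ $pat) {
        $result{$name} = "$num/$ext";
    }
    else {
        # "-0042." is followed by "tar.gz", which contains a dot.
        $result{$name} = 'no match';
    }
    print "$name => $result{$name}\n";
}
```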
Another trick in situations like this: if you happen to have a
disposable copy of the scalar, don't bother capturing the base at all.
Just whack the interesting and/or messy bits off the end with s///, and
whatever is left is the base.
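my $scalar = $filename;  # hypothetical: $filename stands in for your input
my $num;
if($scalar =~ s/-(\d+)\.raw$//) {  # s/// edits $scalar in place
  $num = $1;  # the digits captured before the suffix was deleted
}
else {
  die "didn't expect that input"; # or you could next
}
# $scalar is now just the base bit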
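The same idea in a loop, using the "or you could next" option, might
look like the sketch below. The list of file names and the %num_for
hash are hypothetical; the last name is deliberately malformed so it
gets skipped.

```perl
use strict;
use warnings;

# Hypothetical inputs; 'notes.txt' does not fit the pattern.
my @files = ('scan-0007.raw', 'my-data-12.raw', 'notes.txt');
my %num_for;   # base name => number

for my $file (@files) {
    my $scalar = $file;                        # disposable copy
    next unless $scalar =~ s/-(\d+)\.raw$//;   # skip unexpected input
    $num_for{$scalar} = $1;                    # $scalar is now the base
}

print "$_ => $num_for{$_}\n" for sort keys %num_for;
```

Note that "my-data-12.raw" keeps its inner hyphen: the \d+ rules out
"-data" as the place to cut, so only "-12.raw" is removed.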
Another note: greediness typically causes captures that grab the wrong
text, not failed matches. Greed comes into play when a pattern with one
or more .* (or similar quantifiers) could match the same string in more
than one way. The regexp engine resolves the ambiguity by stuffing as
much as possible into the first submatch (but curbs its gluttony short
of invalidating the entire match).
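A small sketch of that ambiguity, using a hypothetical string with
several hyphens: both patterns match, but the greedy .* and the
non-greedy .*? split the string at different hyphens.

```perl
use strict;
use warnings;

my $s = 'a-b-c-1.raw';   # hypothetical input with several hyphens

# Greedy: the first .* takes as much as it can, leaving only the
# last "-" for the literal hyphen.
my @greedy = $s =~ /^(.*)-(.*)$/;

# Non-greedy: .*? takes as little as it can, so the first "-" is used.
my @lazy = $s =~ /^(.*?)-(.*)$/;

print "greedy: @greedy\n";   # greedy: a-b-c 1.raw
print "lazy:   @lazy\n";     # lazy:   a b-c-1.raw
```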
In this case, \d* could let the pattern match a malformed string with
no digits at all, such as "foo-.raw" (your $1 would get everything
before "-.raw" and the number would come back empty). Using \d+ together
with the \.raw$ anchor pins things down, so the whole match fails if
something went awry.
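Here is a minimal sketch of that difference, using a hypothetical
malformed name that has no digits before the suffix.

```perl
use strict;
use warnings;

my $bad = 'foo-.raw';   # hypothetical malformed name: no digits

my @star = $bad =~ m/^(.*)-(\d*)\.raw$/;   # \d* accepts zero digits
my @plus = $bad =~ m/^(.*)-(\d+)\.raw$/;   # \d+ demands at least one

# \d* "succeeds" with an empty number; \d+ rejects the input outright.
printf "\\d* matched: %s (base '%s', num '%s')\n",
    (@star ? 'yes' : 'no'), @star;
printf "\\d+ matched: %s\n", (@plus ? 'yes' : 'no');
```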
--Eric
--
"Time flies like an arrow, but fruit flies like a banana."
--Groucho Marx
---------------------------------------------------
http://scratchcomputing.com
---------------------------------------------------