[Phoenix-pm] Fwd: Named captures.
Scott Walters
scott at illogics.org
Fri Feb 9 02:53:37 PST 2007
More forthcoming regex features in 5.10...
Enjoy!
-scott
----- Forwarded message from Abigail <abigail at abigail.be> -----
Date: Fri, 9 Feb 2007 11:52:37 +0100
From: Abigail <abigail at abigail.be>
To: demerphq <demerphq at gmail.com>
Cc: perl5-porters at perl.org
Subject: Named captures.
User-Agent: Mutt/1.3.28i
If a regular expression contains two named captures with the same
name, $+ {NAME} returns the leftmost *defined* capture. I'm not
sure how useful that is - I think I'd prefer the leftmost capture,
whether defined or not.
Consider the following code:
    my $re = qr /(?<integer>\d+)(?:\.(?<fraction>\d+))?/;
    if ("1.2 3.4" =~ /$re $re/) {
        print $+ {integer}, " ", $+ {fraction} // "UNDEF", "\n";
    }
This prints "1 2", as expected. But if we change it to:
    my $re = qr /(?<integer>\d+)(?:\.(?<fraction>\d+))?/;
    if ("1 3.4" =~ /$re $re/) {
        print $+ {integer}, " ", $+ {fraction} // "UNDEF", "\n";
    }
it prints "1 4", getting something from the first $re and something from
the second. I would have expected "1 UNDEF".
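
The two cases above can be reproduced with a short script. This is only a sketch: the helper name leftmost_defined is made up for illustration, and it assumes a perl of 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;    # named captures and the // operator need Perl 5.10 or later

my $re = qr/(?<integer>\d+)(?:\.(?<fraction>\d+))?/;

# Hypothetical helper: returns "INTEGER FRACTION" as seen through %+,
# or undef if the input does not match.
sub leftmost_defined {
    my ($input) = @_;
    return unless $input =~ /$re $re/;
    return join " ", $+{integer}, $+{fraction} // "UNDEF";
}

my $both  = leftmost_defined("1.2 3.4");    # both fractions present
my $mixed = leftmost_defined("1 3.4");      # first fraction missing

say $both;     # 1 2
say $mixed;    # 1 4
```

In the second call the first 'fraction' group did not participate in the match, so %+ falls through to the second one.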
Now, my guess is that returning the leftmost defined capture is useful
in cases like:
    /(?<foo>PAT1)bar|(?<foo>PAT2)baz/
But you could make use of (?|) then:
    /(?|(?<foo>PAT1)bar|(?<foo>PAT2)baz)/
That is, if you have the same NAME repeated, return the leftmost capture
regardless of whether it is defined, and use (?| ) if you want the leftmost
defined one.
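
A minimal sketch of the (?| ) form, assuming the second alternative is meant to end in baz as in the plain alternation. The patterns \d+ and [A-Z]+ are placeholders standing in for PAT1 and PAT2, and the sample inputs are made up.

```perl
use strict;
use warnings;
use 5.010;

# \d+ and [A-Z]+ are stand-ins for PAT1 and PAT2.
my $branch_reset = qr/(?|(?<foo>\d+)bar|(?<foo>[A-Z]+)baz)/;

my %seen;
for my $input ("42bar", "XYbaz") {
    if ($input =~ $branch_reset) {
        # Inside (?| ) both alternatives share capture group 1, so
        # $+{foo} and $1 agree whichever branch matched.
        $seen{$input} = [ $+{foo}, $1 ];
        say join " ", $input, $+{foo}, $1;
    }
}
```

Because each branch reuses the same group number, there is only one 'foo' slot to look at, which is why the leftmost-defined rule would not be needed for this case.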
Alternatively, leave it as is, and have a way of getting to all the named
captures, even if they share the same name. For instance after:
    my $re = qr /(?<integer>\d+)(?:\.(?<fraction>\d+))?/;
    if ("1 3.4" =~ /$re $re/) {
        print $+ {integer}, " ", $+ {fraction} // "UNDEF", "\n";
    }

    $+ {"integer"}    eq '1'    # Leftmost defined.
    $+ {"fraction"}   eq '4'    # Leftmost defined.
    $+ {"integer.1"}  eq '1'    # First capture of 'integer'
    $+ {"fraction.1"} eq undef  # First capture of 'fraction'
    $+ {"integer.2"}  eq '3'    # Second capture of 'integer'
    $+ {"fraction.2"} eq '4'    # Second capture of 'fraction'
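
For comparison, Perl 5.10 as released does not have the NAME.N keys sketched above, but it does expose every capture sharing a name through the %- hash, where each value is an array reference in left-to-right order. A short sketch, assuming perl 5.10 or later:

```perl
use strict;
use warnings;
use 5.010;

my $re = qr/(?<integer>\d+)(?:\.(?<fraction>\d+))?/;

my (@integers, @fractions);
if ("1 3.4" =~ /$re $re/) {
    # Copy right away: %- describes the most recent successful match.
    @integers  = @{ $-{integer} };
    @fractions = @{ $-{fraction} };
}

say join ",", map { defined($_) ? $_ : "UNDEF" } @integers;     # 1,3
say join ",", map { defined($_) ? $_ : "UNDEF" } @fractions;    # UNDEF,4
```

The first element of @fractions is undef, which is the "1 UNDEF" information that %+ alone hides.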
Another issue: the NAMEs of named captures are subject to constraints
similar to those on identifiers - except that you cannot use '::' inside them.
That's a pity because with '::' there would be an obvious way of using
name spaces in your NAMEs.
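
The restriction can be checked by compiling a pattern at run time and trapping the failure. This is a sketch only: name_compiles is a hypothetical helper, and Foo_Bar and Foo::Bar are made-up names.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: returns 1 if $name is accepted as a capture
# name, 0 if the regex compiler rejects it.
sub name_compiles {
    my ($name) = @_;
    my $pattern = "(?<$name>x)";
    # The pattern is built from a string, so a bad name dies at run
    # time inside the eval rather than when the script is compiled.
    my $ok = eval { my $compiled = qr/$pattern/; 1 };
    return $ok ? 1 : 0;
}

my $plain     = name_compiles("Foo_Bar");     # identifier-like name
my $qualified = name_compiles("Foo::Bar");    # name containing '::'

say "Foo_Bar:  $plain";
say "Foo::Bar: $qualified";
```

An underscore is accepted as a separator, so Foo_Bar style prefixes are one possible stand-in for name spaces.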
Abigail
----- End forwarded message -----