[Phoenix-pm] Fwd: Named captures.

Scott Walters scott at illogics.org
Fri Feb 9 02:53:37 PST 2007

More forthcoming regex features in 5.10...



----- Forwarded message from Abigail <abigail at abigail.be> -----

Date: Fri, 9 Feb 2007 11:52:37 +0100
From: Abigail <abigail at abigail.be>
To: demerphq <demerphq at gmail.com>
Cc: perl5-porters at perl.org
Subject: Named captures.
User-Agent: Mutt/1.3.28i

If a regular expression contains two named captures with the same
name, $+ {NAME} returns the leftmost *defined* capture. I'm not
sure how useful that is - I think I'd prefer the leftmost capture,
whether defined or not.

Consider the following code:

    my $re = qr /(?<integer>\d+)(?:\.(?<fraction>\d+))?/;
    if ("1.2 3.4" =~ /$re $re/) {
        print $+ {integer}, " ", $+ {fraction} // "UNDEF", "\n";
    }

This prints "1 2", as expected. But if we change it to:

    my $re = qr /(?<integer>\d+)(?:\.(?<fraction>\d+))?/;
    if ("1 3.4" =~ /$re $re/) {
        print $+ {integer}, " ", $+ {fraction} // "UNDEF", "\n";
    }

it prints "1 4", getting something from the first $re and something from 
the second. I would have expected "1 UNDEF". 

Now, my guess is that returning the leftmost defined capture is useful
in cases like:


But you could make use of (?|) then:


That is, if you have the same NAME repeated, return the leftmost capture
regardless of whether it is defined, and use (?| ) if you want the leftmost
defined one.
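
A minimal sketch of the distinction, not the examples from the original
message: the hex/decimal pattern and the sub names digits_plain and
digits_reset are assumptions made up for illustration. The first pattern
repeats the name across an alternation and relies on %+ returning the
leftmost defined capture; the second wraps the alternation in (?| ) so that
both alternatives share one capture group.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: two separate groups share the name 'digits'.
# Only one alternative takes part in a match, so %+ yields the one defined group.
sub digits_plain {
    my ($s) = @_;
    return $s =~ /^(?:0x(?<digits>[0-9a-f]+)|(?<digits>\d+))$/ ? $+{digits} : undef;
}

# Hypothetical helper: branch reset makes both alternatives reuse group 1,
# so there is a single capture and no need for "leftmost defined" semantics.
sub digits_reset {
    my ($s) = @_;
    return $s =~ /^(?|0x(?<digits>[0-9a-f]+)|(?<digits>\d+))$/ ? $+{digits} : undef;
}

say digits_plain("0x1f"), " ", digits_plain("42");
say digits_reset("0x1f"), " ", digits_reset("42");
```

Both lines should print "1f 42". With (?| ) the names are used in the same
order in each alternative, which is what perlre recommends when combining
branch reset with named captures.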

Alternatively, leave it as is, and have a way of getting to all the named
captures, even if they share the same name. For instance after:

    my $re = qr /(?<integer>\d+)(?:\.(?<fraction>\d+))?/;
    if ("1 3.4" =~ /$re $re/) {
        print $+ {integer}, " ", $+ {fraction} // "UNDEF", "\n";
    }

    $+ {"integer"}    eq '1'    # Leftmost defined.
    $+ {"fraction"}   eq '4'    # Leftmost defined.
    $+ {"integer.1"}  eq '1'    # First capture of 'integer'
    $+ {"fraction.1"} eq undef  # First capture of 'fraction'
    $+ {"integer.2"}  eq '3'    # Second capture of 'integer'
    $+ {"fraction.2"} eq '4'    # Second capture of 'fraction'
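
The "NAME.N" keys above are a proposal, not something perl implements. For
comparison, Perl 5.10 as released has the %- hash, which maps each capture
name to an array reference holding every capture with that name, in pattern
order. A minimal sketch, where the all_captures helper is a hypothetical
name introduced here:

```perl
use strict;
use warnings;
use 5.010;

my $re = qr/(?<integer>\d+)(?:\.(?<fraction>\d+))?/;

# Hypothetical helper: return every capture stored under $name, in pattern
# order, with undef for groups that did not take part in the match.
sub all_captures {
    my ($string, $name) = @_;
    return unless $string =~ /$re $re/;
    return @{ $-{$name} };
}

for my $name (qw(integer fraction)) {
    say "$name: ", join ", ", map { $_ // "UNDEF" } all_captures("1 3.4", $name);
}
```

This should print "integer: 1, 3" and "fraction: UNDEF, 4", so the first
element of the 'fraction' list is the undefined capture that $+ {fraction}
skips over.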

Another issue: the NAMEs of named captures have constraints similar to
those on identifiers - except that you cannot use '::' inside them.
That's a pity, because with '::' there would be an obvious way of using
name spaces in your NAMEs.
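
A small sketch of that constraint, assuming a perl with named captures
(5.10 or later). The compiles helper and the date_year / date::year names
are hypothetical; the underscore prefix is only one possible stand-in for a
name space, not something the message proposes. The patterns are built from
strings so that a rejected one fails at run time, where eval can catch it.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: compile a pattern at run time and report whether
# the regex engine accepted it.
sub compiles {
    my ($pattern) = @_;
    return defined(eval { qr/$pattern/ }) ? 1 : 0;
}

say compiles('(?<date_year>\d{4})')  ? "ok" : "rejected";   # underscore in NAME
say compiles('(?<date::year>\d{4})') ? "ok" : "rejected";   # '::' in NAME
```

The first pattern should be accepted and the second rejected.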


----- End forwarded message -----
