[tpm] New Regex question

Sun Oct 28 04:43:15 PDT 2012

I don't think that's exactly correct... the $1 on the match side appears
to refer to the current regex, not a previous regex. Or perhaps I've
misunderstood what you're saying?

[robj@rj-ul80vt ~]$ perl -e '$x = "one \t\t two"; print ".$1.\n"; print "$x\n"; $x =~ s/(\s)$1+/$1/; print "$x\n";'
..
one          two
one two

Compare with this:

[robj@rj-ul80vt ~]$ perl -e '$x = "one \t\t two"; print ".$1.\n"; print "$x\n"; $x =~ s/(\s)\s+/$1/; print "$x\n";'
..
one          two
one two

There's no g on the s///, so $1 is set on the first pass, not the second.
It appears to act as if it were \s, not the interpolated contents of the
first match.

If $1 were the contents of the parenthesized match, it would have been a
space, and the space-tab-tab-space run would not have been replaced. The
match would have occurred only for the two tabs, which would have been
replaced with one tab. However, $1 appears to act as if it were \s,
matching any whitespace character: the parenthesized match matches the
space, and $1+ matches tab, tab, space. On the replacement side, $1 is
just a space.
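For comparison, here is a minimal sketch of the behaviour described above as the expected one, written with the \1 backreference instead of $1 in the pattern (the variable names are illustrative). It collapses only a run of the same whitespace character, so only the two tabs are affected:

```perl
use strict;
use warnings;

# The string from the one-liners above: space, tab, tab, space.
my $x = "one \t\t two";

# \1 is a backreference: it must match exactly the character that
# (\s) captured during this same match, not "any whitespace".
(my $collapsed = $x) =~ s/(\s)\1+/$1/;

# Make tabs visible before printing.
(my $shown = $collapsed) =~ s/\t/\\t/g;
print "$shown\n";    # one \t two
```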

I looked over the perlre man page and didn't see anything about that, but
I did find some new stuff: \g{1} instead of \1, to remove some ambiguity,
and named capture groups, (?<name>xxxx).
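A short sketch of both features applied to the same string as the one-liners above. It assumes Perl 5.10 or later, and the group name ws is arbitrary:

```perl
use 5.010;    # \g{N}, (?<name>...) and \k<name> need Perl 5.10+
use strict;
use warnings;

my $s = "one \t\t two";

# \g{1} is the same backreference as \1, but the braces make the
# group number unambiguous (e.g. \g{1}0 versus \10).
(my $by_number = $s) =~ s/(\s)\g{1}+/$1/;

# A named group (?<ws>...) can be backreferenced as \k<ws> inside the
# pattern and read back as $+{ws} afterwards, including in the
# replacement.
(my $by_name = $s) =~ s/(?<ws>\s)\k<ws>+/$+{ws}/;

print $by_number eq $by_name ? "same\n" : "different\n";    # same
```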

There was also the warning about using $1 and the like anywhere causing an
overall slowdown. That might be a reason to use \1 in the replacement,
although it's possible that \1 in that context would slow perl down too.

Also, I noticed that \1 in the replacement is "grandfathered", not
deprecated, and not for backwards compatibility but to avoid shocking sed
fans. That means \1 is not going away. The warning section says its use
is discouraged because of the ambiguity with other uses of \1.
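A sketch comparing the two replacement spellings. Perl treats \1 on the replacement side of s/// as $1, so both should produce the same string; the sed-style form mainly earns a "better written as $1" warning under use warnings, which the sketch silences locally. Variable names are illustrative:

```perl
use strict;
use warnings;

my $s = "one \t\t two";

# Preferred: the replacement side of s/// is a double-quoted string,
# so captured text is written as $1.
(my $with_dollar = $s) =~ s/(\s)\1+/$1/;

# sed style: \1 on the replacement side is still accepted and treated
# as $1, but under "use warnings" perl complains that \1 is better
# written as $1, so warnings are switched off for this block only.
my $with_backslash;
{
    no warnings;
    ($with_backslash = $s) =~ s/(\s)\1+/\1/;
}

print $with_dollar eq $with_backslash ? "same\n" : "different\n";    # same
```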

This seems to explain what's happening:

    The operation of interpolation should not be confused with the
    operation of matching a backreference. Certainly they mean two
    different things on the left side of the "s///".
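A small sketch of the interpolation side of that distinction, following Uri's explanation quoted below: $1 inside the pattern is whatever $1 held before the substitution started. In the one-liners above nothing had matched yet, so $1 was undef and interpolated as an empty string, turning (\s)$1+ into (\s)+, which also matches any run of whitespace. The second case sets $1 to a tab with an earlier match, to show the same source text compiling to a different pattern. Variable names are illustrative:

```perl
use strict;
use warnings;

my $s = "one \t\t two";

# Case 1: no earlier match, so $1 is undef and interpolates as "".
# The pattern text becomes (\s)+ : the + now repeats the group, the
# whole run of whitespace matches, and $1 keeps the LAST character.
my $no_prior;
{
    no warnings 'uninitialized';    # interpolating undef $1 warns
    ($no_prior = $s) =~ s/(\s)$1+/$1/;
}

# Case 2: an earlier match leaves a tab in $1, so the same source
# text now compiles as (\s)\t+ (a literal tab after the group).
"\t" =~ /(\s)/;
(my $tab_prior = $s) =~ s/(\s)$1+/$1/;

for my $r ($no_prior, $tab_prior) {
    (my $shown = $r) =~ s/\t/\\t/g;
    print "[$shown]\n";
}
# prints:
# [one two]
# [one  two]
```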

The only reason I can think of for this behaviour of $1 in a match regex is
to replicate a pattern in the regex, perhaps with different modifiers.
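One everyday use of a previous $1 inside a pattern, sketched with a hypothetical "sep=|" header line: capture a delimiter in one match, then build the next pattern from it. \Q...\E escapes the interpolated text so a regex metacharacter such as | is matched literally rather than as alternation:

```perl
use strict;
use warnings;

my $header = "sep=|";    # hypothetical header naming the delimiter
my $line   = "a|b|c";

my @fields;
if ($header =~ /^sep=(.)/) {
    # Inside this block $1 holds "|", the value from the match above.
    # Interpolating it builds the next pattern from that text; \Q...\E
    # escapes it so "|" is literal rather than regex alternation.
    @fields = split /\Q$1\E/, $line;
}
print join(",", @fields), "\n";    # a,b,c
```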

-rob

On Sat, Oct 27, 2012 at 11:56 PM, Uri Guttman <uri at stemsystems.com> wrote:

> On 10/27/2012 08:52 PM, Rob Janes wrote:
>
>> sounds a bit muddy ...
>>
>> just to clarify, what I found to my surprise, is that
>>
>> $exp =~ s/(\s)$1+/$1/g;
>>
>> replaces mixed strings of white space with the first character, leading
>> me to conclude that the first $1 actually is the \s, not the specific
>> character matched. However, in the second part of the s, the $1 did
>> indeed give the white space. So the $1 in the context of searching
>> appears to render as the regex, not the characters matched, while in
>> the context of replacing, the $1 renders as the actual characters
>> matched, not the regex used.
>>
>> hope that makes it clearer.
>>
>
> sorry but no.
>
> $1 in the regex is what $1 was BEFORE the regex. it gets interpolated and
> then parsed as a regex. it is not related to the () in the regex. only \1
> will be the grabbed text from that (). and $1 in the replacement is just
> the string matched in the () which is any single whitespace char.
>
> uri