SPUG: Word boundary regex treated differently by 5.6 and 5.00503

Colin Meyer cmeyer at helvella.org
Wed Apr 25 19:51:09 CDT 2001


Hey, Ben,

On Wed, Apr 25, 2001 at 12:35:04PM -0700, Ben Burnett wrote:
> Hey all,
> 
> I noticed something that seems strange.  I ran the following
> script on two machines.  One of them running 5.00503 and one
> running 5.6 (full details attached) and got two different
> outputs.
> 
> --<script>--
> #!/usr/bin/perl
> 
> use strict;
> 
> my $text = "Charles Bronson";
> 
> $text =~ s/\B\w//g;
> 
> print "here it is: $text\n\n";
> --</script>--
> 
> output on 5.6 was:	here it is: Cals Bosn
> output on 5.00503 was:	here it is: C B

output of 5.00405: here it is: C B
5.6.0:             here it is: Cals Bosn
5.6.1:             here it is: Cals Bosn
5.7.1:             here it is: Cals Bosn

The same effect can be seen from:
perl -le '$t = "abcdefg"; print pos $t while $t =~ m/\B\w/g'

Versions prior to 5.6.0 print:
2
3
4
5
6
7
while 5.6.0 and later print:
2
4
6

More detail can be seen from the regex debugger:
perl -M're debug' -le '$t = "abcdefg"; print pos $t while $t =~ m/\B\w/g'

Another interesting variety:
perl -le '$t="a b c d e f g";print pos $t while $t =~ m/\b./g'
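To compare versions side by side, the two one-liners above can be folded into one small script that collects the positions instead of printing them one per line. This is only a sketch: the helper name `match_positions` is hypothetical, and no particular output is claimed here, since the output is exactly what differs between versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return pos() after each successful m//g match
# of $re against a copy of $string.
sub match_positions {
    my ($string, $re) = @_;
    my @positions;
    # In scalar context, m//g resumes from pos($string) on each pass.
    while ($string =~ m/$re/g) {
        push @positions, pos($string);
    }
    return @positions;
}

print "perl $]\n";
print '\B\w on "abcdefg":       ',
      join(' ', match_positions('abcdefg', qr/\B\w/)), "\n";
print '\b.  on "a b c d e f g": ',
      join(' ', match_positions('a b c d e f g', qr/\b./)), "\n";
```

Running the same file under each installed perl makes the difference visible in one line per pattern, along with the version that produced it.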

> 
> It appears as though 5.00503 is getting rid of all \w
> characters in the string that aren't preceded by a word
> boundary (which is what I expected), while 5.6 is removing
> every other \w character in each word.
> I couldn't see anything obvious in perldelta that would
> indicate that the two versions should treat this
> differently.  Does anyone know why this might happen?

I think it is unfortunate that this difference in the interpretation of
regexes is not mentioned in perldelta.

It is hard for me to decide if this is a new bug or a bug fix for an old
problem. The camel says that /g causes the regex to "start the next
match on the same variable at a position *just past* where the last
match stopped." The older versions of Perl seem to be looking at the
character that the last match ended on in order to determine the boundary
or non-boundary properties of the character at pos($t). Well, it's either
a bug in Perl or a bug in its documentation. In either case, a
report should be submitted with perlbug.
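Whichever reading turns out to be right, a substitution that never starts a /g iteration with a word-boundary assertion should sidestep the difference. The sketch below is my own suggestion, not something verified on 5.005 and 5.6: it keeps the first word character of each word by letting a greedy \w* swallow the rest of the word in the same match, so no match ever begins in the middle of a word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Charles Bronson";

# Capture the first word character, let the greedy \w* consume the rest
# of the word, and replace the whole word with just the capture. Each
# match ends at the end of a word, so the pattern contains no \b or \B
# whose result could depend on the character before pos().
(my $initials = $text) =~ s/(\w)\w*/$1/g;

print "here it is: $initials\n";    # here it is: C B
```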

What sort of problem were you attempting to solve when you came across
this one? ;-)

-C.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
      Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
  Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
 For daily traffic, use spug-list for LIST ;  for weekly, spug-list-digest
  Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/