SPUG: Word boundary regex treated differently by 5.6 and 5.00503

Michael LaGaly lagaly at eskimo.com
Thu Apr 26 11:54:56 CDT 2001


Actually, it looks like 5.6 is not testing each successive character against the original $text = "Charles Bronson". Instead it is doing something that amounts to an in-place deletion, and then testing the next character against the modified string.

So for:

perl -e "use strict;(my $text = qq(Charles Bronson)) =~ s/\B\w//g;print qq(here it is: $text\n\n);"

Charles Bronson
^                test : is on a word boundary: go to next
 ^               test : is not on a word boundary: delete this char
C arles Bronson  test : is on a word boundary: go to next
  ^
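To make the two readings concrete, here is a minimal sketch that simulates them by hand, without using \B under /g at all. Both subroutine names are made up for illustration, and the second one models only my guess about what 5.6 is doing (a deleted character leaves a gap that counts as a boundary). It is not taken from the Perl source.

```perl
use strict;
use warnings;

# Reading 1: every character is judged against the ORIGINAL string.
# A character is deleted when it is a word character whose left
# neighbour in the original is also a word character (that is, \B\w).
sub strip_against_original {
    my ($text) = @_;
    my $out = '';
    for my $i (0 .. length($text) - 1) {
        my $ch   = substr($text, $i, 1);
        my $prev = $i > 0 ? substr($text, $i - 1, 1) : '';
        my $inside_word = ($ch =~ /\w/ && $prev =~ /\w/);
        $out .= $ch unless $inside_word;
    }
    return $out;
}

# Reading 2 (hypothetical): a deleted character leaves a gap, so the
# character after a deletion looks as if it starts a new word.
sub strip_with_gaps {
    my ($text) = @_;
    my $out = '';
    my $prev_is_word = 0;    # was the character to our left a KEPT word char?
    for my $ch (split //, $text) {
        if ($ch =~ /\w/ && $prev_is_word) {
            $prev_is_word = 0;               # deleted: leaves a gap
        }
        else {
            $out .= $ch;
            $prev_is_word = ($ch =~ /\w/) ? 1 : 0;
        }
    }
    return $out;
}

print "against original: ", strip_against_original("Charles Bronson"), "\n";  # C B
print "with gaps:        ", strip_with_gaps("Charles Bronson"), "\n";         # Cals Bosn
```

If 5.6 prints something like the second line for the s/\B\w//g one-liner, that would fit the in-place deletion theory.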

Why don't you try the following in 5.6? It shows the text to the left of the match, the match itself, and the text to the right of the match, as the regex engine sees them on each pass:

perl -le "$t = qq(Charles Bronson); print qq( <$`> $& <$'>) while $t =~ m/\B\w/g"

On 5.00503 this gets:
 <C> h <arles Bronson>
 <Ch> a <rles Bronson>
 <Cha> r <les Bronson>
 <Char> l <es Bronson>
 <Charl> e <s Bronson>
 <Charle> s < Bronson>
 <Charles B> r <onson>
 <Charles Br> o <nson>
 <Charles Bro> n <son>
 <Charles Bron> s <on>
 <Charles Brons> o <n>
 <Charles Bronso> n <>
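The one-liner above is quoted for a shell that leaves $ alone inside double quotes. As a sketch for anyone who would rather run it from a file, the same loop can be written as a script; the @rows array is only there so the results can be inspected afterwards and is not part of the original one-liner.

```perl
use strict;
use warnings;

my $t = "Charles Bronson";
my @rows;

# Each successful match under /g records prematch, match, and postmatch.
while ($t =~ /\B\w/g) {
    push @rows, [ $`, $&, $' ];
}

# Same layout as the one-liner: " <left> match <right>"
printf " <%s> %s <%s>\n", @$_ for @rows;
print scalar(@rows), " matches\n";
```

A Perl that behaves like 5.00503 should report 12 matches with this input, one per line of the listing above.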

I'm curious to see what 5.6 gives you.

Michael

  ----- Original Message ----- 
  From: Ben Burnett 
  To: Colin Meyer ; Ben Burnett 
  Cc: spug-list at pm.org 
  Sent: Thursday, April 26, 2001 12:09 AM
  Subject: Re: SPUG: Word boundary regex treated differently by 5.6 and 5.00503


  At 05:51 PM 4/25/01 -0700, Colin Meyer wrote:
  >More detail can be seen from the regex debugger:
  >perl -M're debug' -le '$t = "abcdefg"; print pos $t while $t =~ m/\B\w/g'

  I have to admit I haven't spent much time with the Perl debugger. I'll take
  a closer look at this.

  >It is hard for me to decide if this is a new bug or a bug fix for an old
  >problem. The camel says that /g causes the regex to "start the next
  >match on the same variable at a position *just past* where the last
  >match stopped." The older versions of Perl seem to be looking at the
  >character that the last match ended on in order to determine the border
  >or non-border properties of the character at pos($t). Well, it's either
  >a bug with Perl, or a bug with its documentation. In either case, a
  >report should be submitted with perlbug.
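The point about pos() and the character just before it can be checked without the regex debugger. The following is a small sketch, assuming nothing beyond core Perl: it records pos($t) after each match, then sets pos($t) by hand and anchors a match there with \G to see whether \B still consults the character to the left.

```perl
use strict;
use warnings;

my $t = "abcdefg";

# Record where each /g match leaves pos(): just past the matched character.
my @positions;
while ($t =~ /\B\w/g) {
    push @positions, pos($t);
}
print "pos after each match: @positions\n";

# Start a match at offset 1 by hand. \G pins the match to pos($t), so
# \B can only succeed here if the "a" at offset 0 is taken into account.
pos($t) = 1;
my $hit = ($t =~ /\G\B\w/g) ? $& : 'no match';
print "match anchored at offset 1: $hit\n";
```

On a Perl where \B looks at the preceding character of the original string, the positions come out as 2 through 7 and the anchored match is "b". A different result would be worth including in the perlbug report.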

  I think it's probably a bug with Perl itself.  I can't imagine this change 
  in behavior was intentional.  I'll have to submit it in the morning.

  >What sort of problem were you attempting to solve when you came across
  >this one? ;-)

  Here is an excerpt of code showing the regex hard at work in a motorcycle 
  rental application CGI script.
  ...
          # We need to give this request a registration number while we are here.
          # This number will be built out of the initials of each word in the
          # applicant's name, a unique session key, the applicant's state, and
          # the first two letters of the city that the applicant is in.
          my $key = time();
          # `or` binds looser than `=`, so the log call runs whenever
          # getppid() returns a false value.
          my $ppid = getppid()
              or $LogH->append("couldn't getppid to add to session key");
          $key .= "-" . $ppid;
          my $request_id = $PASSED_VARS{'name'};
          $request_id =~ s/\B\w//g;    # keep only the first letter of each word
          $request_id =~ s/\W//g;      # drop spaces and punctuation
          $request_id .= "-" . $key; # . "-";
          # $request_id .= $PASSED_VARS{'state'} . "-";
          # my $city_portion = $PASSED_VARS{'city'};
          # $city_portion =~ m/^([\w]{2})/;
          # $city_portion = $1;
          # $request_id .= $city_portion ;
          $request_id = uc($request_id);
  ...

  I'll eventually work out some other form of unique id for these requests 
  that isn't so verbose, but I wanted it to be human readable during testing.
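For the initials step in the excerpt, one way to avoid depending on how \B behaves under /g is to capture whole words and keep the first letter of each. This is only a sketch: both subroutine names are invented here, and the second is an alternative technique, not what the CGI script does.

```perl
use strict;
use warnings;

# Approach from the excerpt: delete every word character that is not at
# the start of a word, then delete the remaining non-word characters.
sub initials_by_deletion {
    my ($name) = @_;
    (my $id = $name) =~ s/\B\w//g;
    $id =~ s/\W//g;
    return uc($id);
}

# Alternative: capture each run of word characters and keep its first
# letter. No \B is involved, so the /g boundary question never comes up.
sub initials_by_capture {
    my ($name) = @_;
    my @words = ($name =~ /(\w+)/g);
    return uc(join('', map { substr($_, 0, 1) } @words));
}

for my $name ("Charles Bronson", "Mary-Jane O'Neil") {
    printf "%-18s deletion=%s capture=%s\n",
        $name, initials_by_deletion($name), initials_by_capture($name);
}
```

The capture version gives "CB" for "Charles Bronson". Comparing its output with the deletion version on a given Perl is also a quick way to tell whether that Perl shows the behavior discussed in this thread.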


  -Ben




