SPUG: Strange regex problem

Riley wormwood at speakeasy.org
Sat Sep 23 00:22:12 CDT 2000


Hi Trevor,

This problem you describe got to me, so I played around a little to 
figure out what it might be doing (tested on perl 5.005_03 on
i686-linux).

In general, when doing matching or substitution with the global
('g') option, perl sets $1, $2, $3... to the corresponding substring of
the last successfully matched pattern. 

However, it's kind of funny about just what it considers the last
matched pattern: if it's matching in a scalar (or void) context, a single
evaluation stops after the first match instead of running through the
whole string, so the 'g' looks as if it were cancelled out. Compare:

 perl -e'$foo = "abcdefghijklmnopqrstuvwxyz"; 
         $matches = ($foo =~ /([a-z]{3})/g); print "$1\n$matches\n"'   
 abc
 1

 perl -e'$foo = "abcdefghijklmnopqrstuvwxyz"; 
        ($matches) = ($foo =~ /([a-z]{3})/g); print "$1\n$matches\n"'
 vwx
 abc

Although it's a bit odd, this may be the specified behavior. I have no
reference, but the perl 5.6 on Mac OS X beta behaves the same way, and
maybe others do too.
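
For what it's worth, perlop does describe this: in scalar context each
evaluation of m//g finds the next match and remembers where it stopped
(see pos), so the 'g' is spread across repeated evaluations rather than
dropped. A minimal sketch of that; the helper name scalar_matches is made
up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: step through a string with m//g in scalar context,
# recording $1 and pos() after each successful match.
sub scalar_matches {
    my ($text) = @_;
    my @seen;
    while ($text =~ /([a-z]{3})/g) {
        push @seen, [$1, pos($text)];
    }
    return @seen;
}

my @seen = scalar_matches("abcdefghijklmnopqrstuvwxyz");
print scalar(@seen), " matches; first=$seen[0][0] last=$seen[-1][0]\n";
```

Each pass through the loop is one scalar-context match, which is why a
single evaluation only ever reports the first one.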

If it's doing substitution, however, it doesn't use the context as an
excuse for (little 'l') laziness:

 perl -e'$foo ="abcdefghijklmnopqrstuvwxyz"; 
         $matches = ($foo =~ s/([a-z]{3})/\U$1\E/g); print "$foo\n$1\n$matches\n"'
 ABCDEFGHIJKLMNOPQRSTUVWXyz
 vwx
 8

In your error case, what seems to be happening is that it's
"remembering" the results of its last *attempted* match, rather than its
last successful one. Here's what I tried in the course of reaching this
conclusion:

 perl -e'$foo = "abcdefghijklmnopqrstuvwxyz"; 
        ($matches) = ($foo =~ s/(hi)(.*)(stu)/\U$1\E$2\U$3\E/gi); 
        print "$foo\n$1\n$matches\n"'
 abcdefgHIjklmnopqrSTUvwxyz
 
 1

That's the error case as previously described. But what if we match
against any two letters rather than just "hi"?

 perl -e'$foo = "abcdefghijklmnopqrstuvwxyz"; 
        ($matches) = ($foo =~ s/(\w\w)(.*)(stu)/\U$1\E$2\U$3\E/gi); 
        print "$foo\n$1\n$matches\n"'
 ABcdefghijklmnopqrSTUvwxyz
 vw
 1

"vw" is right after "stu", so we can see that it began the
unsuccessful search at the right point. Here, though, "vw" is both the
first and the last partial match, so what if we move that point back in
the alphabet, to make sure it's indeed the last partial match that is
being returned?

 perl -e'$foo = "abcdefghijklmnopqrstuvwxyz"; 
        ($matches) = ($foo =~ s/(\w\w)(.*)(nop)/\U$1\E$2\U$3\E/gi); 
        print "$foo\n$1\n$matches\n"'
 ABcdefghijklmNOPqrstuvwxyz
 vw
 1

Still "vw" -- the last two letters it could possibly match, since although
the second expression can match nothing, the third needs three characters
to look at. 

One last twist in this bug is that it only seems to happen when "*" and
the case-insensitive flag are used together. With "+" in place of "*",
$1 comes back as expected:

 perl -e'$foo = "abcdefghijklmnopqrstuvwxyz"; 
        ($matches) = ($foo =~ s/(hi)(.+)(stu)/\U$1\E$2\U$3\E/gi); 
        print "$foo\n$1\n$matches\n"'
 abcdefgHIjklmnopqrSTUvwxyz
 hi
 1
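
Whatever the cause, one way to avoid depending on $1 after a s///gi has
finished is to grab the captures inside the replacement itself, using the
'e' flag. A minimal sketch, assuming a perl where 'e' behaves as
documented (whether it dodges the bug on 5.005_03 is untested here, and
the helper name upcase_ends is made up):

```perl
use strict;
use warnings;

# Hypothetical helper: uppercase groups 1 and 3 as in the examples above,
# saving each match's captures from inside the replacement (via /e)
# instead of reading $1..$3 once the substitution is over.
sub upcase_ends {
    my ($text) = @_;
    my @captures;
    my $count = ($text =~ s{(hi)(.*)(stu)}{
        push @captures, [$1, $2, $3];
        uc($1) . $2 . uc($3)
    }gie);
    return ($text, $count, \@captures);
}

my ($new, $count, $caps) = upcase_ends("abcdefghijklmnopqrstuvwxyz");
print "$new\n$count\n$caps->[0][0], $caps->[0][2]\n";
```

The captures are copied while the match that produced them is still
current, so nothing later in the scan can overwrite them.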

Beat that camel to death? It's fixed in later versions, thank god.

--Riley
----------------------
We run carelessly to the precipice, after we have put some thing before
us to prevent us seeing it. 
-- Blaise Pascal

Regarding "SPUG: Strange regex problem", Trevor Leffler wrote:

> SPUGers,
> 
>   I'm working with a substitution regex that correctly does a
> substitution, but the $1, $2, etc. that I expect are empty when I use
> the //i and //g modifiers together.  I am hoping that one of you may
> have seen this before and have a good explanation; why would they react
> to one another?  Please ignore that the //i and //g are pointless in my
> example, except for demonstrating this problem.  Here is an example of
> what I'm talking about (using the ever-so-cute abc regex):
> 
> # Just using //i here
> bash$ perl -e '$foo = "abcdefghijklmnopqrstuvwxyz"; $foo =~
> s/(hi)(.*)(stu)/$3$2$1/i; print "$1, $3!\n$foo\n"'
> hi, stu!
> abcdefgstujklmnopqrhivwxyz
> 
> # //g here, and still no problems with $1, $2, and $3
> bash$ perl -e '$foo = "abcdefghijklmnopqrstuvwxyz"; $foo =~
> s/(hi)(.*)(stu)/$3$2$1/g; print "$1, $3!\n$foo\n"'
> hi, stu!
> abcdefgstujklmnopqrhivwxyz
> 
> #  Whoops!  //gi is acting fishy...
> bash$ perl -e '$foo = "abcdefghijklmnopqrstuvwxyz"; $foo =~
> s/(hi)(.*)(stu)/$3$2$1/gi; print "$1, $3!\n$foo\n"'
> , !
> abcdefgstujklmnopqrhivwxyz
> 
> 
> Thanks for any light you can shed on this,
> -- 
> Trevor Leffler, Software Developer
> PETTT, University of Washington
> Box 353080, (206) 616-3406 FAX: (206) 616-2873
> 
> 
> bash$ perl -V
> Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
>   Platform:
>     osname=linux, osvers=2.2.5-22smp, archname=i386-linux
>     uname='linux porky.devel.redhat.com 2.2.5-22smp #1 smp wed jun 2
> 09:11:51 edt 1999 i686 unknown '
>     hint=recommended, useposix=true, d_sigaction=define
>     usethreads=undef useperlio=undef d_sfio=undef
>   Compiler:
>     cc='cc', optimize='-O2 -m486 -fno-strength-reduce',
> gccversion=egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)   
> cppflags='-Dbool=char -DHAS_BOOL -I/usr/local/include'
>     ccflags ='-Dbool=char -DHAS_BOOL -I/usr/local/include'
>     stdchar='char', d_stdstdio=undef, usevfork=false
>     intsize=4, longsize=4, ptrsize=4, doublesize=8
>     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
>     alignbytes=4, usemymalloc=n, prototype=define
>   Linker and Libraries:
>     ld='cc', ldflags =' -L/usr/local/lib'
>     libpth=/usr/local/lib /lib /usr/lib
>     libs=-lnsl -ldl -lm -lc -lposix -lcrypt
>     libc=, so=so, useshrplib=false, libperl=libperl.a
>   Dynamic Linking:
>     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
>     cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'
> 
> 
> Characteristics of this binary (from libperl): 
>   Built under linux
>   Compiled at Aug 10 2000 15:33:00
>   @INC:
>     /usr/lib/perl5/5.00503/i386-linux
>     /usr/lib/perl5/5.00503
>     /usr/lib/perl5/site_perl/5.005/i386-linux
>     /usr/lib/perl5/site_perl/5.005
> 
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>      POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
>       Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
>   Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
>  For daily traffic, use spug-list for LIST ;  for weekly, spug-list-digest
>   Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/
> 
> 



