Phoenix.pm: Regex question

Doug Miles doug.miles at bpxinternet.com
Fri Feb 2 17:49:24 CST 2001


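OK, I figured it out.  The first set of "()"s is $1 and the second set is
$2.  In list context with /g, BOTH $1 and $2 get returned for every match,
whether or not that group took part in it, so the group from the
alternative that didn't match comes back as undef.  Thus the undefs.
Here's the code I used to figure it out: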

#!/usr/bin/perl

my $string = qq(DailyNAV "Class C" nav);

while($string =~ /"([^"]+?)"|(\S+)/g)
{

  my $token1 = $1;
  my $token2 = $2;
  my $match_length = length($&);
  my $position = pos($string);

  print "$token1|$token2|$match_length|$position\n";

}

Here's the output:

|DailyNAV|8|8
Class C||9|18
|nav|3|22

You can see that either $1 or $2 is undef on every pass, depending on
which alternative matches.  Obvious in retrospect.  Oh well, I guess I
learned something.  Thanks for the grep.  I used it. :)
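
For the archives, here is a minimal sketch of the cleanup step.  It
assumes the same two-alternative regex, and I added a "0" token to the
test string purely for illustration.  It filters on defined() rather
than truth, since a plain truth test such as $_ ? 1 : 0 would also
throw away a legitimate token like "0":

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Test string from this thread, plus a hypothetical "0" token added here
# to show the difference between a truth test and a defined() test.
my $string = qq(DailyNAV "Class C" 0 nav);

# In list context, //g returns every capture group for every match, so
# each match contributes one defined value and one undef.
my @raw = $string =~ /"([^"]+)"|(\S+)/g;

# Keep only the captures that actually took part in a match.
my @tokens = grep { defined $_ } @raw;

print join('|', @tokens), "\n";
```

With that string it should print DailyNAV|Class C|0|nav.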

Scott Walters wrote:
> 
> Doug,
> 
> |DailyNAV|Class C|||nav        <-- output of your version
> ||DailyNAV| |Class C|| ||nav   <-- output of same regex used in split() on same string
> 
> A regex can match zero-length things, as with split //, $str; but looking
> at the regex, nothing in there would match something 0 characters long, so
> I don't know =)
> 
> Hmm. Tried a few other regexes that also "should work" and had no luck. The
> only thing I can think of is to work around the mystery/problem:
> 
> my @tokens = grep { $_ ? 1 : 0 } $string  =~ /"([^"]+)"|(\S+)/g;
> 
> -scott
> 
> On Fri, 2 Feb 2001, Doug Miles wrote:
> 
> > This is probably something obvious, but I don't have my regex book with
> > me, and can't seem to figure it out.  I'm trying to parse space
> > delimited information, somewhat like the UNIX command line.  Whitespace
> > delimits parameters, but parameters can contain whitespace if
> > surrounded by quotes.  Here's the code:
> >
> > #!/usr/bin/perl
> >
> > my $string = qq(DailyNAV "Class C" nav);
> >
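> > # Try each regex in turn and print its result.
> > my @tokens = $string =~ /(\S+)/g;
> > print join('|', @tokens) . "\n";
> >
> > @tokens = $string =~ /"([^"]+)"/g;
> > print join('|', @tokens) . "\n";
> >
> > @tokens = $string =~ /"([^"]+)"|(\S+)/g;
> > print join('|', @tokens) . "\n";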
> >
> > here is the output:
> >
> > DailyNAV|"Class|C"|nav
> > Class C
> > |DailyNAV|Class C|||nav
> >
> > The first two regexes do what I expect.  When I combine them in the
> > third, I get extra "null matches".  Any ideas as to what I'm doing
> > wrong?
> >
> > --
> > - Doug
> >
> > Encrypted with ROT-26 - all attempts to decrypt are illegal under the
> > DMCA!
> >

-- 
- Doug

Encrypted with ROT-26 - all attempts to decrypt are illegal under the
DMCA!


