SPUG: RE question

Jacinta Richardson jarich at perltraining.com.au
Wed Nov 16 17:55:44 PST 2005


Duane Blanchard wrote:

> $RE_year = "(19|20)\d\d";
> $RE_month = "(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?)";
> $RE_day = "[0-3]?\d";

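Separating things out like this is great.  However, have you considered that
the parens in these strings are capturing groups?  Once they're interpolated into
a bigger pattern they get numbered along with your own groups and shift $1, $2,
etc.  You probably meant to use non-capturing (?:...) parens.  I've also written
them with qr//, which keeps each pattern's flags with it, and added /i to the
month pattern, since your data has "Mar" rather than "mar":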

my $RE_year = qr/(?:19|20)\d\d/;
my $RE_month =
qr/(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|jun(?:e)?|jul(?:y)?|aug(?:ust)?|sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)/i;
my $RE_day = qr/[0-3]?\d/;
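To see the difference, here's a small sketch that matches the same text with a capturing and a non-capturing year pattern and collects the captures in list context.  The sample string "1993 Mar" and the variable names are just for illustration.

```perl
use strict;
use warnings;

my $cap_year = qr/(19|20)\d\d/;    # inner group captures, like the original strings
my $nc_year  = qr/(?:19|20)\d\d/;  # inner group doesn't capture

my $text = "1993 Mar";

# A match in list context returns ($1, $2, ...).
my @cap = $text =~ /($cap_year)\s+(\w+)/;   # ("1993", "19", "Mar")
my @nc  = $text =~ /($nc_year)\s+(\w+)/;    # ("1993", "Mar")

print "capturing:     @cap\n";
print "non-capturing: @nc\n";
```

With the capturing version the month has slid from $2 to $3, which is exactly the kind of surprise that bites once several of these sub-patterns are combined.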


> @array = ("1993 Mar  3", "1993  Mar 15", "Mar 15, 1993", "15 Mar 2001", "2001, 15 Mar");
> 
> foreach $thing (@array)
> {
> 	# in the first disjunction, find any one of the defined REs, in the second, find any but the first one you found, etc.
> 	if ($line =~ /($RE_year|$RE_month|$RE_day),?\s*([^$1]($RE_year|$RE_month|$RE_day)),?\s*([^$1$2]($RE_year|$RE_month|$RE_day)))
> 	{print "You got a date: too bad it isn't with a girl.";}

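As you've determined, this isn't going to do what you want it to.  For a start,
$1 and $2 inside a pattern don't refer to the groups of the match in progress:
they're interpolated when the pattern is compiled, from the previous successful
match.  (A backreference is written \1, and backreferences don't work inside a
character class anyway.)  And even if $1 did hold "2001", [^2001] is a character
class: it matches one character that isn't a 2, 0 or 1, not "anything except the
string 2001".  Pretending the interpolation worked the way you intended, here's
what your pattern says for "2001, 15 Mar":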

        ((19|20)\d\d),?\s*      # so far, so good: matches "2001, "
        ([^201]([0-3]?\d))      # oops, need something which isn't a 1.
                                # backtrack: \s* gives up the space, the
                                # char class matches it, then [0-3] matches
                                # the 1 and \d matches the 5.
        ,?\s*                   # matches okay: "2001, 15 "
        ([^2019](mar(ch)?))     # hmm, okay: match the 'M' with the first
                                # char class, then fail to find 'ar' among
                                # the options.  Backtrack, give the space to
                                # the char class and match "Mar" (given
                                # case-insensitive matching), so the pattern
                                # as a whole matches.
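Inside a pattern, $1 is interpolated from the previous successful match when the pattern is compiled; it isn't a reference to the current match's group 1.  Here's a small sketch of that (the strings "abc" and "xx" are made up for the demonstration).  The second match uses a negative lookahead on the backreference, (?!\1), which is one way to say "the next character isn't what group 1 just matched"; that lookahead is my substitution, not something from your code.

```perl
use strict;
use warnings;

# An earlier, unrelated successful match leaves $1 set to "b".
"abc" =~ /(b)/;

# This looks like "an x, then any character except what group 1 captured",
# but $1 is interpolated from the *earlier* match when the pattern is
# compiled, so the character class is really [^b].
my $interpolated = ("xx" =~ /(x)[^$1]/) ? 1 : 0;

# A negative lookahead on the backreference \1 expresses the intended
# "next character is not what group 1 just matched".
my $lookahead = ("xx" =~ /(x)(?!\1)./) ? 1 : 0;

print "interpolated \$1: $interpolated\n";   # 1: second 'x' isn't 'b'
print "lookahead on \\1: $lookahead\n";      # 0: second 'x' equals group 1
```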


I expect you'll find it easier to handle each ordering of the fields separately.
It also makes your code easier to read.

my $RE_YMD = qr{$RE_year  [,/-]? \s* $RE_month  [/-]? \s* $RE_day}x;
my $RE_MDY = qr{$RE_month \s+        $RE_day    ,? \s+    $RE_year}x;
my $RE_DMY = qr{$RE_day   \s+        $RE_month  \s+       $RE_year}x;
my $RE_YDM = qr{$RE_year  ,? \s*     $RE_day    \s+       $RE_month}x;
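If you also want to know which layout a string used, one option is to keep the orderings in a list and try each in turn.  This is a sketch under my own assumptions: the layout names, the layout_of() helper and the ^...$ anchors (so a layout only counts if it accounts for the whole string) are mine, not part of the patterns above.

```perl
use strict;
use warnings;

# Same building blocks as above.
my $RE_year  = qr/(?:19|20)\d\d/;
my $RE_month = qr/(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|jun(?:e)?|jul(?:y)?|aug(?:ust)?|sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)/i;
my $RE_day   = qr/[0-3]?\d/;

# One anchored pattern per field ordering (hypothetical names), tried in order.
my @layouts = (
    [ YMD => qr{^ $RE_year  [,/-]? \s* $RE_month [/-]? \s* $RE_day   $}x ],
    [ MDY => qr{^ $RE_month \s+        $RE_day   ,? \s+    $RE_year  $}x ],
    [ DMY => qr{^ $RE_day   \s+        $RE_month \s+       $RE_year  $}x ],
    [ YDM => qr{^ $RE_year  ,? \s*     $RE_day   \s+       $RE_month $}x ],
);

# Return the name of the first layout matching the whole string, or undef.
sub layout_of {
    my ($date) = @_;
    for my $layout (@layouts) {
        my ($name, $re) = @$layout;
        return $name if $date =~ $re;
    }
    return;
}

for my $date ("1993-Jan-31", "Mar 15, 1993", "15 Mar 2001", "2001, 15 Mar", "15/15/15") {
    my $name = layout_of($date);
    printf "%-14s %s\n", $date, defined $name ? $name : "no match";
}
```

Anchoring matters here: without ^ and $ a pattern is free to match part of a string, so the first layout in the list that fits anywhere would win.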


Putting this all together should make all of the following examples work correctly.

my @array = ("1993 Mar  3", "1993  Mar 15", "Mar 15, 1993", "15 Mar 2001",
"2001, 15 Mar", "1993-Jan-31", "Jan 1 2000", "26 Jan 1988", "1976 01 Aug");

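foreach my $date (@array)
{
        # $1 holds the whole matched date.  Each qr// object carries its
        # own flags, so $RE_month is case-insensitive regardless of the /i.
        if ($date =~ m/($RE_YMD|$RE_MDY|$RE_DMY|$RE_YDM)/ix) {
                print "$1 Matched!\n";
        }
        else {
                print "$date failed\n";
        }
}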
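The /i on that final match isn't what makes "Mar" match.  When a qr// object is interpolated into another pattern it keeps the flags it was compiled with, so it's the /i on $RE_month that matters; an outer /i won't make a case-sensitive qr// case-insensitive.  A quick check (the variable names are just for illustration):

```perl
use strict;
use warnings;

my $month_cs = qr/mar/;     # compiled without /i
my $month_ci = qr/mar/i;    # compiled with /i

# An interpolated qr// object keeps its own flags, so the outer /i
# here does not make $month_cs case-insensitive...
my $outer_i = ("Mar" =~ /(?:$month_cs)/i) ? 1 : 0;

# ...while the /i baked into $month_ci applies even without an outer /i.
my $inner_i = ("Mar" =~ /(?:$month_ci)/) ? 1 : 0;

print "qr/mar/ inside an outer /i match: $outer_i\n";   # 0
print "qr/mar/i inside a plain match:    $inner_i\n";   # 1
```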

Hope this helps.

	Jacinta

-- 
   ("`-''-/").___..--''"`-._          |  Jacinta Richardson         |
    `6_ 6  )   `-.  (     ).`-.__.`)  |  Perl Training Australia    |
    (_Y_.)'  ._   )  `._ `. ``-..-'   |      +61 3 9354 6001        |
  _..`--'_..-_/  /--'_.' ,'           | contact at perltraining.com.au |
 (il),-''  (li),'  ((!.-'             |   www.perltraining.com.au   |



