[Brisbane-pm] Regex Syntax

Thu Apr 12 15:01:43 PDT 2007

On 12/04/2007, at 6:07 PM, Martin Jacobs wrote:

> Hi folks,
>
> I've got a clumsy way to do regexes, and I'm looking for a better way.
>
> I need to read the data from a file (called $name), which has the  
> following format...
>
>    Date   Rainfall mm
> 01/01/2000 00:00:00          27.333
> 01/01/2000 00:05:00         0.0
> 01/01/2000 02:00:00        29.15
> 01/01/2000 02:05:30        0.0
>

Hmm, you could just use split(), first on the whitespace, then on
the /s and :s.

> Which cuts off whatever newline character there is on each line,  
> followed by
>
> 	for $i (1..$#file){
> 	if ($file[$i] =~ m|(\d+)\/(\d+)\/(\d{4})\s+(\d+):(\d+):(\d+)\s*(\d.\d*)|) {
> 		($day,$month,$year,$hour,$minute,$second,$rain) = ($1,$2,$3,$4,$5,$6,$7);
> 		}
> 		}

You keep an index counter, but you don't really need one. Also note
that if your match *fails*, the numbered variables ($1, $2, ...) still
contain the values from the last successful match. This is probably
bad. How about:

   for my $line ( @file ) {
      # List assignment: @values is empty if the line does not match
      my @values = $line =~ m|(\d+)\/(\d+)\/(\d{4})\s+(\d+):(\d+):(\d+)\s*(\d+\.?\d*)|;
   }
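A minimal sketch of the stale-variable trap mentioned above. The two test strings are made up for illustration:

```perl
use strict;
use warnings;

# First match succeeds, so $1 is set to "27".
my $ok = "rain 27.333" =~ /(\d+)/;

# Second match fails. Perl does not reset $1 on failure, so it still
# holds "27" from the earlier successful match.
my $failed = "no digits here" =~ /(\d+)/;
my $stale  = $1;

print "match failed, but \$1 is still '$stale'\n";
```

Because nothing resets $1 after the failed match, a line that doesn't parse can silently inherit the previous record's values.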

> To accommodate some variations in the input record, I have expanded  
> this to
>
> 	for $i (1..$#file){
> 	if ($file[$i] =~ m|(\d+)\/(\d+)\/(\d{4})\s+(\d+):(\d+):(\d+)\s*(\d.\d*)|) {
> 		($day,$month,$year,$hour,$minute,$second,$rain) = ($1,$2,$3,$4,$5,$6,$7);
> 		}
> 	elsif ($file[$i] =~ m|(\d+)\/(\d+)\/(\d{4})\s+(\d+):(\d+)\s*(\d.\d*)|) {
> 		($day,$month,$year,$hour,$minute,$rain) = ($1,$2,$3,$4,$5,$6);
> 		$second = 0;}
> 	elsif ($file[$i] =~ m|(\d+)\/(\d+)\/(\d{4})\s+(\d+)\s*(\d.\d*)|) {
> 		($day,$month,$year,$hour,$rain) = ($1,$2,$3,$4,$5);
> 		($second,$minute) = (0,0);
> 		}	
> 	elsif ($file[$i] =~ m|(\d+)\/(\d+)\/(\d{4})\s*(\d.\d*)|) {
> 		($day,$month,$year,$rain) = ($1,$2,$3,$4);
> 		($second,$minute,$hour) = (0,0,0);
> 		}	
> 	else {print_to [$Screen,$summary], "
> PERRMOSS cannot read rainfall data file $name near line $i
> PERRMOSS aborted at: \t\t$fulltime\n\n";
> 		exit;}
> 		}
>

Testing the pattern match is good, since you never assign values
from the numbered variables unless there really has been a match.
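The original thread did not suggest this, but the four-branch if/elsif chain could probably be collapsed into one pattern with nested optional groups, defaulting the missing time parts to 0 afterwards. Treat this as a sketch under assumptions. `parse_record` is a hypothetical helper name. The two variant lines below are guesses at the "variations" the elsif branches handle. `//=` needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# One pattern for every layout: a date, then an optional hour, an optional
# minute (only after an hour) and an optional second (only after a minute),
# then the rainfall value.
my $record_re = qr{
    ^\s*(\d+)/(\d+)/(\d{4})         # day, month, year
    (?: \s+(\d+)                    # optional hour
        (?: :(\d+)                  # optional minute
            (?: :(\d+) )?           # optional second
        )?
    )?
    \s+(\d+\.?\d*)\s*$              # rainfall
}x;

# Hypothetical helper: returns (day, month, year, hour, minute, second, rain),
# or an empty list if the line does not match.
sub parse_record {
    my ($line) = @_;
    my ($day, $month, $year, $hour, $minute, $second, $rain) =
        $line =~ $record_re or return;
    # Time parts missing from the line default to 0, as in the elsif chain.
    $_ //= 0 for $hour, $minute, $second;
    return ($day, $month, $year, $hour, $minute, $second, $rain);
}

# The first line is from the original message; the other two are guesses.
my @samples = (
    '01/01/2000 00:00:00          27.333',
    '01/01/2000 02        29.15',
    '01/01/2000 0.0',
);

my @parsed;
push @parsed, [ parse_record($_) ] for @samples;
print join(' ', @$_), "\n" for @parsed;
# 01 01 2000 00 00 00 27.333
# 01 01 2000 02 0 0 29.15
# 01 01 2000 0 0 0 0.0
```

Making each time component optional only when the previous one is present keeps a bare rainfall value from being misread as an hour. In the last sample, the engine first tries "0" as the hour, finds no whitespace before the rainfall, and backtracks to skip the time entirely.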

>
>
> The problem is that the first value of $rain should equal 27.333,  
> but it equals 27. So, there's a syntax issue, and I would be  
> grateful for any hints.
>

It's the pattern. To match that part of the string you have \d.\d*,
which matches one digit, followed by one "any character", followed by
zero or more digits. Against 27.333 it captures only "27": \d takes
the 2, . takes the 7, and \d* matches nothing because the next
character is the literal decimal point. You probably wanted
\d+\.?\d* or something like it.
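To see the difference, here is a small sketch. It runs Martin's original pattern and the suggested one over the four sample lines from his message and keeps only the seventh capture (the rainfall). "0.0" happens to come through the old pattern intact, because its "any character" slot swallows the decimal point. Only values with two or more digits before the point get truncated.

```perl
use strict;
use warnings;

# The four sample lines from the original message.
my @lines = (
    '01/01/2000 00:00:00          27.333',
    '01/01/2000 00:05:00         0.0',
    '01/01/2000 02:00:00        29.15',
    '01/01/2000 02:05:30        0.0',
);

my (@old, @new);
for my $line (@lines) {
    # (LIST)[6] keeps only the seventh capture, the rainfall value.
    push @old, ($line =~ m|(\d+)/(\d+)/(\d{4})\s+(\d+):(\d+):(\d+)\s*(\d.\d*)|)[6];
    push @new, ($line =~ m|(\d+)/(\d+)/(\d{4})\s+(\d+):(\d+):(\d+)\s*(\d+\.?\d*)|)[6];
}

print "old: @old\n";    # old: 27 0.0 29 0.0
print "new: @new\n";    # new: 27.333 0.0 29.15 0.0
```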

> In terms of the bigger picture, is using $1,$2 etc the best way to  
> do it?

I suggest never using them unless you have tested for a successful
match, though you are doing that here (you're just getting an incorrect
match). Directly assigning the matches to a list, as I did above, is
okay, because a failed match returns an empty list rather than stale
values. Of course, you then need to test whether you got the values
you were expecting :)
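Here is a small sketch of that approach, using the corrected rainfall capture. A data line and the header line from the sample file are matched in list context, and the element count is checked before any value is used. The seven-field check is just one way to test for "the values you were expecting". Adjust it to whatever your records need.

```perl
use strict;
use warnings;

my $pattern = qr|(\d+)/(\d+)/(\d{4})\s+(\d+):(\d+):(\d+)\s*(\d+\.?\d*)|;

# A data line and the header line from the original message.
my @good = '01/01/2000 00:00:00          27.333' =~ $pattern;
my @bad  = '   Date   Rainfall mm'               =~ $pattern;

# A successful match returns all seven captures; a failed one returns an
# empty list, so there are no stale values to pick up by mistake.
for my $values (\@good, \@bad) {
    if (@$values == 7) { print "parsed, rain = $values->[6]\n" }
    else               { print "line did not parse\n" }
}
```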

In your case, though, I'd just use split():

   my ($date, $time, $rain)     = split ' ', $line;   # ' ' splits on runs of whitespace
   my ($day, $month, $year)     = split '/', $date;   # dates are dd/mm/yyyy
   my ($hour, $minute, $second) = split ':', $time;

Though I'd use a hash to hold all those names, then push a reference
to it onto an array:

open my $fh, '<', $name or die "Cannot open $name: $!";
my @records;
while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ m|^\s*\d+/\d+/\d{4}|;    # skip the header line
    my %temp;
    @temp{ qw/ date time rain / }     = split ' ', $line;
    @temp{ qw/ day month year / }     = split '/', $temp{date};
    @temp{ qw/ hour minute second / } = split ':', $temp{time};
    push @records, \%temp;
}
close $fh or die $!;

Then for any given record $i, $records[$i]->{rain} has the $rain value.
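To check the hash-of-records approach end to end, here is a self-contained sketch. It feeds the sample data from the original message through the same loop using an in-memory filehandle (open on a scalar reference, Perl 5.8 or later), then uses @records. The running total is only there to show iteration over the hash references.

```perl
use strict;
use warnings;

# Sample data from the original message, read through an in-memory
# filehandle so the sketch needs no file on disk.
my $data = <<'END';
   Date   Rainfall mm
01/01/2000 00:00:00          27.333
01/01/2000 00:05:00         0.0
01/01/2000 02:00:00        29.15
01/01/2000 02:05:30        0.0
END

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my @records;
while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ m|^\s*\d+/\d+/\d{4}|;    # skip the header line
    my %temp;
    @temp{ qw/ date time rain / }     = split ' ', $line;
    @temp{ qw/ day month year / }     = split '/', $temp{date};
    @temp{ qw/ hour minute second / } = split ':', $temp{time};
    push @records, \%temp;
}
close $fh;

# Each element of @records is a hash reference, one per data line.
my $total = 0;
$total += $_->{rain} for @records;
print "first rain value: $records[0]->{rain}\n";    # first rain value: 27.333
print "records: ", scalar @records, ", total: $total\n";
```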

Cheers,
Damian