[Brisbane-pm] Regex Syntax

Jacinta Richardson jarich at perltraining.com.au
Thu Apr 12 18:54:37 PDT 2007


Damian James wrote:

>>    Date   Rainfall mm
>> 01/01/2000 00:00:00          27.333
>> 01/01/2000 00:05:00         0.0
>> 01/01/2000 02:00:00        29.15
>> 01/01/2000 02:05:30        0.0
>>
> 
> Hmm, you could just use split() first on the whitespace, then on  
> the /s and :s

As Damian has said, split() is definitely the way to go.  It's going to be much 
less error prone and very easy for everyone to understand.  My solution is 
slightly different from Damian's (untested, but should be close):

open FILE, "<", $name or die "Cannot open $name: $!";

my @records;
while(<FILE>) {
	# split ' ' splits on runs of whitespace and ignores leading and
	# trailing whitespace, including the newline
	my ($date, $time, $rainfall) = split(' ', $_);

	# Skip the header line and any blank lines
	next unless defined $date && $date =~ m{^\d+/\d+/\d+$};

	my ($day, $month, $year) = split("/", $date);

	# Sometimes we're not given time at all
	my ($hours, $min, $sec) = (0,0,0);
	if( !defined $rainfall ) {
		$rainfall = $time;
	}
	else {
		($hours, $min, $sec) = split(":", $time);
	}

	push @records, {
		day	=> $day,
		month	=> $month,
		year	=> $year,
		hour	=> $hours,
		minute	=> $min  || 0,    # sometimes no minutes
		second	=> $sec  || 0,    # sometimes no seconds
		rain	=> $rainfall,
	};
}
close FILE;

Using a while loop, as opposed to a foreach loop or slurping the whole file into 
an array, is more memory efficient in most cases because only one line is held 
in memory at a time.  If you're planning to do something with each record as 
soon as you have it, then you don't need to store the records at all.
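
A minimal sketch of handling each record as it is read rather than storing it. 
The sample data is copied from the question; the in-memory filehandle and the 
%total_for hash (a running rainfall total per date) are assumptions made only 
so the example runs on its own:

```perl
use strict;
use warnings;

# Sample data in the format from the original question (header plus readings).
my $data = <<'END';
   Date   Rainfall mm
01/01/2000 00:00:00          27.333
01/01/2000 00:05:00         0.0
01/01/2000 02:00:00        29.15
01/01/2000 02:05:30        0.0
END

# Read from an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my %total_for;    # date string => running rainfall total in mm
while (my $line = <$fh>) {
    my @fields = split ' ', $line;

    # Skip the header and blank lines: a data line starts with dd/mm/yyyy.
    next unless @fields >= 2 && $fields[0] =~ m{^\d+/\d+/\d{4}$};

    # Rainfall is always the last field, whether or not a time is present.
    $total_for{ $fields[0] } += $fields[-1];
}
close $fh;

printf "%s: %.3f mm\n", $_, $total_for{$_} for sort keys %total_for;
```

Only one line and one small hash are in memory at any time, however large the 
input file is.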

> You make an index counter, but don't really need it. How about:
> Also note that if your match *fails*, those number variables will  
> still contain their last successful match. This is probably bad.

This doesn't always happen, but it can happen and is bad when it does.
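
A small sketch of the failure mode, using made-up input lines. Both matches 
run in the same scope, so the second (failed) match leaves $3 holding the 
value from the first:

```perl
use strict;
use warnings;

my $pattern   = qr{^(\d+)/(\d+)/(\d{4})};
my $good_line = '01/01/2000 27.333';
my $bad_line  = 'Date Rainfall mm';

# First match succeeds, so $3 holds the year.
$good_line =~ $pattern;
my $year = $3;

# Second match fails, but $3 is not cleared: it still holds "2000".
$bad_line =~ $pattern;
my $stale_year = $3;

# Capturing in list context avoids the problem: a failed match returns
# an empty list, so the variables are left undefined.
my ($day, $month, $safe_year) = $bad_line =~ $pattern;

print "stale: $stale_year\n";
print "safe:  ", ( defined($safe_year) ? $safe_year : 'undef' ), "\n";
```

Checking the result of the match before reading $1, $2 and so on avoids the 
stale values as well.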


>> In terms of the bigger picture, is using $1,$2 etc the best way to  
>> do it?

If you can avoid using $1, $2, $3 etc in favour of your own variable names then 
that's probably a good idea.  For example:

	if ($file[$i] =~ m|(\d+)\/(\d+)\/(\d{4})\s*(\d+\.\d*)|) {
		($day,$month,$year,$rain) = ($1,$2,$3,$4);
		($second,$minute,$hour) = (0,0,0);
	}

could be rewritten:

	if( ($day, $month, $year, $rain) = (
	     $file[$i] =~ m|(\d+)/(\d+)/(\d{4})\s*(\d+\.\d*)| ) ) {
		($second,$minute,$hour) = (0,0,0);
	}

merely by capturing the matches in list context.
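
A runnable sketch of the list-context form. The parse_untimed() helper and 
the test lines are hypothetical, made up for illustration, and the pattern 
assumes the rainfall value always contains a decimal point:

```perl
use strict;
use warnings;

# Hypothetical helper: parse a line that has a date and rainfall but no time.
# Returns a hash reference on success and nothing on failure.
sub parse_untimed {
    my ($line) = @_;

    # A list assignment in boolean context is true only if the match
    # returned at least one captured value.
    if ( my ($day, $month, $year, $rain) =
            $line =~ m|^(\d+)/(\d+)/(\d{4})\s+(\d+\.\d*)| ) {
        return {
            day   => $day,
            month => $month,
            year  => $year,
            rain  => $rain,
        };
    }
    return;
}

my $untimed = parse_untimed('01/01/2000          27.333');
my $timed   = parse_untimed('01/01/2000 00:05:00         0.0');

print "untimed rain: $untimed->{rain}\n";
print "timed line matched: ", ( defined($timed) ? 'yes' : 'no' ), "\n";
```

The timed line does not match because the field after the date contains 
colons rather than a decimal point, so it would need its own pattern.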

I still recommend split for this kind of problem.

all the best,

	Jacinta


