[Brisbane-pm] Regex Syntax
Jacinta Richardson
jarich at perltraining.com.au
Thu Apr 12 18:54:37 PDT 2007
Damian James wrote:
>> Date Rainfall mm
>> 01/01/2000 00:00:00 27.333
>> 01/01/2000 00:05:00 0.0
>> 01/01/2000 02:00:00 29.15
>> 01/01/2000 02:05:30 0.0
>>
>
> Hmm, you could just use split() first on the whitespace, then on
> the /s and :s
As Damian has said, split() is definitely the way to go. It's going to be much
less error-prone and very easy for everyone to understand. My solution is
slightly different from Damian's (untested, but should be close):
    open my $fh, "<", $name or die "Can't open $name: $!";
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /^\d/;    # skip the header and blank lines

        # split ' ' splits on any run of whitespace
        my ($date, $time, $rainfall) = split ' ', $line;
        my ($day, $month, $year) = split "/", $date;

        # Sometimes we're not given time at all
        my ($hours, $min, $sec) = (0,0,0);
        if( !defined $rainfall ) {
            $rainfall = $time;
        }
        else {
            ($hours, $min, $sec) = split ":", $time;
        }

        push @records, {
            day    => $day,
            month  => $month,
            year   => $year,
            hour   => $hours,
            minute => $min || 0,    # sometimes no minutes
            second => $sec || 0,    # sometimes no seconds
            rain   => $rainfall,
        };
    }
    close $fh;
Using a while loop, as opposed to a foreach loop or slurping the whole file into
an array, only holds one line in memory at a time, which matters for large
files. If you're planning to do something with each record as soon as you have
the data, then obviously you don't need to store it in @records at all.
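
To illustrate that last point, here is a rough sketch that totals the rainfall
per date while reading, without keeping any records around. The total_by_date()
name and the in-memory sample file are my own assumptions for the example, and
it assumes the rainfall is always the last field on the line:

```perl
use strict;
use warnings;

# Hypothetical helper: sum rainfall per date from any open filehandle,
# keeping only the running totals in memory.
sub total_by_date {
    my ($fh) = @_;
    my %total;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /^\d/;       # skip the header and blank lines
        my @fields = split ' ', $line;    # date, optional time, rainfall
        my ($date, $rainfall) = @fields[0, -1];
        $total{$date} += $rainfall;
    }
    return \%total;
}

# Sample data from the original post, read through an in-memory filehandle
# so the sketch runs on its own.
my $data = <<'END';
Date Rainfall mm
01/01/2000 00:00:00 27.333
01/01/2000 00:05:00 0.0
01/01/2000 02:00:00 29.15
01/01/2000 02:05:30 0.0
END

open my $fh, "<", \$data or die "Can't open in-memory file: $!";
my $totals = total_by_date($fh);
close $fh;

printf "%s: %.3f mm\n", $_, $totals->{$_} for sort keys %$totals;
```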
> You make an index counter, but don't really need it. How about:
> Also note that if your match *fails*, those number variables will
> still contain their last successful match. This is probably bad.
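This doesn't always happen, because the capture variables are restored when the
enclosing block ends. Within the same block, though, a failed match leaves $1,
$2 etc. holding the values from the last successful match, and that is bad when
it does happen.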
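
A small sketch of the stale-capture problem, using made-up input strings (the
variable names here are only for illustration):

```perl
use strict;
use warnings;

my $good = "01/01/2000 27.333";
my $bad  = "no date on this line";

$good =~ m|(\d+)/(\d+)/(\d{4})|;    # succeeds, sets $1, $2, $3
$bad  =~ m|(\d+)/(\d+)/(\d{4})|;    # fails, leaves $1, $2, $3 untouched
my $stale_day = $1;                 # still the day captured from $good

# Safer: only read the captures when the match succeeded.
my $day;
if ($bad =~ m|(\d+)/(\d+)/(\d{4})|) {
    $day = $1;
}

print "stale: $stale_day, checked: ",
      (defined $day ? $day : "(no match)"), "\n";
```

The first version silently reports the previous line's day for a line that has
no date in it at all; the second leaves $day undefined so the caller can tell.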
>> In terms of the bigger picture, is using $1,$2 etc the best way to
>> do it?
If you can avoid using $1, $2, $3 etc in favour of your own variable names then
that's probably a good idea. For example:
    if ($file[$i] =~ m|(\d+)/(\d+)/(\d{4})\s*(\d+\.\d*)|) {
        ($day,$month,$year,$rain) = ($1,$2,$3,$4);
        ($second,$minute,$hour) = (0,0,0);
    }
could be rewritten:
    if( ($day, $month, $year, $rain) =
            $file[$i] =~ m|(\d+)/(\d+)/(\d{4})\s*(\d+\.\d*)| ) {
        ($second,$minute,$hour) = (0,0,0);
    }
merely by capturing the matches in list context.
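
Here is a self-contained sketch of the same idea. The parse_daily() name, the
anchored pattern and the sample lines are my own assumptions rather than
anything from the original code. A match in list context returns the captured
groups, or an empty list on failure, so the assignment doubles as the success
test:

```perl
use strict;
use warnings;

# Hypothetical helper: parse a "dd/mm/yyyy rainfall" line with no time part.
# Returns a hash reference, or undef if the line doesn't match.
sub parse_daily {
    my ($line) = @_;

    # List assignment in scalar context yields the number of values on the
    # right-hand side, which is 0 when the match fails.
    my ($day, $month, $year, $rain) =
        $line =~ m|^(\d+)/(\d+)/(\d{4})\s+(\d+\.\d*)$|
        or return;

    return { day => $day, month => $month, year => $year, rain => $rain };
}

my $ok  = parse_daily("01/01/2000 27.333");
my $bad = parse_daily("Date Rainfall mm");

print "rain on $ok->{day}/$ok->{month}/$ok->{year}: $ok->{rain}\n";
print "header line skipped\n" unless defined $bad;
```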
I still recommend split for this kind of problem.
all the best,
Jacinta