[Brisbane-pm] Regex Syntax
Damian James
djames at thehub.com.au
Thu Apr 12 15:01:43 PDT 2007
On 12/04/2007, at 6:07 PM, Martin Jacobs wrote:
> Hi folks,
>
> I've got a clumsy way to do regexes, and I'm looking for a better way.
>
> I need to read the data from a file (called $name), which has the
> following format...
>
> Date Rainfall mm
> 01/01/2000 00:00:00 27.333
> 01/01/2000 00:05:00 0.0
> 01/01/2000 02:00:00 29.15
> 01/01/2000 02:05:30 0.0
>
Hmm, you could just use split(), first on the whitespace, then on
the /s and :s.
> Which cuts off whatever newline character there is on each line,
> followed by
>
> for $i (1..$#file){
> $file[$i] =~ m|(\d+)\/(\d+)\/(\d{4})\s+(\d+):(\d+):(\d+)\s*(\d.\d*)|) {
> ($day,$month,$year,$hour,$minute,$second,$rain) = ($1,$2,$3,$4,$5,$6,$7);
> }
> }
You make an index counter, but you don't really need it. Also note that
if your match *fails*, those numbered variables will still contain the
values from their last successful match. This is probably bad. How about:
for my $line ( @file ) {
    # @values is empty if the line doesn't match
    my @values = $line =~ m|(\d+)/(\d+)/(\d{4})\s+(\d+):(\d+):(\d+)\s*(\d+\.?\d*)|;
}
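Both points (stale numbered variables after a failed match, and the empty list from a failed list assignment) can be checked with a small sketch. parse_line is a hypothetical helper name, and the rainfall part of the pattern uses \d+\.?\d* rather than the original \d.\d*.

```perl
use strict;
use warnings;

# Hypothetical helper: list-assign the captures from one data line.
# The match returns an empty list when the line doesn't match.
sub parse_line {
    my ($line) = @_;
    my @values = $line =~
        m{(\d+)/(\d+)/(\d{4})\s+(\d+):(\d+):(\d+)\s*(\d+\.?\d*)};
    return @values;
}

# A successful match sets $1 ...
my $ok = "01/01/2000 00:00:00 27.333" =~ m{(\d+)/};
# ... and a failed one leaves it untouched, so $1 is still "01" here
my $failed = "Date Rainfall mm" =~ m{(\d+)/};
my $stale  = $1;
print "after the failed match, \$1 is still '$stale'\n";

# List assignment gives you nothing at all when the match fails
my @good = parse_line("01/01/2000 00:00:00 27.333");
my @bad  = parse_line("Date Rainfall mm");
print scalar(@good), " values, then ", scalar(@bad), " values\n";
```

The heading line is a handy failing case: with $1 you would silently reuse the previous line's day, whereas @bad is simply empty.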
> To accommodate some variations in the input record, I have expanded
> this to
>
> for $i (1..$#file){
> if ($file[$i] =~ m|(\d+)\/(\d+)\/(\d{4})\s+(\d+):(\d+):(\d+)\s*(\d.\d*)|) {
> ($day,$month,$year,$hour,$minute,$second,$rain) = ($1,$2,$3,$4,$5,$6,$7);
> }
> elsif ($file[$i] =~ m|(\d+)\/(\d+)\/(\d{4})\s+(\d+):(\d+)\s*(\d.\d*)|) {
> ($day,$month,$year,$hour,$minute,$rain) = ($1,$2,$3,$4,$5,$6);
> $second = 0;}
> elsif ($file[$i] =~ m|(\d+)\/(\d+)\/(\d{4})\s+(\d+)\s*(\d.\d*)|) {
> ($day,$month,$year,$hour,$rain) = ($1,$2,$3,$4,$5);
> ($second,$minute) = (0,0);
> }
> elsif ($file[$i] =~ m|(\d+)\/(\d+)\/(\d{4})\s*(\d.\d*)|) {
> ($day,$month,$year,$rain) = ($1,$2,$3,$4);
> ($second,$minute,$hour) = (0,0,0);
> }
> else {print_to [$Screen,$summary], "
> PERRMOSS cannot read rainfall data file $name near line $i
> PERRMOSS aborted at: \t\t$fulltime\n\n";
> exit;}
> }
>
Testing the pattern matches is good, since you never assign values from
the numbered variables unless there really has been a match.
>
>
> The problem is that the first value of $rain should equal 27.333,
> but it equals 27. So, there's a syntax issue, and i would be
> grateful for any hints.
>
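It's the pattern. To match that part of the string you have \d.\d*,
which matches one digit, followed by any one character (an unescaped .
matches anything), followed by zero or more digits. Against 27.333 that
captures only 27: \d matches the 2, . matches the 7, and \d* matches
nothing because the next character is the decimal point.
You probably wanted \d+\.?\d* or something like it.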
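A quick way to see the difference is to run the original full-time pattern twice, swapping only the rainfall sub-pattern. rain_with is a hypothetical helper; it returns whatever lands in the seventh capture.

```perl
use strict;
use warnings;

# Hypothetical helper: the original hh:mm:ss pattern, with the rainfall
# sub-pattern passed in as a qr// object; returns the 7th capture.
sub rain_with {
    my ($line, $rain_re) = @_;
    my @f = $line =~ m{(\d+)/(\d+)/(\d{4})\s+(\d+):(\d+):(\d+)\s*($rain_re)};
    return $f[6];
}

my $line = "01/01/2000 00:00:00 27.333";
print "old pattern captures: ", rain_with($line, qr/\d.\d*/),   "\n";
print "new pattern captures: ", rain_with($line, qr/\d+\.?\d*/), "\n";
```

The interpolated qr// object is wrapped in a non-capturing group, so the numbering of the seven captures is unchanged.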
> In terms of the bigger picture, is using $1,$2 etc the best way to
> do it?
I suggest never using them unless you have tested for a successful
match, though you are doing that here (you are just getting an incorrect
match). Directly assigning the matches, as I have done, is okay,
because the match doesn't return any values unless it succeeded. Of
course, you then need to test whether you got the values you were
expecting :)
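Taking direct assignment one step further, the four-branch if/elsif chain could collapse into one pattern whose time parts are optional. This is a sketch under assumptions, not checked against real PERRMOSS input: fields are whitespace-separated, and a missing hour, minute or second defaults to 0 as in the original chain. parse_record is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: one pattern covering all four layouts from the
# post (hh:mm:ss, hh:mm, hh, or no time at all). Returns an empty list
# if the line doesn't match.
sub parse_record {
    my ($line) = @_;
    my ($day, $month, $year, $hour, $minute, $second, $rain) = $line =~
        m{^\s*(\d+)/(\d+)/(\d{4})(?:\s+(\d+)(?::(\d+)(?::(\d+))?)?)?\s+(\d+\.?\d*)\s*$}
        or return;

    # Unmatched optional groups come back undef; default them to 0
    # as the original elsif chain did.
    for ($hour, $minute, $second) {
        $_ = 0 unless defined $_;
    }
    return ($day, $month, $year, $hour, $minute, $second, $rain);
}

for my $line ("01/01/2000 00:00:00 27.333", "01/01/2000 02:05 0.0",
              "01/01/2000 02 29.15",        "01/01/2000 27.333") {
    my @f = parse_record($line) or die "cannot read: $line\n";
    print join(',', @f), "\n";
}
```

The trailing \s*$ anchor matters: without it, a line with no time could have the first digits of the rainfall mistaken for an hour.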
In your case, though, I'd just use split() (split ' ' splits on runs
of whitespace):
my ($date, $time, $rain) = split ' ', $line;
my ($day, $month, $year) = split '/', $date;
my ($hour, $minute, $second) = split ':', $time;
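The split approach can also absorb the shorter layouts from the original post. A sketch, assuming the rainfall is always the last whitespace-separated field and that missing time parts should become 0; split_record is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: split-based parsing that also copes with the
# shorter layouts (hh:mm, hh, or no time). Returns an empty list for
# lines that don't start with a dd/mm/yyyy date, such as the heading.
sub split_record {
    my ($line) = @_;
    my @fields = split ' ', $line;          # ' ' splits on runs of whitespace
    return unless @fields == 2 || @fields == 3;
    return unless $fields[0] =~ m{^\d+/\d+/\d{4}$};

    my $rain = pop @fields;                 # rainfall is always last
    my ($date, $time) = @fields;            # $time is undef if absent
    $time = '0' unless defined $time;

    my ($day, $month, $year)     = split '/', $date;
    my ($hour, $minute, $second) = split ':', $time;
    for ($minute, $second) {
        $_ = 0 unless defined $_;           # "02" or "02:05": pad with 0
    }
    return ($day, $month, $year, $hour, $minute, $second, $rain);
}

print join(',', split_record("01/01/2000 02 29.15")), "\n";
```

Because split ' ' drops leading and trailing whitespace, an unchomped line still gives a clean rainfall value.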
Though I'd use a hash to hold all those names, then push a reference
to it onto an array:
open my $fh, '<', $name or die "Cannot open $name: $!";
my @records;
my $header = <$fh>;    # skip the "Date Rainfall mm" heading line
while (my $line = <$fh>) {
    chomp $line;
    my %temp;
    @temp{ qw/ date time rain / } = split ' ', $line;
    @temp{ qw/ day month year / } = split '/', $temp{date};
    @temp{ qw/ hour minute second / } = split ':', $temp{time};
    push @records, \%temp;
}
close $fh or die $!;
Then for any given record $i, $records[$i]->{rain} has the $rain value.
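Once @records is built, it can be used like this. The records below are typed in by hand for illustration (using the date/time/rain key names from the loop above), so the sketch runs without a data file.

```perl
use strict;
use warnings;

# Hand-written records shaped like the ones pushed onto @records above
my @records = (
    { date => '01/01/2000', time => '00:00:00', rain => '27.333' },
    { date => '01/01/2000', time => '00:05:00', rain => '0.0'    },
    { date => '01/01/2000', time => '02:00:00', rain => '29.15'  },
);

my $i = 0;
# The arrow between subscripts is optional: $records[$i]{rain} also works
print "rain for record $i: $records[$i]->{rain}\n";

# Total rainfall over all records
my $total = 0;
$total += $_->{rain} for @records;

# Times of the intervals that actually had rain
my @wet = map { $_->{time} } grep { $_->{rain} > 0 } @records;
printf "total %.3f mm; wet at %s\n", $total, join(', ', @wet);
```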
Cheers,
Damian