SPUG: Regexp::Assemble question

John W. Krahn jwkrahn at shaw.ca
Tue Feb 24 18:08:42 PST 2009


Amit Sett wrote:
> I had written some Perl code to go through thousands of files and match for
> errors contained in a hash shown in brown (see *Old code* section). I

Emails do not have colours, they are just plain text.

> disliked this approach because I didnt want to alter the core program
> everytime I added or removed search strings/expressions. My goal is to read
> the list of strings from a text file and create the same hash using a loop
> as shown in blue (see *New code* section).
> 
> The problem with this approach is that attempts to print the values for any
> keys of this hash are unsuccessful and show up as blanks. I ran it through a
> debugger and saw this instead with the help of Padwalker:
> 
> $erl_hash    HASH(0xa541e88)=...
>     ArithmeticException    CODE(0xa5493a8)
>     ArrayStoreException    CODE(0xa549a88)
> 
> I was expecting to see something more like this:
> $erl_hash    HASH(0xa541e88)=...
>     ArithmeticException    sub {return ArithmeticException}
>     ArrayStoreException    sub {return ArrayStoreException}
> 
> Any help would be appreciated. I can share the entire program if need be.
> 
> Regards,
> Amit
> 
> 
> 
> *Old code*
> # Create the Hash for matching Errors
> my $exception_match = {
>     'ArithmeticException' => sub {return "ArithmeticException"},
>     'ArrayStoreException' => sub {return "ArrayStoreException"},
> };

Is this your actual code?  If so, you don't need the subroutines at all, 
you just need the keys.  And why use a hash reference instead of a hash?

my %exception_match = (
     ArithmeticException => undef,
     ArrayStoreException => undef,
);

Or perhaps:

my %exception_match = map { $_ => undef } qw/ArithmeticException 
ArrayStoreException/;


> #Error Matching Section
> my $match_err = Regexp::Assemble->new( track => 1 )->add( keys
> %$exception_match );
> 
> my %error_counter = ();                            #This hash stores the
> total count of all errors
> 
> foreach my $filename (@logFile_list) {
>     # First Determine if we need to use DBM to handle file parsing
>     my $filesize = stat($filename)->size;

Or just:

     my $filesize = -s $filename;


>     $Logger->debug("File name is $filename\tFile Size is $filesize");
> 
>     # Next Determine if file is gzip compressed
>     my $gzipTest = qx(file $filename);
>     if($gzipTest =~ m/gzip/i) {
>         open(MYINPUTFILE, "<:gzip", "$filename") or $Logger->logdie("Error
> opening file: $!");
>     } else {
>         open(MYINPUTFILE, "<$filename") or $Logger->logdie("Error opening
> file: $!");
>     }

Why all the duplicate code?

     my $mode = qx(file $filename) =~ /gzip/i ? '<:gzip' : '<';
     open MYINPUTFILE, $mode, $filename or $Logger->logdie( "Error 
opening file: $!" );


> # This section sends a message to the log file and STDOUT if an exception is
> caught
>     my $lines = 0;

No need for this variable as Perl provides the $. built-in variable that 
keeps track of the current line number.


>     while( <MYINPUTFILE> )
>     {
>         $lines++;
>         chomp;
>         if( $match_err->match($_) )
>         {
>             my ($exception_name) = $exception_match->{ $match_err->matched
> }( $match_err->mvar() );

You pass $match_err->mvar() to the subroutine but you don't actually use 
it in the subroutine.

If you used my suggestion above then this line would be:

             exists $exception_match{ $match_err->matched } and my 
$exception_name = $match_err->matched;


>             $Logger->debug("Exception was found: $exception_name\tin file:
> $filename at line: $lines");

With the built-in $. variable:

             $Logger->debug( "Exception was found: $exception_name\tin 
file: $filename at line: $." );


>             $error_counter{$filename}{$exception_name}++;
> #Logic for counting the various types of errors
>         }
> 
>     }
> }
> 
> 
> *New code*
> #This section of code reads the err file and saves the key-value pairs in a
> hash for easy retrieval
> my $erl_hash = {};

Why not just use a hash instead of a hash reference?


> my $erl_file = $ini_hash{'error_list_file'};
> 
> open(FILE, "<", $erl_file) or die "Can't open $erl_file: $!\n";
> 
> while (my $erl_file_line = <FILE>)
>     {
>         chomp $erl_file_line;
>         if (($erl_file_line !~ m/^[\s]*#/gi) && ($erl_file_line !~

You are using the ^ beginning of string anchor so the /g option makes no 
sense as there is only one beginning of string.  You are using the /i 
option but the \s character class and the # character are not affected 
by it so it is superfluous.  So that regular expression should be:

$erl_file_line !~ /^\s*#/


> m/^[\s]*$/gi ))

Same here.


>         {
>             $erl_file_line =~ m/^[\s]*([\w\S]+)[\s]*$/gi;

The \w character class is a subset of the \S character class (and the 
comments above also apply) so that regular expression should be:

$erl_file_line =~ /^\s*(\S+)\s*$/;


>             $erl_hash->{$1} = sub {return $1};

You should *never* use the numerical variables unless the pattern 
matched successfully.

if ( $erl_file_line =~ /^\s*(\S+?)\s*$/ ) {
     $erl_hash->{ $1 } = sub { return $1 };
}


>             print "Added key: ",$1,"\tCorresponding value:
> ",$erl_hash->{$1},"\n";
>         }
>     }


John
-- 
Those people who think they know everything are a great
annoyance to those of us who do.        -- Isaac Asimov


More information about the spug-list mailing list