SPUG: Regexp::Assemble question
John W. Krahn
jwkrahn at shaw.ca
Tue Feb 24 18:08:42 PST 2009
Amit Sett wrote:
> I had written some Perl code to go through thousands of files and match for
> errors contained in a hash shown in brown (see *Old code* section). I
Emails do not have colours, they are just plain text.
> disliked this approach because I didnt want to alter the core program
> everytime I added or removed search strings/expressions. My goal is to read
> the list of strings from a text file and create the same hash using a loop
> as shown in blue (see *New code* section).
>
> The problem with this approach is that attempts to print the values for any
> keys of this hash are unsuccessful and show up as blanks. I ran it through a
> debugger and saw this instead with the help of Padwalker:
>
> $erl_hash HASH(0xa541e88)=...
> ArithmeticException CODE(0xa5493a8)
> ArrayStoreException CODE(0xa549a88)
>
> I was expecting to see something more like this:
> $erl_hash HASH(0xa541e88)=...
> ArithmeticException sub {return ArithmeticException}
> ArrayStoreException sub {return ArrayStoreException}
>
> Any help would be appreciated. I can share the entire program if need be.
>
> Regards,
> Amit
>
>
>
> *Old code*
> # Create the Hash for matching Errors
> my $exception_match = {
> 'ArithmeticException' => sub {return "ArithmeticException"},
> 'ArrayStoreException' => sub {return "ArrayStoreException"},
> };
Is this your actual code? If so, you don't need the subroutines at all,
you just need the keys. And why use a hash reference instead of a hash?
my %exception_match = (
ArithmeticException => undef,
ArrayStoreException => undef,
);
Or perhaps:
my %exception_match = map { $_ => undef } qw/ArithmeticException
ArrayStoreException/;
> #Error Matching Section
> my $match_err = Regexp::Assemble->new( track => 1 )->add( keys
> %$exception_match );
>
> my %error_counter = (); #This hash stores the
> total count of all errors
>
> foreach my $filename (@logFile_list) {
> # First Determine if we need to use DBM to handle file parsing
> my $filesize = stat($filename)->size;
Or just:
my $filesize = -s $filename;
> $Logger->debug("File name is $filename\tFile Size is $filesize");
>
> # Next Determine if file is gzip compressed
> my $gzipTest = qx(file $filename);
> if($gzipTest =~ m/gzip/i) {
> open(MYINPUTFILE, "<:gzip", "$filename") or $Logger->logdie("Error
> opening file: $!");
> } else {
> open(MYINPUTFILE, "<$filename") or $Logger->logdie("Error opening
> file: $!");
> }
Why all the duplicate code?
my $mode = qx(file $filename) =~ /gzip/i ? '<:gzip' : '<';
open MYINPUTFILE, $mode, $filename or $Logger->logdie( "Error
opening file: $!" );
> # This section sends a message to the log file and STDOUT if an exception is
> caught
> my $lines = 0;
No need for this variable as Perl provides the $. built-in variable that
keeps track of the current line number.
> while( <MYINPUTFILE> )
> {
> $lines++;
> chomp;
> if( $match_err->match($_) )
> {
> my ($exception_name) = $exception_match->{ $match_err->matched
> }( $match_err->mvar() );
You pass $match_err->mvar() to the subroutine but you don't actually use
it in the subroutine.
If you used my suggestion above then this line would be:
exists $exception_match{ $match_err->matched } and my
$exception_name = $match_err->matched;
> $Logger->debug("Exception was found: $exception_name\tin file:
> $filename at line: $lines");
With the built-in $. variable:
$Logger->debug( "Exception was found: $exception_name\tin
file: $filename at line: $." );
> $error_counter{$filename}{$exception_name}++;
> #Logic for counting the various types of errors
> }
>
> }
> }
>
>
> *New code*
> #This section of code reads the err file and saves the key-value pairs in a
> hash for easy retrieval
> my $erl_hash = {};
Why not just use a hash instead of a hash reference?
> my $erl_file = $ini_hash{'error_list_file'};
>
> open(FILE, "<", $erl_file) or die "Can't open $erl_file: $!\n";
>
> while (my $erl_file_line = <FILE>)
> {
> chomp $erl_file_line;
> if (($erl_file_line !~ m/^[\s]*#/gi) && ($erl_file_line !~
You are using the ^ beginning of string anchor so the /g option makes no
sense as there is only one beginning of string. You are using the /i
option but the \s character class and the # character are not affected
by it so it is superfluous. So that regular expression should be:
$erl_file_line !~ /^\s*#/
> m/^[\s]*$/gi ))
Same here.
> {
> $erl_file_line =~ m/^[\s]*([\w\S]+)[\s]*$/gi;
The \w character class is a subset of the \S character class (and the
comments above also apply) so that regular expression should be:
$erl_file_line =~ /^\s*(\S+)\s*$/;
> $erl_hash->{$1} = sub {return $1};
You should *never* use the numerical variables unless the pattern
matched successfully.
if ( $erl_file_line =~ /^\s*(\S+?)\s*$/ ) {
$erl_hash->{ $1 } = sub { return $1 };
}
> print "Added key: ",$1,"\tCorresponding value:
> ",$erl_hash->{$1},"\n";
> }
> }
John
--
Those people who think they know everything are a great
annoyance to those of us who do. -- Isaac Asimov
More information about the spug-list
mailing list