[LA.pm] little help??

Peter Benjamin pete at peterbenjamin.com
Wed Sep 28 18:16:48 PDT 2005


At 04:38 PM 9/28/2005, Robert Spier wrote:
>> I got an off-list reply that thought I was being a little too hard on 
>> you. Didn't really mean to be that way, so sorry if it seemed that way. 
>> Good luck with your perl hacking!
>
>I didn't think you were being hard.  Just detailed.  Most beginners
>would be lucky to get such a detailed and thorough response.


It was a good newbie perl style bootstrap, just what I thought
was being asked for.

I'd tend to use a CPAN module for parsing the log file.
A split using the default RE does not give good results
when fields are quoted because of embedded blanks, and
the CPAN module takes care of that.

A split RE that handles the parsing is possible, but it's hard to write.
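
To illustrate the quoting problem, here is a minimal sketch. It uses
Text::ParseWords, which ships with Perl, purely as an illustration; that
choice is an assumption and not necessarily the CPAN module meant above.
The sample record is made up, in an Apache-style layout.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Made-up sample record in an Apache-style layout (an assumption for illustration).
my $line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326';

# A plain whitespace split breaks the quoted request into three pieces.
my @naive = split ' ', $line;

# quotewords keeps a double-quoted string together as one field
# (keep = 0 drops the quote characters themselves).
my @fields = quotewords( '\s+', 0, $line );

print scalar(@naive),  " fields from split\n";
print scalar(@fields), " fields from quotewords\n";
print "request: $fields[5]\n";
```

Note that the bracketed timestamp still comes back as two fields, since
square brackets are not quotes; a log-specific parser would handle that too.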

For testing, reducing the log file to just the needed records
first would mean you aren't debugging the parsing at the same time.

grep RE logfile > testfile

As I've got RAM, I would have done the whole perl script differently.
I'm too lazy to dig up the actual code I use, so here
it is real fast (this ought not to compile, so you must fix it).

my $br;
$br = "\n";   # For CLI output
# $br = "<br>\n";   #  For html output
my $fn = $ARGV[0];
open ( my $log, '<', $fn ) || die ( "Error: Unable to open input filename: $fn${br}Message: $!$br" );
my @rec = <$log>;
close $log;
my @matches = grep /RE_here/, @rec;
my %seenIP;
foreach my $rec ( @matches ) {
  my ( ..., $ip, .... ) = split( /RE_here/, $rec );
  $seenIP{$ip}++;
}
foreach my $ip ( sort keys %seenIP ) {
  my $count = $seenIP{$ip};
  print "$ip\t$count$br";
}
exit;


You can see my standard convention for "open || die", which includes the filename and the error message.
The slurp of the log file is fast.
The grep reduces the troubleshooting of subsequent code.
The %seen method is from an ORA perl book, and is pretty standard fare nowadays.
Best to re-use the %seen method whenever counting matches.  It's fast and QED.
Thus, the two foreach loops ought to be in a newbie's library sooner, not later.
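
For anyone who wants to try the %seen idiom without a log file, here is a
small runnable sketch. The records are made up, and it assumes the IP is
the first whitespace-separated field, which will not be true of every log.

```perl
use strict;
use warnings;

# Made-up records; the IP is assumed to be the first whitespace-separated field.
my @rec = (
    "10.0.0.2 GET /index.html\n",
    "10.0.0.1 GET /about.html\n",
    "10.0.0.2 GET /index.html\n",
);

my %seenIP;
foreach my $rec (@rec) {
    my ($ip) = split ' ', $rec;   # take only the first field
    $seenIP{$ip}++;               # a missing key starts at undef, so this yields 1
}

# Report each distinct IP with its count, in string-sorted order.
foreach my $ip ( sort keys %seenIP ) {
    print "$ip\t$seenIP{$ip}\n";
}
```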

The sort is a string sort, so the IPs are not quite in numeric order; fixing that is left as homework for the student.
Anyone have something that sorts IP numerically?  In one code line?  ;-)
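
One possible answer, as a sketch only: it assumes every entry is a plain
IPv4 dotted quad and does no validation. The sample addresses are made up.

```perl
use strict;
use warnings;

# Made-up sample addresses.
my @ips = qw(10.0.0.20 192.168.1.1 10.0.0.3 9.255.0.1);

# pack 'C4' turns the four octets into a 4-byte string; comparing those
# strings with cmp gives the same order as comparing the addresses numerically.
my @sorted = sort { pack( 'C4', split /\./, $a ) cmp pack( 'C4', split /\./, $b ) } @ips;

print "$_\n" for @sorted;
```

The pack is redone on every comparison, which is fine for a short list;
for a large one, packing each address once before sorting would be cheaper.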


