[Omaha.pm] AI-type string comparisons using Perl...

Miller, Scott L (Omaha Networks) scott.l.miller at hp.com
Thu Aug 12 17:57:39 CDT 2004


You think that's ugly?  Try parsing through a syslog file that has various Cisco routers, Cisco Catalyst switches, Nortel BN routers, Nortel Passport switches, Foundry gear, and old Centillion (now Nortel) switches all reporting into the same file.  And then you want to try to identify the "important" events...

The best solution I've come up with is to first split every line into a generic array.  Then, depending on what exactly you're looking for in any particular case, you've got both the original line still stored in $_ and the individual fields split on whitespace.  So, for instance, here is a bit of sample code from the script I described above...

sub GuessBoxType {
   #We've got the whole line in $_, that line split by spaces in @a
   #the IP address (better have anyway), and maybe a name...

   #A "message repeated" message doesn't give enough info to guess...
   if ( /last message repeated/ ) { #ignore
      return "";
   } elsif ( /previous event / ) {  #ignore
      return "";
   }

   if ( /ENTITY\/EVENT CODE:/ ) {  #Nortel BN router
      #Nortel BN router log msgs include the string "ENTITY/EVENT CODE:"
      return "nortel-rtr";
   }

   if ( /[0-9]+\/[0-9]+ [0-9]+ [\w\-]+:/ ) { #Cisco CSS
      #matching on the "slot/port msg# subSystemName-level:" portion of logmsg
      return "cisco-css";
   }

   if ( $a[4] =~ /^CPU[0-9]+$/ ) { #Passport 8600 switch
      #Each message includes which CPU is reporting the issue.
      return "passport-8600";
   }

   if ( substr($a[4],-1,1) eq ',' ) { #Foundry Server Iron
      #Each message starts with the "boxname" as known by the serveriron itself
      #followed by a comma ','.
      return "foundry-svrirn";
   }

   if ( /\[[0-9]+ ([0-9]{2}:){3}[0-9]{3}\]/ ) { #Accelar
      #matching on "[msg# hour:min:sec:msec]" portion of log message
      return "accelar";
   }
...

So, depending on what device I'm attempting to identify, I'll use either a straight regex, or some portions of the generic array @a, which is created with the following line near the top of my while(<>) loop:
  @a=split(/\s+/);

(The above isn't usable out of the box, BTW; @a is modified somewhat before 'GuessBoxType' gets called...)
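
To show how the pieces might fit together, here is a minimal, self-contained sketch of a loop that splits each line and hands both forms to a classifier. It is not the actual script: guess_box_type is a hypothetical cut-down stand-in that takes its inputs as arguments, the sample lines are made up rather than real device output, and the field position used for the Passport check is only an assumption, since @a is adjusted before the real routine runs. In a real script the for loop would be the while(<>) loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical cut-down classifier: only three of the checks shown above,
# taking the line and its fields as arguments instead of reading $_ and @a.
sub guess_box_type {
    my ($line, @f) = @_;
    return ''              if $line =~ /last message repeated/;
    return 'nortel-rtr'    if $line =~ m{ENTITY/EVENT CODE:};
    return 'passport-8600' if defined $f[4] && $f[4] =~ /^CPU[0-9]+$/;
    return '';
}

# Made-up sample lines (not real device output), laid out so that the
# fifth whitespace-separated field is the one the Passport check reads.
my @sample = (
    'Aug 12 17:57:39 10.0.0.1 CPU5 made-up passport text',
    'Aug 12 17:57:40 10.0.0.2 made-up text ENTITY/EVENT CODE: 1/2',
    'Aug 12 17:57:41 10.0.0.3 last message repeated 3 times',
);

my %count;
for my $line (@sample) {
    my @f    = split /\s+/, $line;       # same split as @a=split(/\s+/)
    my $type = guess_box_type($line, @f) || 'unknown';
    $count{$type}++;
}
printf "%-14s %d\n", $_, $count{$_} for sort keys %count;
```

Passing the line and fields explicitly is just a choice for the sketch; relying on $_ and a shared @a, as in the original, works equally well inside a single loop.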

Although I'm not entirely clear what you're trying to do, my gut reaction is that you don't want to grep for various things and then later try to put it all together.  If at all possible, just process the file one line at a time.
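
As a rough illustration of the one-line-at-a-time idea applied to the ovstatus example quoted below, the following sketch walks the output once and runs a small table of short patterns against the line of interest. The field names and patterns are assumptions based only on the visible part of the quoted sample line, @output is a stand-in for the real program output, and the // operator needs Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each entry names a value and gives a short regex with one capture group.
# These patterns are guesses based only on the quoted sample line.
my @fields = (
    [ interfaces => qr/Polling (\d+) interfaces/ ],
    [ rate       => qr/(\d+) polls\/hour/        ],
    [ overdue    => qr/(\d+) overdue polls/      ],
);

# Stand-in for the program output; only the visible part of the
# sample line from the original question is used.
my @output = (
    "  object manager name: netmon\n",
    "14:30:00 Polling 0 interfaces, 0 polls/hour.  0 overdue polls,\n",
);

my %value;
for my $line (@output) {                 # one pass, one line at a time
    next unless $line =~ /overdue.polls/i;
    for my $f (@fields) {
        my ($name, $re) = @$f;
        ($value{$name}) = $line =~ $re;  # undef if that pattern did not match
    }
}
printf "%s=%s\n", $_, $value{$_} // 'n/a' for sort keys %value;
```

Adding another value then means adding one row to the table, and a pattern that fails shows up as a single 'n/a' rather than breaking the whole match.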

If you'd like to provide more information about what you're trying to do, I, and probably others, would be happy to help find the best way to accomplish the task.

-Scott 

PS. If anyone is actually interested in my syslog processing code, I'd be happy to oblige...



-----Original Message-----
From: omaha-pm-bounces at mail.pm.org
[mailto:omaha-pm-bounces at mail.pm.org]On Behalf Of Daniel Linder
Sent: Thursday, August 12, 2004 10:12 AM
To: omaha-pm at mail.pm.org
Subject: [Omaha.pm] AI-type string comparisons using Perl...


Hello everyone!

  I am working on a small script that will compare some text output from a
program and parse out the numbers I want to act upon.  Unfortunately the
output from the script is quite "humanized" and really ugly to parse
using RegExp.
  Does anyone know of a perl module/function that can take a template
string and use that to extract the values into variables for further
use?

Here is how I am starting to work on this:
 1  @OUTPUT = `/path/to/ovstatus`;
 2  #--begin output---
 3  #  object manager name: netmon
 4  #...snip...
 5  # 14:30:00 Polling 0 interfaces, 0 polls/hour.  0 overdue polls,
    # current maximum 0 [...etc...] average 0.2 msec/lookup.
 6  #--end output---
 7  ($LINE) = grep /overdue.polls/i, @OUTPUT;
 8  $LINE =~ /Polling.*, (\d*) ([^\s]*)\. /i;
 9  $RATE = $1;
10  $RATEUNITS = $2;
...and so on...

Line 7 finds the single line that I want to work with and puts that into
$LINE.
I have to repeat lines 8-10 for every value I want to pull out of that
line.  I started with a single large regexp, but it was a nightmare to debug
whenever I got the RegExp syntax wrong. :(
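
One way a single large pattern might be kept debuggable is the /x modifier, which allows whitespace and layout inside the regex, combined with named captures. The sketch below is an assumption-laden illustration, not a tested parser for ovstatus: it covers only the visible part of the sample line, the capture names are invented here, and named captures with %+ need Perl 5.10 or later, which is newer than this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Visible part of the sample line only; the elided remainder is not guessed at.
my $line = '14:30:00 Polling 0 interfaces, 0 polls/hour.  0 overdue polls,';

# With /x, whitespace in the pattern is ignored, so each value gets its own
# row and a name. The names are invented for this sketch.
my $re = qr{
    Polling \s+ (?<interfaces> \d+ ) \s+ interfaces , \s+
                (?<rate>       \d+ ) \s+ (?<rateunits> [^\s.]+ ) \. \s+
                (?<overdue>    \d+ ) \s+ overdue \s+ polls
}xi;

my %v;
%v = %+ if $line =~ $re;    # copy the named captures out of %+

print "$_ = $v{$_}\n" for sort keys %v;
```

If the match fails, %v stays empty, so a check such as "keys %v" can flag a line whose format has drifted from the template.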

Anyone got other ideas?  I will be using this method for many other
programs on the system I want to monitor so the more flexible the parsing
routine the better.

Dan

- - - - -
"I do not fear computer,
I fear the lack of them."
 -- Isaac Asimov
