[Omaha.pm] Another 10m ad-hoc report

Jay Hannah jhannah at omnihotels.com
Fri Jul 14 07:29:40 PDT 2006


I love that it takes longer to explain what I'm doing and why than to
actually do it in Perl. :)

The Swiss army chainsaw of text processing, baby. :)

j


Project:

Given a file that looks like this:

2006-07-14 09:12:59|97036502|NYCBER|GNRSPE|1170245141
2006-07-14 09:12:59|97036503|CRPBFT|GNRSPE|1450000001
   CRPBFT|GNRSPE|1450000001|L||2007173547||DMC|2006-07-14 09:17:08.27300|0|0|PROCRPBFTACT-2007173547ITN-6COD-12PMFRD-20060714000000TOD-20060716000000AMT-0STA-A

1) Ignore all lines that don't start with "2006"
2) Ignore all lines that don't contain "GRMSTR"
3) In the remaining lines:
   Column 2 (counting from 0) is "prop".
   Column 4 (counting from 0) is "message_grp".
   Per prop, tell me the number of lines, and the number of unique
   message_grp's.
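
To see where each field lands under zero-based counting, here is a quick sketch that splits the first sample line above and prints every field next to its index. The variable names ($line, @fields) are just for illustration; the indexes worth watching are 2 and 4, which are the ones the script reads.

```perl
use strict;
use warnings;

# First sample line from the message above.
my $line = "2006-07-14 09:12:59|97036502|NYCBER|GNRSPE|1170245141";

# split takes a regex, and a bare | means alternation, so escape it.
my @fields = split /\|/, $line;

# Print each field next to its zero-based index.
for my $i (0 .. $#fields) {
    printf "%d: %s\n", $i, $fields[$i];
}
```

Index 0 is the timestamp, index 2 is the prop code (NYCBER here), and index 4 is the last field on the line.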


Solution:

$ cat j.pl

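use strict;
use warnings;

my %count;
while (<>) {
   next unless /^2006/;     # skip continuation lines
   next unless /GRMSTR/;    # keep only GRMSTR records
   chomp;                   # keep the newline out of the last field
   my @l = split /\|/;
   $count{$l[2]}{keys}{$l[4]} = 1;   # distinct message_grp values per prop
   $count{$l[2]}{lines}++;           # line count per prop
}

foreach my $prop (sort keys %count) {
   my $lines = $count{$prop}{lines};
   my $keys  = scalar(keys %{$count{$prop}{keys}});
   print "$prop sent $lines GRMSTR records containing $keys unique message_grp's\n";
}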


Result:

$ cat libqumv.log | perl j.pl
ATLCNN sent 37 GRMSTR records containing 37 unique message_grp's
AUSCTR sent 28 GRMSTR records containing 28 unique message_grp's
...etc...
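
In the output above the two numbers happen to match for every prop shown, so it is not obvious that duplicates collapse. A minimal sketch of the same hash-of-hashes tally, run against a handful of made-up records (the sub name tally and all the sample data are hypothetical, not taken from the real log), makes the difference visible:

```perl
use strict;
use warnings;

# Tally line counts and distinct message_grp values per prop.
# Takes a list of log lines, returns a hash ref keyed by prop.
sub tally {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        next unless $line =~ /^2006/;
        next unless $line =~ /GRMSTR/;
        chomp $line;
        my @f = split /\|/, $line;
        $count{$f[2]}{keys}{$f[4]} = 1;   # hash keys are unique, so repeats collapse
        $count{$f[2]}{lines}++;
    }
    return \%count;
}

# Made-up records: ATLCNN sends message_grp 100 twice.
my @sample = (
    "2006-07-14 09:12:59|1|ATLCNN|GRMSTR|100\n",
    "2006-07-14 09:13:00|2|ATLCNN|GRMSTR|100\n",
    "2006-07-14 09:13:01|3|ATLCNN|GRMSTR|101\n",
    "2006-07-14 09:13:02|4|AUSCTR|GRMSTR|200\n",
    "2006-07-14 09:13:03|5|NYCBER|GNRSPE|300\n",   # skipped: no GRMSTR
    "   continuation line mentioning GRMSTR\n",     # skipped: does not start with 2006
);

my $count = tally(@sample);
for my $prop (sort keys %$count) {
    my $lines = $count->{$prop}{lines};
    my $keys  = scalar keys %{ $count->{$prop}{keys} };
    printf "%s: %d lines, %d unique\n", $prop, $lines, $keys;
}
```

With this sample, ATLCNN reports 3 lines but only 2 unique message_grp's, and the two skipped lines never reach the hash.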

