[Za-pm] Script
Nick Cleaton
nick at cleaton.net
Wed Jun 4 06:50:36 CDT 2003
On Wed, Jun 04, 2003 at 08:33:33AM +0200, Bartho Saaiman wrote:
> I want to filter a log file that I currently have to manipulate
> manually. So I was thinking to myself that this would be nice if I could
> do this with Perl. If it would be easier in bash, suggestions would also
> be welcome. So here is the scenario:
>
> Output of log file (sms.log):
> <snip>
> [Fri May 23 11:02:12 SAST 2003] SMS user at domain.co.za sent 43
> characters to 271234567891
> [Fri May 23 18:16:02 SAST 2003] SMS "Some User" <user at domain.co.za>
> sent 150 characters to 271234567891
> [Sat May 24 12:51:37 SAST 2003] SMS "Some User" <user at domain.co.za>
> sent 151 characters to 271234567891
> [Mon May 26 15:16:00 SAST 2003] SMS Some User <user at domain.co.za> sent
> 142 characters to 271234567891
> </snip>
>
> So the first problem is that the user (Some User) detail is logged in
> three different ways. I am also only interested in the email address as I
> can use this to do accounting with. I am currently using bash like this:
>
> [bartho at hercules bartho]$ cat smslog |grep "May"| grep "2003" |awk \
> '{print $8, $9, $10}'
> user at domain.co.za sent 47
> "Some User" <user at domain.co.za>
> Some User <user at domain.co.za>
>
> Now this is where my problem starts. I probably need to use regular
> expressions to feed it the month and the domain. The year I could
> probably use in a regex too, but this doesn't change too often. Then I
> need to send this to a clean file only containing the emails that this
> originated from. I do not need to sort them as unique since I have to
> add them up, similar to 'wc -l'
As I understand it, you want just the email addresses, so if the input
is
"Some User" <user at domain.co.za>
then you want just the user at domain.co.za part. Is that right?
Here's the script I might write to solve that problem:
<snip>
#!/usr/bin/perl -w
use strict;

=head1 NAME

get_email.pl - quick script to extract email addresses from SMS log

=head1 SYNOPSIS

  get_email.pl [smslog ...]

=head1 DESCRIPTION

Parses SMS log files to extract a list of user email addresses, and
prints them to STDOUT one per line.

=cut

# A regular expression to match a reasonable looking email address
my $email = qr#[\w\-\.]+\@[\w\-\.]+#;

while(<>) {
    chomp;
    unless ( /^\[([\w\s:]+)\] SMS\s+(.+?)\s+sent \d+ characters to \d+\s*$/ ) {
        warn "Can't parse log line [$_]\n";
        next;
    }
    my ($date, $addr) = ($1, $2);

    # We're only interested in May
    next unless $date =~ / May /;

    if ( $addr =~ /<($email)>$/ or $addr =~ /^($email)$/ ) {
        print "$1\n";
    }
    else {
        warn "Can't parse address [$addr] at log line [$_]\n";
    }
}
</snip>
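Since the original question mentions adding the addresses up, one possible follow-on is to count messages per address instead of printing each one. What follows is only a rough sketch under a few assumptions, not part of the script above: the sub names extract_address and tally_addresses are made up for illustration, each log entry is assumed to sit on a single line with a real "@" in the address (the list archive displays it as " at "), and the month filter is left out to keep it short. It reuses the same two regular expressions and runs against the four sample lines from the question.

```perl
#!/usr/bin/perl -w
use strict;

# Same email pattern as in the script above
my $email = qr#[\w\-\.]+\@[\w\-\.]+#;

# Hypothetical helper: return the address found in one log line,
# or undef if the line or its address part can't be parsed.
sub extract_address {
    my ($line) = @_;
    return
        unless $line =~ /^\[[\w\s:]+\] SMS\s+(.+?)\s+sent \d+ characters to \d+\s*$/;
    my $addr = $1;
    return $1 if $addr =~ /<($email)>$/ or $addr =~ /^($email)$/;
    return;
}

# Hypothetical helper: count how many log lines each address produced.
sub tally_addresses {
    my %count;
    foreach my $line (@_) {
        my $addr = extract_address($line);
        $count{$addr}++ if defined $addr;
    }
    return %count;
}

# The sample entries from the question, one entry per line
my @sample = split /\n/, <<'END';
[Fri May 23 11:02:12 SAST 2003] SMS user@domain.co.za sent 43 characters to 271234567891
[Fri May 23 18:16:02 SAST 2003] SMS "Some User" <user@domain.co.za> sent 150 characters to 271234567891
[Sat May 24 12:51:37 SAST 2003] SMS "Some User" <user@domain.co.za> sent 151 characters to 271234567891
[Mon May 26 15:16:00 SAST 2003] SMS Some User <user@domain.co.za> sent 142 characters to 271234567891
END

# Print one "count address" line per address, sorted by address
my %count = tally_addresses(@sample);
printf "%5d %s\n", $count{$_}, $_ foreach sort keys %count;
```

All three ways of logging the user reduce to the same address here, so the sample yields a single line with a count of 4. To run it over a real log, replacing @sample with the lines read from <> should be enough.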
--
Nick