[Za-pm] Script

Nick Cleaton nick at cleaton.net
Wed Jun 4 06:50:36 CDT 2003


On Wed, Jun 04, 2003 at 08:33:33AM +0200, Bartho Saaiman wrote:
> I want to filter a log file that I currently have to manipulate
> manually. So I was thinking to myself that it would be nice if I could
> do this with Perl. If it would be easier in bash, suggestions would also
> be welcome. So here is the scenario:
> 
> Output of log file (sms.log):
> <snip>
> [Fri May 23 11:02:12 SAST 2003] SMS  user@domain.co.za sent 43 characters to 271234567891
> [Fri May 23 18:16:02 SAST 2003] SMS  "Some User" <user@domain.co.za> sent 150 characters to 271234567891
> [Sat May 24 12:51:37 SAST 2003] SMS  "Some User" <user@domain.co.za> sent 151 characters to 271234567891
> [Mon May 26 15:16:00 SAST 2003] SMS  Some User <user@domain.co.za> sent 142 characters to 271234567891
> </snip>
> 
> So the first problem is that the user (Some User) detail is logged in
> three different ways. I am also only interested in the email address as I
> can use this to do accounting with. I am currently using bash like this:
> 
> [bartho@hercules bartho]$ cat  smslog |grep "May"| grep "2003" |awk \
> 	'{print $8, $9, $10}'
> user@domain.co.za sent 47
> "Some User" <user@domain.co.za>
> Some User <user@domain.co.za>
> 
> Now this is where my problem starts. I probably need to use regular
> expressions to feed it the month and the domain. The year I could
> probably use in a regex too, but this doesn't change too often. Then I
> need to send this to a clean file only containing the emails that this
> originated from. I do not need to sort them as unique since I have to
> add them up, similar to 'wc -l'.

As I understand it, you want just the email addresses, so if the input
is

  "Some User" <user@domain.co.za>

then you want just the user@domain.co.za part. Is that right?
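As a quick check of that reading, here is a minimal Perl sketch that reduces each of the three sender forms from the log to the bare address. It reuses the rough address pattern from the script below; the helper name `extract_address` is hypothetical, and the pattern is only a "reasonable looking address" test, not a full email-address parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rough "reasonable looking" address pattern, as in get_email.pl
my $email = qr/[\w\-\.]+\@[\w\-\.]+/;

# Hypothetical helper: reduce one logged sender field to its bare
# address, or return undef if it doesn't look like any known form.
sub extract_address {
    my ($sender) = @_;
    return $1 if $sender =~ /<($email)>$/;    # "Some User" <addr> or Some User <addr>
    return $1 if $sender =~ /^($email)$/;     # bare addr
    return undef;
}

# The three forms seen in sms.log (single quotes, so '@' is not interpolated)
for my $sender ('user@domain.co.za',
                '"Some User" <user@domain.co.za>',
                'Some User <user@domain.co.za>') {
    my $bare = extract_address($sender);
    printf "%-33s -> %s\n", $sender, defined $bare ? $bare : '(no match)';
}
```

All three forms should come out as the same bare address, which is what makes per-user accounting possible.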

Here's the script I might write to solve that problem:

<snip>
#!/usr/bin/perl -w
use strict;

=head1 NAME

get_email.pl - quick script to extract email addresses from SMS log

=head1 SYNOPSIS

  get_email.pl [smslog ...]

=head1 DESCRIPTION

Parses SMS log files to extract a list of user email addresses, and
prints them to STDOUT one per line.

=cut

# A regular expression to match a reasonable looking email address
my $email = qr#[\w\-\.]+\@[\w\-\.]+#;

while(<>) {
    chomp;

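    # Expect "[date] SMS sender sent N characters to NUMBER" and capture
    # the date and the sender field exactly as logged
    unless ( /^\[([\w\s:]+)\] SMS\s+(.+?)\s+sent \d+ characters to \d+\s*$/ ) {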
        warn "Can't parse log line [$_]\n";
        next;
    }
    my ($date, $addr) = ($1, $2);

    # We're only interested in May
    next unless $date =~ / May /;

    # The sender is either "Name <address>" (quoted or not) or a bare address
    if ( $addr =~ /<($email)>$/ or $addr =~ /^($email)$/ ) {
        print "$1\n";
    }
    else {
        warn "Can't parse address [$addr] at log line [$_]\n";
    }
}

</snip>
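Bartho also wanted to choose the month, year and domain at run time and to add up the messages per address. Below is a minimal sketch of that, assuming every log entry sits on one line as in the sample. It keeps the script's parsing, filters on the fields of the bracketed date and on the address domain, and tallies in a hash instead of printing one address per line. The sub names `parse_line` and `tally` and the `month`/`year`/`domain` filter keys are hypothetical, chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $email = qr/[\w\-\.]+\@[\w\-\.]+/;

# Parse one log line. Returns ($date, $address), or an empty list if the
# line or its sender field doesn't match the expected layout.
sub parse_line {
    my ($line) = @_;
    return unless $line =~
        /^\[([\w\s:]+)\] SMS\s+(.+?)\s+sent \d+ characters to \d+\s*$/;
    my ($date, $sender) = ($1, $2);
    return ($date, $1) if $sender =~ /<($email)>$/ or $sender =~ /^($email)$/;
    return;
}

# Count messages per address, keeping only lines whose month, year and
# address domain match the optional filters (month => 'May', year => '2003',
# domain => 'domain.co.za'). Returns a hash ref of address => count.
sub tally {
    my ($lines, %want) = @_;
    my $dom_re;
    if (defined $want{domain}) {
        my $dom = $want{domain};
        $dom_re = qr/\@\Q$dom\E$/i;    # address must end in @<domain>
    }
    my %count;
    for my $line (@$lines) {
        my ($date, $addr) = parse_line($line) or next;
        my @f = split ' ', $date;    # Fri May 23 11:02:12 SAST 2003
        next if defined $want{month} and $f[1]  ne $want{month};
        next if defined $want{year}  and $f[-1] ne $want{year};
        next if $dom_re and $addr !~ $dom_re;
        $count{$addr}++;
    }
    return \%count;
}

# The sample entries from the original post, one log entry per line
my @sample = (
    '[Fri May 23 11:02:12 SAST 2003] SMS  user@domain.co.za sent 43 characters to 271234567891',
    '[Fri May 23 18:16:02 SAST 2003] SMS  "Some User" <user@domain.co.za> sent 150 characters to 271234567891',
    '[Sat May 24 12:51:37 SAST 2003] SMS  "Some User" <user@domain.co.za> sent 151 characters to 271234567891',
    '[Mon May 26 15:16:00 SAST 2003] SMS  Some User <user@domain.co.za> sent 142 characters to 271234567891',
);

my $count = tally(\@sample, month => 'May', year => '2003', domain => 'domain.co.za');
printf "%5d %s\n", $count->{$_}, $_ for sort keys %$count;
```

On a real log the same sub could be fed every line named on the command line, e.g. `tally([<>], month => 'May', year => '2003')`; the trailing newlines are absorbed by the `\s*$` in the pattern. Keeping the totals in a hash replaces the separate `sort | uniq -c` step that the one-address-per-line output of get_email.pl would otherwise need for counting.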

--
Nick


