[Chicago-talk] advancing through a file w/o using an array
Steven Lembark
lembark at jeeves.wrkhors.com
Sat Nov 8 14:49:41 CST 2003
-- Christopher Nava <starbuck at hawaii.rr.com>
> Do you have a place I can find more info on this "flipflop" operator.
> I tried googling "flipflop + perl" and got nothing useful.
> I too am doing multi line file parsing (MTF messages not email) and could
> use any tools you've got.
what is the 'flipflop' operator supposed to do?
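For reference, the "flip-flop" Perl programmers usually mean is the range operator .. evaluated in scalar context (see "Range Operators" in perlop). It is false until its left operand becomes true. It then stays true until its right operand becomes true, including that line, and then resets. Here is a minimal sketch. The BEGIN/END marker lines are made up for illustration:

```perl
use strict;
use warnings;

# Sample input; BEGIN and END are hypothetical marker lines.
my @input = ("noise\n", "BEGIN\n", "payload 1\n", "payload 2\n", "END\n", "more noise\n");

my @between;
for (@input) {
    # False until /^BEGIN$/ matches, then true until /^END$/ matches.
    # Both marker lines are included in the range.
    push @between, $_ if /^BEGIN$/ .. /^END$/;
}
chomp @between;
print "$_\n" for @between;
```

There is also a three-dot form (...). It does not test the right operand on the same line that turned the range on.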
if you want to detect file transitions try:
    my $path = '';
    while( <ARGV> )
    {
        if( $path ne $ARGV )
        {
            $path = $ARGV;
            # just read the first line of a new file, deal
            # with any setups.
            ...
        }
        ...
    }
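Here is a variation on the same loop, sketched on the assumption that you also want per-file line numbers. Without parentheses, eof is true at the end of each file in the ARGV list, and closing ARGV there resets $. for the next file. This is the idiom documented in perldoc -f eof. The temp files are throwaway stand-ins for real command-line arguments, and File::Temp is used only to make the sketch runnable:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two throwaway input files standing in for real command-line arguments.
my @paths;
for my $content ("a1\na2\n", "b1\nb2\nb3\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh;
    push @paths, $name;
}
@ARGV = @paths;

my %first_line;   # path => first line read from that file
my %line_count;   # path => number of lines read from that file
my $path = '';

while (<ARGV>) {
    chomp;
    if ($path ne $ARGV) {
        # Just read the first line of a new file: per-file setup goes here.
        $path = $ARGV;
        $first_line{$path} = $_;
    }
    $line_count{$path} = $.;   # per-file, thanks to the close below

    # eof with no parens: true at the end of *each* file, so closing
    # ARGV here resets $. before the next file is opened.
    close ARGV if eof;
}
```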
> BTW: My current method is to do something like this....
>
> open (INFILE, $filename) || die ($!);
> # I've got LOTS of RAM and the files are relatively short, so slurping in the
> # entire file is no problem.
> my @file = <INFILE>;
> close (INFILE);
>
> chomp @file;
you could use

    chomp( my @linz = <$infile> );

if you are really averse to typing :-)
Lexical file handles are easier to deal with in most cases:
    use Carp;

    sub slurp
    {
        my $path = shift;
        open my $fh, '<', $path
        or croak "Bogus path: '$path' ($!)";
        chomp( my @linz = <$fh> );
        wantarray ? @linz : \@linz
    }

    my $linz = slurp $path;

or

    my @linz = slurp $path;

will do it nicely.
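Here is a self-contained check of the context-sensitive return and of the error path. It uses a throwaway temp file. File::Temp and the exact error text are assumptions of this sketch:

```perl
use strict;
use warnings;
use Carp qw(croak);
use File::Temp qw(tempfile);

# Read a whole file, strip newlines; list in list context,
# array reference in scalar context.
sub slurp {
    my $path = shift;
    open my $fh, '<', $path
        or croak "Bogus path: '$path' ($!)";
    chomp( my @linz = <$fh> );
    return wantarray ? @linz : \@linz;
}

# Throwaway input file for the demonstration.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "first\nsecond\nthird\n";
close $out;

my @list = slurp($path);   # list context: the lines themselves
my $ref  = slurp($path);   # scalar context: a reference to the array

# A missing file makes slurp croak; trap it with eval.
my $ok  = eval { slurp('/nonexistent/dir/for/demo'); 1 };
my $err = $@;
```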
> # Note: I usually use a map function to get rid of both DOS CR/LF and Unix
> # LFs.
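a for loop handles this with less overhead, since it edits each
element in place instead of building and reassigning a new list:

    s/\s+$// for @linz;

is probably the cheapest way. Note that it strips all trailing
whitespace (spaces and tabs too), not just the CR/LF.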
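This sketch shows what the one-liner does to some made-up sample lines. It also shows a narrower substitution that removes only an LF or CR/LF ending and leaves other trailing whitespace alone:

```perl
use strict;
use warnings;

# Sample lines as they might arrive from a mix of DOS and Unix files.
my @linz = ("dos line\r\n", "unix line\n", "trailing blanks   \n", "no newline");

# $_ is aliased to each element, so the for loop edits in place.
# \s+ also eats trailing blanks and tabs, not just CR/LF.
my @stripped = @linz;
s/\s+$// for @stripped;

# Narrower: remove only the line ending (LF or CR/LF).
my @endings_only = @linz;
s/\r?\n\z// for @endings_only;
```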
> # I prefer foreach since it results in a named variable ($line) that is
> # easier for new people to understand.
> foreach $line (@file) {
You can replace all of the if blocks with a dispatch table and
one regex:
    my %handlerz =
    (
        # this associates each of the mail header entries
        # with a subroutine that handles that header entry.
        Subject => __PACKAGE__->can( 'subject_handler' ),
        Country => __PACKAGE__->can( 'country_handler' ),
        Summary => __PACKAGE__->can( 'summary_handler' ),
        From    => __PACKAGE__->can( 'from_handler' ),
    );

    sub read_header
    {
        # read the first paragraph, leaves ARGV positioned
        # at the message body.
        local $/ = '';
        my $header = <ARGV>;
        chomp( my @hdrlinz = split /\n/, $header );
        wantarray ? @hdrlinz : \@hdrlinz;
    }
    for( read_header )
    {
        # skip any line that doesn't start with "Name:"
        my ($field) = /^(\w+):/ or next;
        if( my $sub = $handlerz{$field} )
        {
            $sub->( $_ );
        }
    }
# the next <ARGV> will suck up the message body -- assuming
# an RFC mail message, Hell alone knows what you'll get from
# Domino or Outlook...
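Here is a minimal sketch of the paragraph-mode trick in read_header. It uses an in-memory filehandle in place of ARGV so that it runs on its own, and the message text is made up. Localizing $/ inside a do block restores normal line-by-line reading for the body:

```perl
use strict;
use warnings;

# A tiny RFC-822-style message: header lines, a blank line, then the body.
my $message = "From: someone\@example.com\nSubject: test\n\nBody line 1\nBody line 2\n";

# In-memory handle standing in for ARGV.
open my $fh, '<', \$message or die "Cannot open in-memory handle: $!";

my @hdrlinz = do {
    local $/ = '';          # paragraph mode: read up to the blank line
    my $header = <$fh>;
    split /\n/, $header;    # split drops the newlines itself
};

# $/ is "\n" again here, so the rest of the handle reads line by line.
chomp( my @body = <$fh> );
```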
To add a handler for some field type, just write a sub in the
current module, or put it in another module and "use base" that
module (can() also finds inherited subs). The %handlerz table
associates each mail header entry with whichever sub handles it.
The nice thing is that you can associate any sub with any number
of headers (e.g., a log-only entry to record what happened) and
add the pieces one by one as you need them (or stub the ones you
need and develop them as you go along).
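Here is a runnable sketch of the idea. One sub is shared by several headers, and a log-only handler covers others. The handler names (store_handler, log_handler) and the sample header lines are hypothetical:

```perl
use strict;
use warnings;

my %metadata;   # header name => value, filled in by store_handler
my @logged;     # header lines that were only recorded

# Hypothetical handlers; each receives the complete header line.
sub store_handler {
    my ($line) = @_;
    my ($name, $value) = $line =~ /^(\w+):\s*(.*)/;
    $metadata{$name} = $value;
}

sub log_handler {
    my ($line) = @_;
    push @logged, $line;
}

# Any sub can serve any number of headers.
my %handlerz = (
    Subject  => __PACKAGE__->can('store_handler'),
    Country  => __PACKAGE__->can('store_handler'),
    Date     => __PACKAGE__->can('log_handler'),
    Received => __PACKAGE__->can('log_handler'),
);

my @hdrlinz = (
    'Subject: quarterly report',
    'Country: US',
    'Date: Sat, 8 Nov 2003 14:49:41 -0600',
    'Keywords: perl',          # no handler registered: ignored
);

for (@hdrlinz) {
    my ($field) = /^(\w+):/ or next;   # skip lines without "Name:"
    if ( my $sub = $handlerz{$field} ) {
        $sub->($_);
    }
}
```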
I know this verges on the OO-ish for some people, but if you
think of it as a jump table it may be easier to swallow :-)
>
> # Note, I try to move the most likely items toward the top to avoid
> # testing oddballs for every line...
>     if ($header && $line =~ /^sometext$/) { # Look for the end of the header
>         $header = 0;
>     } elsif ($line =~ /^Subj:/) {     # Multi-line item
>         $found = "Subject";
>     } elsif ($line =~ /^Country:/) {  # Single-line item
>         $found = "Country";
>     } elsif ($line =~ /^Summary:/) {  # Multi-line item
>         $found = "Summary";
>     } # ...more of the same...
>
>     # If $found is equal to an item we append it to that metadata.
>     if ($header) {
>         $metadata{$found} .= $line;
>     } else {
>         push (@body, $line);
>     }
> }
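To tie this back to the original question, the $header flag in the loop above is exactly the kind of state a flip-flop can hold. Here is a sketch that assumes, as in the quoted code, that a line matching ^sometext$ ends the header. It collects each field's lines in an array rather than appending to a string, and the sample input is made up:

```perl
use strict;
use warnings;

# Sample input; "sometext" is the end-of-header marker from the quoted code.
my $text = "Subj: line one\nline two\nCountry: US\nsometext\nbody 1\nbody 2\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my (%metadata, @body);
my $found = '';

while (<$fh>) {
    chomp;
    # A constant left operand is compared with $., so this is true
    # from line 1 through the line matching /^sometext$/.
    if ( 1 .. /^sometext$/ ) {
        next if /^sometext$/;               # the marker itself is not data
        if    (/^Subj:/)    { $found = 'Subject' }
        elsif (/^Country:/) { $found = 'Country' }
        push @{ $metadata{$found} }, $_ if $found;
    }
    else {
        push @body, $_;
    }
}
```

One caveat: each .. operator keeps its own state. If a file ends before its marker line, the next file starts out still "in the header".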
--
Steven Lembark 2930 W. Palmer
Workhorse Computing Chicago, IL 60647
+1 888 359 3508