[Chicago-talk] advancing through a file w/o using an array

Steven Lembark lembark at jeeves.wrkhors.com
Sat Nov 8 14:49:41 CST 2003



-- Christopher Nava <starbuck at hawaii.rr.com>

> Do you have a place where I can find more info on this "flipflop" operator?
> I tried googling "flipflop + perl" and got nothing useful.
> I too am doing multi line file parsing (MTF messages not email) and could
> use any tools you've got.

what is the 'flipflop' operator supposed to do?
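If "flipflop" means what it usually does in Perl, it is the range operator (..) used in scalar context. It stays false until its left operand is true, then stays true through the line where its right operand becomes true. Below is a minimal sketch, with made-up message lines, that pulls out a header block:

```perl
use strict;
use warnings;

# Hypothetical message lines: a header block, a blank line, then a body.
my @lines = ( 'Subject: test', 'From: someone', '', 'body line 1', 'body line 2' );

my @header;
for (@lines)
{
    # Scalar-context '..' is the flip-flop: it turns true on the line
    # matching the left pattern and stays true through the next line
    # matching the right pattern (that closing line is included).
    push @header, $_ if /^Subject:/ .. /^$/;
}

print "[$_]\n" for @header;
```

The three-dot form (...) works the same way, except that it does not test the right operand on the same line that turned it on.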

if you want to detect file transitions try:

	my $path = '';

	while( <ARGV> )
	{
		if( $path ne $ARGV )
		{
			$path = $ARGV;

			# just read the first line of a new file, deal
			# with any setups.

			...
		}

		...
	}
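Here is a self-contained sketch of the same check. The two temporary files are made up and stand in for command-line arguments. The continue block is an addition, not part of the loop above: closing ARGV when eof (no parens) is true is the idiom from the perlfunc eof entry, and it resets $. for each file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two made-up input files standing in for command-line arguments.
my @paths;
for my $content ( "a1\na2\n", "b1\n" )
{
    my ( $fh, $name ) = tempfile( UNLINK => 1 );
    print {$fh} $content;
    close $fh;
    push @paths, $name;
}
@ARGV = @paths;

my $path = '';
my ( %first_line, %line_count );

while( <ARGV> )
{
    chomp;
    if( $path ne $ARGV )
    {
        # First line of a new file: per-file setup goes here.
        $path = $ARGV;
        $first_line{ $path } = $_;
    }
}
continue
{
    # eof without parens tests the current file, not all of @ARGV.
    if( eof )
    {
        $line_count{ $ARGV } = $.;
        close ARGV;    # resets $. so the next file starts at line 1
    }
}

print "$_: first '$first_line{$_}', $line_count{$_} lines\n" for @paths;
```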

> BTW: My current method is to do something like this....
>
> open (INFILE, $filename) || die ($!);
> # I've got LOTS of RAM and the files are relatively short, so slurping in
> # the entire file is no problem.
> my @file = <INFILE>;
> close (INFILE);
>
> chomp @file;

you could use

	chomp( my @linz = <$infile> );

if you are really averse to typing :-)

Lexical file handles are easier to deal with in most cases:

	use Carp qw( croak );

	sub slurp
	{
		my $path = shift;

		open my $fh, '<', $path
			or croak "Bogus path '$path': $!";

		chomp( my @linz = <$fh> );

		wantarray ? @linz : \@linz
	}


	my $linz = slurp $path;

or

	my @linz = slurp $path;


will do it nicely.
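Here is a runnable sketch of slurp called in both contexts. The temporary file and its contents are made up so the example stands alone, and croak comes from the core Carp module:

```perl
use strict;
use warnings;
use Carp qw(croak);
use File::Temp qw(tempfile);

sub slurp
{
    # Read the whole file named by the first argument, chomped.
    my $path = shift;

    open my $fh, '<', $path
        or croak "Bogus path '$path': $!";

    chomp( my @linz = <$fh> );

    # List context gets the lines, scalar context an array ref.
    wantarray ? @linz : \@linz
}

# Hypothetical input file, created here so the sketch runs standalone.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} "first\nsecond\n";
close $tmp;

my $linz = slurp $path;    # scalar context: array ref
my @linz = slurp $path;    # list context: the lines themselves

print "ref:  @$linz\n";
print "list: @linz\n";
```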


> # Note: I usually use a map function to get rid of both DOS CR/LF and Unix
> # LFs.

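a for loop handles this with less overhead, since it doesn't have
to flatten and reassign the list:

	s/\s+$// for @linz;

edits each element in place and should be the fastest way. Note
that \s+ also strips trailing spaces and tabs, not just the CR/LF.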
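This small sketch shows why chomp alone is not enough for DOS files read on Unix, and what the for loop leaves behind. The input lines are made up:

```perl
use strict;
use warnings;

# Made-up lines as a DOS-format file reads on Unix: each keeps its "\r".
my @linz = ( "dos line\r\n", "trailing tab\t\r\n", "unix line\n" );

my @chomped = @linz;
chomp @chomped;          # strips only $/ ("\n"), so the "\r" survives

s/\s+$// for @linz;      # edits each element in place, no list copy

print "chomp left a CR: ", ( $chomped[0] =~ /\r\z/ ? 'yes' : 'no' ), "\n";
print "[$_]\n" for @linz;
```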

> # I prefer foreach since it results in a named variable ($line) that is
> # easier for new persons to understand.
> foreach $line (@file) {

You can replace all of the if blocks with a dispatch table and
one regex:

	my %handlerz =
	(
		# this associates each of the mail header entries
		# with a subroutine that handles that header entry.

		Subject => __PACKAGE__->can( 'subject_handler' ),
		Country => __PACKAGE__->can( 'country_handler' ),
		Summary => __PACKAGE__->can( 'summary_handler' ),
		From	=> __PACKAGE__->can( 'from_handler' ),
	);

	sub read_header
	{
		# read the first paragraph, leaves ARGV positioned
		# at the message body.

		local $/ = '';

		my $header = <ARGV>;

		chomp( my @hdrlinz = split /\n/, $header );

		wantarray ? @hdrlinz : \@hdrlinz;
	}

	for( read_header )
	{
		my ($field) = /^(\w+):/
			or next;	# skip lines without a "Name:" prefix

		if( my $sub = $handlerz{$field} )
		{
			$sub->( $_ );
		}
	}

	# the next <ARGV> will suck up the message body -- assuming
	# an RFC mail message, Hell alone knows what you'll get from
	# Domino or Outlook...


To add a handler for some field type, just write a sub in the
current module, or put it in another module and "use base" that
module (can() follows inheritance). The %handlerz table associates
the mail header entry with whichever sub is used to handle it. The
nice thing is that you can associate any sub with any number of
headers (e.g., a log-only entry to record what happened) and add
the pieces one by one as you need them (or stub the ones you need
and develop them as you go along).
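Here is a self-contained sketch of the dispatch-table approach. The handler bodies, the log_only sub, and the in-memory message are hypothetical. An in-memory filehandle stands in for ARGV, and one sub serves two headers:

```perl
use strict;
use warnings;

# Hypothetical handlers; real ones would do more than record the line.
my @seen;
sub subject_handler { push @seen, "subject_handler <- $_[0]" }
sub from_handler    { push @seen, "from_handler <- $_[0]" }
sub log_only        { push @seen, "log_only <- $_[0]" }

# One sub may serve several headers (log_only below).
my %handlerz =
(
    Subject => __PACKAGE__->can( 'subject_handler' ),
    From    => __PACKAGE__->can( 'from_handler' ),
    Country => __PACKAGE__->can( 'log_only' ),
    Summary => __PACKAGE__->can( 'log_only' ),
);

# In-memory message standing in for ARGV: header, blank line, body.
my $message = "Subject: hi\nFrom: me\nX-Other: ignored\nCountry: US\n\nbody text\n";
open my $fh, '<', \$message or die "in-memory open: $!";

my $header = do { local $/ = ''; <$fh> };   # paragraph mode: header only

for ( split /\n/, $header )
{
    my ($field) = /^(\w+):/ or next;        # "X-Other:" has no \w+ prefix
    if ( my $sub = $handlerz{$field} ) { $sub->( $_ ) }
}

my @body = <$fh>;                           # the rest is the body
print "$_\n" for @seen;
```

Unknown headers simply fall through, so new handlers can be added to %handlerz one at a time without touching the loop.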

I know this verges on the OO-ish for some people, but if you
think of it as a jump table it may be easier to swallow :-)


>
>    # Note, I try to move the most likely items toward the top to avoid
>    # testing oddballs for every line...
>     if ($header && $line =~ /^sometext$/) { # Look for the end of the header
>         $header = 0;
>     }elsif ($line =~ /^Subj:/) { # Multi line item
>         $found = "Subject";
>     }elsif ($line =~ /^Country:/) { # Single line item
>         $found = "Country";
>     }elsif ($line =~ /^Summary:/) { # Multi line item
>         $found = "Summary";
>     } #....more of same...
>
>     # If $found is equal to an item we append it to that metadata.
>     if ($header) {
>         $metadata{$found} .= $line;
>     } else {
>         push (@body, $line);
>     }
> }



--
Steven Lembark                               2930 W. Palmer
Workhorse Computing                       Chicago, IL 60647
                                            +1 888 359 3508


