SPUG: multiline matching question

jerry gay jerry.gay at gmail.com
Thu Sep 8 16:50:44 PDT 2005


On 9/8/05, Richard Wood <wildwood_players at yahoo.com> wrote:
> SPUGsters,
> 
> I am trying to efficiently grab sets of data from
> multiple lines of a file and print the collected data
> out on one line per set.  I've never tried multi line
> matching before ( I usually just write a program that
> tests which kind of line I am processing).  But I
> believe that this can be done in one match statement.
> So I would like some help.  Currently, in the test
> file that I am including, I am getting the data from
> the second set of lines which seems pretty unusual to
> me.  I'm looking forward to understanding what I am
> doing wrong.
> 
don't worry, it *can* be done :)

> The file consists of 150 character records.  Every
> third record starts a new set of data.  I need bits
> and pieces from records 1 & 2 of each set.
> 
by 'record' here, i assume you mean 'line in a file'.

<snip perl code> 

> I am attaching the sample.txt file, which has 4 sets
> of records.  When I run the command, as I say, I get
> the second of the 4 sets of data:
> 
from what i can tell, you do mean a 'record' is three lines long. from
what you've attached...

> AP01     AZ  468                                      71-70-0      0000            100
> ENGINE DRAINS
> 
> AP01     AZ  468                                      71-70-0      0000            100
> ENGINE DRAINS                                                     CC  135
> 
> AP01     AZ  468                                      71-70-0      LIN 500,        100X
> ENGINE DRAINS
> 
> AP01     AZ  468                                      71-70-0      LIN 548+        100XA
> ENGINE DRAINS
> 
it seems the third line is always blank. that, to me, means that you
have two-line records, and the record separator is "\n\n" (which looks
just like a blank line).

if you consult 'perldoc perlrun', you'll find the '-0' option, which
allows you to specify the input record separator. there's a special
variation of this flag, '-00' (note these are zeroes, not letter Os),
which sets perl to slurp in 'paragraph mode', meaning it will
create a new record every time it sees two or more newlines in a row.
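
for anyone who'd rather see paragraph mode outside of a one-liner, here's a
minimal sketch. setting $/ to the empty string is the in-script equivalent
of '-00'. the helper name read_paragraphs and the sample text are made up
for illustration, and the data is read from an in-memory filehandle rather
than sample.txt.

```perl
use strict;
use warnings;

# hypothetical helper: read paragraph-mode records from any filehandle
sub read_paragraphs {
    my ($fh) = @_;
    local $/ = '';        # empty string = paragraph mode, same as -00
    my @records = <$fh>;
    chomp @records;       # in paragraph mode, chomp strips all trailing newlines
    return @records;
}

# made-up data standing in for sample.txt
my $text = "AP01 AZ 468\nENGINE DRAINS\n\nAP01 AZ 469\nENGINE DRAINS\n\n\n";
open my $fh, '<', \$text or die "can't open in-memory file: $!";
my @records = read_paragraphs($fh);
close $fh;

print scalar(@records), " records\n";
```

each element of @records still has a newline between its two lines, which
is why a later step has to deal with it.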

since you want to print the fields as space separated, you can set the
output field separator, '$,':
    $,= ' ';

there's still a newline separating the lines in your record, though: the
one preceding 'ENGINE DRAINS' in your example data. this can be
replaced with a space using
    y/\n/ /
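
a small sketch of what that does, on a made-up record. y/// is just
another spelling of tr///. in the one-liner it acts on $_, while here it's
bound to a copy so the original string is left alone.

```perl
use strict;
use warnings;

# made-up two-line record, already chomped
my $record = "AP01 AZ 468\nENGINE DRAINS";

# copy $record into $flat, then turn every newline in the copy into a space
(my $flat = $record) =~ y/\n/ /;

print "$flat\n";
```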

assuming each record has seven fields, and only the last field
allows ' ' (space) as a valid field character, you can use a special
form of split to ...split... the fields on whitespace. also, if you
limit the split to seven fields, you'll have only that many, even if
the last field has space characters in it.
    my @a= split ' ' => $_, 7;

since you don't want to print the first field, you can print a slice
of the field array you created:
    print @a[1..6];
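
here's a sketch of the split-with-limit and slice steps on their own. the
input line is made up (a flattened record whose last field contains a
space), and it uses join instead of '$,' so the result can be kept in a
variable.

```perl
use strict;
use warnings;

# made-up flattened record; the last field contains a space
my $line = "AP01     AZ  468   71-70-0   0000   100 ENGINE DRAINS";

# split ' ' skips leading whitespace and splits on runs of whitespace;
# the limit of 7 means at most seven fields, so the remainder of the
# string (spaces included) ends up in the last one
my @a = split ' ', $line, 7;

# slice off the first field and join the rest with single spaces
my $out = join ' ', @a[1..6];
print "$out\n";
```

without the limit, 'ENGINE' and 'DRAINS' would come back as two separate
fields.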

putting it all together, and using '-n' and '-l', which i'll leave for
you to look up if you desire, you get:
  perl -n00le "$,=' '; y/\n/ /; my @a= split ' ', $_, 7; print @a[1..6]" sample.txt
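
if the one-liner's quoting gets in the way (a shell like bash will expand
'$_' inside double quotes before perl ever sees it), the same steps can be
written as a short script. this is only a sketch: the sub names
format_record and format_records are made up, the data is a made-up
stand-in for sample.txt read from an in-memory filehandle, and it slices
1..$#a rather than 1..6 so a short record doesn't produce warnings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical helper: turn one paragraph-mode record into an output line
sub format_record {
    my ($record) = @_;
    $record =~ y/\n/ /;                 # join the record's lines
    my @a = split ' ', $record, 7;      # at most seven fields
    return join ' ', @a[1 .. $#a];      # drop the first field
}

# hypothetical helper: read records from a filehandle, return formatted lines
sub format_records {
    my ($fh) = @_;
    local $/ = '';                      # paragraph mode, like -00
    my @lines;
    while (my $record = <$fh>) {
        chomp $record;                  # strips all trailing newlines here
        push @lines, format_record($record);
    }
    return @lines;
}

# made-up data standing in for sample.txt
my $sample = "AP01     AZ  468    71-70-0      0000     100\nENGINE DRAINS\n\n"
           . "AP01     AZ  468    71-70-0      0000     100\nENGINE DRAINS     CC  135\n\n";
open my $fh, '<', \$sample or die "can't open in-memory file: $!";
print "$_\n" for format_records($fh);
close $fh;
```

to run it against the real file, open sample.txt instead of the in-memory
string.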


hope that helps.
~jerry

