[Chicago-talk] Perl Style

Steven Lembark lembark at wrkhors.com
Thu Aug 19 12:54:02 CDT 2004


> I inject this here, only because its taken many, many rereads of that
> section and I still have the sinking feeling I've not got it right yet.
> I'm closer, maybe but ...

s&m in Perl are about bondage to newlines. Sometimes you grab
a string with newlines embedded in it and want a match that
spans them:

	:SQL
		select foo
		from bar
		where bletch = 'blort'
	:COMMENT
		This is a comment with
		multiple lines embedded in it.
	:UNPACK
	:PRINT Heading:
	:COMMENT
	Finito

This is a snippet of DML that we use here for bulk data
handling. If I wanted to grab the comment I could use:

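	m{:COMMENT(.+?)(?=\n:\w+)}s

to find a ":COMMENT" and grab whatever followed it (.+?)
up to the next newline-colon-word. The 's' allows '.' to
match a newline (which it normally doesn't). The '?' keeps
the '.+' from being greedy: without it the match runs on to
the last newline-colon-word in the string, not the next one.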
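Here's a quick sketch of that match, with and without the 's'.
The $dml string is my own flush-left stand-in for the snippet
above (the indentation on the colon lines is assumed to be just
mail formatting), and the variable names are made up for the
example. It uses a non-greedy '.+?' so the grab stops at the
first newline-colon-word after the ":COMMENT".

```perl
use strict;
use warnings;

# Stand-in for the DML snippet, with the colon lines flush left
# so that "\n:" really is newline-colon.
my $dml = join "\n",
    ':SQL',
    "\tselect foo",
    "\tfrom bar",
    "\twhere bletch = 'blort'",
    ':COMMENT',
    "\tThis is a comment with",
    "\tmultiple lines embedded in it.",
    ':UNPACK',
    ':PRINT Heading:',
    ':COMMENT',
    'Finito',
    '';

# With /s the '.' may cross newlines, so the capture can span lines.
my ($comment) = $dml =~ m{:COMMENT(.+?)(?=\n:\w+)}s;

# Without /s the '.' cannot step over the newline that follows
# ':COMMENT', so the same pattern finds nothing in this string.
my ($no_s) = $dml =~ m{:COMMENT(.+?)(?=\n:\w+)};

print defined $comment ? "with /s:    [$comment]\n" : "with /s:    no match\n";
print defined $no_s    ? "without /s: [$no_s]\n"    : "without /s: no match\n";
```

The capture keeps the newline right after ":COMMENT" and the
leading tabs; trim them afterwards if you only want the text.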

One common use of these is with global searches to grab
embedded strings in free-form text read in slurp mode.
In that case it'll usually be m{before(.+?)after}gs so
that the '.' can span newlines and each match stops at
the nearest 'after' instead of the last one.
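A minimal sketch of the slurp-and-grab case. The BEGIN and END
markers, the sample text, and the variable names are all
invented for illustration; an in-memory filehandle stands in
for whatever file you'd really be reading.

```perl
use strict;
use warnings;

# Made-up free-form text; BEGIN/END play the part of before/after.
my $raw = "intro BEGIN first\nchunk END filler\nBEGIN second END tail\n";

# Slurp mode: with $/ undefined, one read returns the whole input.
open my $fh, '<', \$raw or die "open: $!";
my $text = do { local $/; <$fh> };
close $fh;

# In list context /g hands back every capture; /s lets '.' cross
# the newline inside the first chunk, and '.+?' stops each match
# at the nearest END.
my @chunks = $text =~ m{BEGIN(.+?)END}gs;

print scalar(@chunks), " chunks\n";
print "[$_]\n" for @chunks;
```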

The 'm' option is useful when you know the text is NOT
freeform: it's line based (as in the example above).
With 'm' the ^ and $ anchors will bind within the
string wherever there's a newline, instead of only at
the ends of the whole string. In this case the
DML above is parsed here via:

	my @tokens = split /^:(\w+)/m, $program_string;

with the '^' binding after each newline in the string
to find however many lines beginning with a colon+word
there are in the string. For the example above this
gives:

(
	SQL
	select ...
	COMMENT
	This ...
	UNPACK
	''
	PRINT
	Heading:
	COMMENT
	Finito
)

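(sans most of the quotes, and skipping the empty leading
field split hands back when the string starts with a match).
Since UNPACK didn't have any text following it, split leaves
me with an empty string there (strictly, just the newline).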
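For anyone who wants to poke at it, here's a rough sketch of
the split with and without the 'm'. The $program_string here
is my flush-left stand-in for the snippet above, and @no_m,
@fields and @pairs are names made up for the example; the
whitespace trimming is an extra step, not something split
does for you.

```perl
use strict;
use warnings;

# Flush-left stand-in for the DML snippet above.
my $program_string = join "\n",
    ':SQL',
    "\tselect foo",
    "\tfrom bar",
    "\twhere bletch = 'blort'",
    ':COMMENT',
    "\tThis is a comment with",
    "\tmultiple lines embedded in it.",
    ':UNPACK',
    ':PRINT Heading:',
    ':COMMENT',
    'Finito',
    '';

# With /m the '^' binds after every embedded newline; the capturing
# parens make split return the keywords along with the text.
my @tokens = split /^:(\w+)/m, $program_string;

# Without /m the '^' only binds at the very start of the string,
# so only the first ':SQL' is ever split on.
my @no_m = split /^:(\w+)/, $program_string;

# The string starts with a match, so the first field is empty.
my @fields = @tokens;
shift @fields if @fields and $fields[0] eq '';

# Pair each keyword with its text, trimming surrounding whitespace.
my @pairs;
while ( my ( $name, $body ) = splice @fields, 0, 2 ) {
    $body = '' unless defined $body;
    $body =~ s/\A\s+|\s+\z//g;
    push @pairs, [ $name, $body ];
}

printf "%-8s [%s]\n", @$_ for @pairs;
print "fields with /m: ", scalar(@tokens), ", without: ", scalar(@no_m), "\n";
```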

Point is that the 'm' option makes this easier to parse, with
the '^' sliding along the newlines. Without it I'd have to
find a regex that handled the first line and the rest of them
via optional newlines, etc. Blech...


-- 
Steven Lembark                           9 Music Square South, Box 344
Workhorse Computing                                Nashville, TN 37203
lembark at wrkhors.com                                     1 888 359 3508

