[sf-perl] which first: remove non-data lines or process line continuations?

Mark Kvale kvale at phy.ucsf.edu
Tue Dec 11 12:48:32 PST 2007


My opinion is that it should be possible to remove comments completely without
changing the meaning or structure of the data. If people are creating multi-line
comments by using continuations, rather than by starting every comment line
with #, that's evil.

So I'd take out all comments first and then process the continuations.
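
Here is a rough sketch of both orderings, run on the sample lines from the
quoted message below. The sub names (remove_non_data, join_continuations) are
made up for illustration. The sketch also assumes that whitespace around a
joined continuation collapses to a single space, and it does not handle the
backslash-quoted backslashes David mentions.

```perl
use strict;
use warnings;

# Comment, blank, and empty lines (the pattern from the quoted message).
my $NON_DATA = qr{ \A \s* (?: \# | \z ) }xms;

# Keep only data lines.
sub remove_non_data {
    my @lines = @_;
    return grep { $_ !~ $NON_DATA } @lines;
}

# Join each line that ends in a backslash with the line after it.
# Assumption: the pieces are joined with one space; escaped
# backslashes (\\) are NOT protected in this sketch.
sub join_continuations {
    my @lines = @_;
    my @out;
    my $pending;
    for my $line (@lines) {
        ( my $text = $line ) =~ s/\A\s+//;            # trim leading whitespace
        my $continues = ( $text =~ s/\s*\\\s*\z// );  # strip trailing backslash
        $pending = defined $pending ? "$pending $text" : $text;
        if ( !$continues ) {
            push @out, $pending;
            undef $pending;
        }
    }
    push @out, $pending if defined $pending;          # dangling continuation
    return @out;
}

my @sample = (
    'foo \\',
    '# : bar \\',
    ': bat',
    'mumble \\',
    ': squeak',
);

my @comments_first      = join_continuations( remove_non_data(@sample) );
my @continuations_first = remove_non_data( join_continuations(@sample) );

print join( ' | ', @comments_first ),      "\n";  # foo : bat | mumble : squeak
print join( ' | ', @continuations_first ), "\n";  # foo # : bar : bat | mumble : squeak
```

With comments removed first, the "# : bar \" line disappears before it can
swallow anything; with continuations first, it ends up embedded in the data.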

Best of all is to get the file format of your data files from the source that
generated them.

Mark


David Alban wrote:
> Greetings,
> 
> I'm parsing text files.  In these text files, a data line is any line
> except a comment line, blank line, or null line.  I ignore all lines
> that are not data lines.  That is, I process only lines not matching:
> 
>   m{ \A \s* (?: \# | \z ) }xms
> 
> I also allow line continuation.  That is, backslash-newline pairs are
> deleted (after backslash-quoted backslashes are "protected").
> 
> So I have a choice.  I can process removal of non-data lines first.
> Or I can process line continuations first.
> 
> Take the following set of lines:
> 
>     foo \
>     # : bar \
>     : bat
>     mumble \
>     : squeak
> 
> If I process line continuations first, my data lines become:
> 
>     ( 'foo # : bar : bat', 'mumble : squeak' )
> 
> If I process removal of non-data lines first, I get:
> 
>     ( 'foo : bat', 'mumble : squeak' )
> 
> I'm leaning toward removing the non-data lines first.  But I wanted to
> see if anyone had any strong opinions or otherwise interesting
> observations.
> 
> Thanks,
> David


