Phoenix.pm: Mystery perl module failure

Robert Lindley bob at brogmoid.com
Sun Jan 18 21:30:08 CST 2004


Here is a puzzle.

I am constructing an assembler for a circa 1979 computer that is on the 
Apache.

Tried to use Text::ParseWords module. It almost worked. I expect it to 
parse out a
quoted token only if the quote immediately follows a word delimiter. I 
need it to work
that way (and the regex looks like it should) but it grabs the whole 
word at the front
and back of the quoted token.

What is really bad is that if there is an unmatched single or double 
quote anywhere
on the line it throws the entire line away by returning an empty array 
of words.

I have extracted the part of Text::ParseWords that I am using and put it 
in a error
demo program that is as small as is needed to show the error. 

Question:

 Does anybody know how to modify the main regex to:
  1. only tokenize a quoted string when that string starts with a single 
or double quote
  2. return all the tokens (including the quote in place) when any 
unmatched quotes are present.

To run, copy both enclosed files somewhere and run with this command:

./parse-error-demo.pl  test.src

I made one change to parse_line -- deleted reference to 
$PERL_SINGLE_QUOTE --
that should not effect this problem.

Does anyone know of another perl module to parse input lines into tokens 
treating
quoted strings as single units by ignoring enclosed delimiters?

Thanks for any help.

Bob Lindley
-------------- next part --------------
#!/usr/bin/perl
#
use strict 'vars';
use warnings;
# use Text::ParseWords;
my($file, $input, $inline, @words1)
;
  $file = shift;
  open IN, $file or die "Can't open $file:\n   $!\n";
  # read all lines in current input file.
  while($inline = <IN>) {
    $inline =~ s/\s+$//; # trim trailing white space
    $inline =~ s/^\s+//; # trim leading white space
    print "|$inline|\n";
    if($inline eq "") { next; }  # Skip blank lines
    @words1 = &parse_line('\s+' , 'delimiters', $inline);
    print join "|", @words1, "\n--------\n";
    # Each item in @words holds:
    #    empty string '' (e.g. word starts in col 1.)
    #    word with only delimiters present
    #    delimited word
    #
  }
  close IN;
  exit;

sub parse_line {
  # We will be testing undef strings
  no warnings;
  use re 'taint'; # if it's tainted, leave it as such

  my($delimiter, $keep, $line) = @_;
  my($quote, $quoted, $unquoted, $delim, $word, @pieces);
  while (length($line)) {
    ($quote, $quoted, undef, $unquoted, $delim, undef) =
      $line =~ m/^(["'])                 # a $quote
      ((?:\\.|(?!\1)[^\\])*)    # and $quoted text
      \1                     # followed by the same quote
      ([\000-\377]*)         # and the rest
      |                       # --OR--
      ^((?:\\.|[^\\"'])*?)    # an $unquoted text
      (\Z(?!\n)|(?-x:$delimiter)|(?!^)(?=["']))
                                               # plus EOL, delimiter, or quote
      ([\000-\377]*)           # the rest
      /x;                      # extended layout
    return() unless( $quote || length($unquoted) || length($delim));
    $line = $+;
    if ($keep) {
      $quoted = "$quote$quoted$quote";
    } else {
      $unquoted =~ s/\\(.)/$1/g;
      if (defined $quote) {
        $quoted =~ s/\\(.)/$1/g if ($quote eq '"');
        $quoted =~ s/\\([\\'])/$1/g if ($quote eq "'");
      }
    }
    $word .= defined $quote ? $quoted : $unquoted;
    if (length($delim)) {
      push(@pieces, $word);
      push(@pieces, $delim) if ($keep eq 'delimiters');
      undef $word;
    }
    if (!length($line)) {
      push(@pieces, $word);
    }
  }
  return(@pieces);
}



__END__
-------------- next part --------------
An ordinary line parses just fine 'this has a space in it.'
Mismatched quotes throw away the whole line "mismatched quotes.'
Dave O'Neil worked with George O'Malley on this project.
My name is David O'Neil 
                        ^ENABLES THE "SSS" MSG'S TO THE DTC,    {57-000
^ NOTE THE '' ABOVE MEANS TO USE ONE ' CHARACTER                {57-002


More information about the Phoenix-pm mailing list