Phoenix.pm: Mystery perl module failure

Scott Walters scott at illogics.org
Fri Jan 23 18:33:47 CST 2004


Hi Robert,

I'm glad! 

This idiom is very powerful. I wrote a module to parse HTML forms and
repopulate them from data in a hash using that idiom, but someone 
beat me to the punch and got theirs on CPAN before I got around to 
posting mine. Theirs is probably better anyway.

Re: the locking up when there is no newline at the end of a file:
if the tokens are matched by patterns chained together with
if/elsif/elsif, it is easy to add an error condition. If nothing
else matches and control reaches the final elsif in the chain, have
that elsif test whether there is any character left at all. If so,
throw a syntax error; otherwise, you're done.
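
A minimal sketch of such a chain follows. The sub name `tokenize` and the
exact token patterns are assumptions made for illustration; this is not code
from TinyWiki or from the assembler discussed in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a string into quoted and bare tokens
# using \G and an if/elsif chain with a final catch-all.
sub tokenize {
    my ($str) = @_;
    my @tokens;
    pos($str) = 0;
    while (1) {
        if ($str =~ m/\G\s+/gc) {
            # skip whitespace between tokens
        }
        elsif ($str =~ m/\G(['"])(.*?)\1/gcs) {
            # $1 is the quote character, $2 the text between the quotes;
            # non-greedy, so it stops at the first matching close quote
            push @tokens, "$1$2$1";
        }
        elsif ($str =~ m/\G([^\s'"]+)/gc) {
            # a bare word: everything up to whitespace or a quote
            push @tokens, $1;
        }
        elsif ($str =~ m/\G./gcs) {
            # a character is still there, but no rule above accepted it
            die sprintf "syntax error at offset %d\n", pos($str) - 1;
        }
        else {
            last;    # nothing left: done, with or without a trailing newline
        }
    }
    return @tokens;
}

print "$_\n"
    for tokenize(qq{not-quoted "quoted stuff" thingie stuff\n'another quoted thingie'});
```

Because the loop ends only when no character remains, a missing final
newline no longer causes a hang, and an unmatched quote raises an error
instead of looping. How to treat a quote embedded in a word (as in O'Neil)
is a policy choice this sketch does not address.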

TinyWiki does exactly that. I wouldn't mind seeing your assembler
as a presentation, but you'd better be ready to tell me all about
the processor you're assembling code for ;)

-scott

On  0, Robert Lindley <bob at brogmoid.com> wrote:
> 
> Scott
> 
> Thanks!
> 
> That method allowed me to solve my problem. Had to modify the code a 
> bit. It had two problems.
> 
> 1. If a line (like the last line of a file) did not have a \n or space at
> the end it went into an infinite loop. Easy to fix. Just made sure that all
> lines had \n at the end.
> 
> 2. Quoted strings of the form ' ',,'x' etc. got into trouble. Just made the
> quoted capture non-greedy.
> 
> Ran 50,000+ lines of assembly code through it and no problems found.
> 
> Thanks again - - really saved the day!
> 
> If anybody is interested, I will post what I ended up with.
> 
> Bob Lindley
> 
> Scott Walters wrote:
> 
> >Text::Balanced.
> >
> >Okay, it took me three hours to type that. Sorry if this reply is short.
> >Post a status later...
> >
> >That will get things quoted by an arbitrary character or set of matching
> >characters. If you know when you're expecting something quoted and when
> >you're expecting something else, you should be able to mix those.
> >
> >Parse::RecDescent is much more powerful, and the power carries a price.
> >There is some learning involved.
> >
> >I usually just hack up a quick parser using the \G trick from perldoc
> >perlre. \G in a regex matches where the last match left off, so you
> >can match...
> >
> >  while(1) {
> >    if($str =~ m/\G(['"])(.*)\1\s+/gcs) {
> >      # $1 will contain the quoting character
> >      # $2 contains what was between them
> >      # \s+ eats up whitespace
> >      # the /gc are needed for \G to work
> >    } 
> >    if($str =~ m/\G(.*?)\s+/gcs) {
> >      # $1 is the word 
> >      # the match is non-greedy so that it will stop at the first white-space
> >    }
> >    pos($str) == length($str) and last;
> >  }
> >
> >This should match a stream of things like:
> >
> >not-quoted "quoted stuff" thingie stuff
> >'another quoted thingie'
> >
> >and find:
> >
> >not-quoted
> >"quoted stuff"
> >thingie
> >stuff
> >'another quoted thingie'
> >
> >Hope this helps!
> >-scott
> >
> >On  0, Robert Lindley <bob at brogmoid.com> wrote:
> >  
> >
> >>
> >>Here is a puzzle.
> >>
> >>I am constructing an assembler for a circa 1979 computer that is on the 
> >>Apache.
> >>
> >>Tried to use Text::ParseWords module. It almost worked. I expect it to 
> >>parse out a
> >>quoted token only if the quote immediately follows a word delimiter. I 
> >>need it to work
> >>that way (and the regex looks like it should) but it grabs the whole 
> >>word at the front
> >>and back of the quoted token.
> >>
> >>What is really bad is that if there is an unmatched single or double 
> >>quote anywhere
> >>on the line it throws the entire line away by returning an empty array 
> >>of words.
> >>
> >>I have extracted the part of Text::ParseWords that I am using and put it 
> >>in an error
> >>demo program that is as small as is needed to show the error. 
> >>
> >>Question:
> >>
> >> Does anybody know how to modify the main regex to:
> >>  1. only tokenize a quoted string when that string starts with a single 
> >>or double quote
> >>  2. return all the tokens (including the quote in place) when any 
> >>unmatched quotes are present.
> >>
> >>To run, copy both enclosed files somewhere and run with this command:
> >>
> >>./parse-error-demo.pl  test.src
> >>
> >>I made one change to parse_line -- deleted reference to 
> >>$PERL_SINGLE_QUOTE --
> >>that should not affect this problem.
> >>
> >>Does anyone know of another perl module to parse input lines into tokens 
> >>treating
> >>quoted strings as single units by ignoring enclosed delimiters?
> >>
> >>Thanks for any help.
> >>
> >>Bob Lindley
> >>
> >>--------------020401030200020407060505
> >>Content-Type: text/plain;
> >> name="parse-error-demo.pl"
> >>Content-Transfer-Encoding: 7bit
> >>Content-Disposition: inline;
> >> filename="parse-error-demo.pl"
> >>
> >>#!/usr/bin/perl
> >>#
> >>use strict 'vars';
> >>use warnings;
> >># use Text::ParseWords;
> >>my($file, $input, $inline, @words1)
> >>;
> >>  $file = shift;
> >>  open IN, $file or die "Can't open $file:\n   $!\n";
> >>  # read all lines in current input file.
> >>  while($inline = <IN>) {
> >>    $inline =~ s/\s+$//; # trim trailing white space
> >>    $inline =~ s/^\s+//; # trim leading white space
> >>    print "|$inline|\n";
> >>    if($inline eq "") { next; }  # Skip blank lines
> >>    @words1 = &parse_line('\s+' , 'delimiters', $inline);
> >>    print join "|", @words1, "\n--------\n";
> >>    # Each item in @words1 holds:
> >>    #    empty string '' (e.g. word starts in col 1.)
> >>    #    word with only delimiters present
> >>    #    delimited word
> >>    #
> >>  }
> >>  close IN;
> >>  exit;
> >>
> >>sub parse_line {
> >>  # We will be testing undef strings
> >>  no warnings;
> >>  use re 'taint'; # if it's tainted, leave it as such
> >>
> >>  my($delimiter, $keep, $line) = @_;
> >>  my($quote, $quoted, $unquoted, $delim, $word, @pieces);
> >>  while (length($line)) {
> >>    ($quote, $quoted, undef, $unquoted, $delim, undef) =
> >>      $line =~ m/^(["'])                 # a $quote
> >>      ((?:\\.|(?!\1)[^\\])*)    # and $quoted text
> >>      \1                     # followed by the same quote
> >>      ([\000-\377]*)         # and the rest
> >>      |                       # --OR--
> >>      ^((?:\\.|[^\\"'])*?)    # an $unquoted text
> >>      (\Z(?!\n)|(?-x:$delimiter)|(?!^)(?=["']))
> >>                                               # plus EOL, delimiter, or quote
> >>      ([\000-\377]*)           # the rest
> >>      /x;                      # extended layout
> >>    return() unless( $quote || length($unquoted) || length($delim));
> >>    $line = $+;
> >>    if ($keep) {
> >>      $quoted = "$quote$quoted$quote";
> >>    } else {
> >>      $unquoted =~ s/\\(.)/$1/g;
> >>      if (defined $quote) {
> >>        $quoted =~ s/\\(.)/$1/g if ($quote eq '"');
> >>        $quoted =~ s/\\([\\'])/$1/g if ($quote eq "'");
> >>      }
> >>    }
> >>    $word .= defined $quote ? $quoted : $unquoted;
> >>    if (length($delim)) {
> >>      push(@pieces, $word);
> >>      push(@pieces, $delim) if ($keep eq 'delimiters');
> >>      undef $word;
> >>    }
> >>    if (!length($line)) {
> >>      push(@pieces, $word);
> >>    }
> >>  }
> >>  return(@pieces);
> >>}
> >>
> >>
> >>
> >>__END__
> >>
> >>--------------020401030200020407060505
> >>Content-Type: text/plain;
> >> name="test.src"
> >>Content-Transfer-Encoding: 7bit
> >>Content-Disposition: inline;
> >> filename="test.src"
> >>
> >>An ordinary line parses just fine 'this has a space in it.'
> >>Mismatched quotes throw away the whole line "mismatched quotes.'
> >>Dave O'Neil worked with George O'Malley on this project.
> >>My name is David O'Neil 
> >>                        ^ENABLES THE "SSS" MSG'S TO THE DTC,    {57-000
> >>^ NOTE THE '' ABOVE MEANS TO USE ONE ' CHARACTER                {57-002
> >>
> >>--------------020401030200020407060505--


