[Phoenix-pm] Parsing XML with one regexp (heh, heh, heh)
Scott Walters
scott at illogics.org
Fri Oct 6 16:45:14 PDT 2006
Excerpt from the perl5-porters digest:
On to the main course, Yves then delivered a patch that introduced a
new assertion that allows the writing of elegant expressions to match
arbitrarily deeply nested pairs of tokens. For instance, one could
parse XML documents with:
/^(<(?:[^<>]+|(?1))*>)$/
(along with the appropriate amount of hand waving). Yves was critical
of the implementation, in that it requires a capturing group "(...)"
rather than a non-capturing group "(?:...)", but on the other hand it
conforms to what Python and PCRE already do.
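To make the hand waving a little more concrete, here is a minimal sketch, assuming perl 5.10 or later (the stable release that shipped this "(?1)" recursion syntax). The test strings are made up for illustration. The pattern is the one quoted above, and it only checks that angle brackets are balanced and properly nested. It does not check that the input is well-formed XML.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the digest: group 1 is a '<', then any mix of
# non-bracket runs or a recursive call back into group 1 via (?1),
# then a '>'. Requires perl 5.10+ for (?1).
# It only checks that angle brackets nest and balance; tag names,
# attributes and matching close tags are not validated at all.
my $balanced = qr/^(<(?:[^<>]+|(?1))*>)$/;

# Illustrative inputs (hypothetical, not from the original post).
for my $input ('<a<b>c>', '<<>>', '<a<b>c', 'a<b>', '<a>>') {
    my $verdict = $input =~ $balanced ? 'balanced' : 'not balanced';
    print "$input: $verdict\n";
}
```

Run as a script, it should report the first two strings as balanced and the last three as not balanced. Note that "(?1)" refers to the first capturing group by number, which is why the part being recursed into has to be written with capturing parentheses.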
Dave Mitchell was most impressed, and quizzed Yves about the
behavioural semantics regarding backtracking. Rafael wanted to know if
it was possible to extend the patch to allow case sensitivity changes
by way of something like "(?i(?1))". (Alas, no).
Robin Houston also expressed his delight at the patch and admitted to
being the party who added this functionality to PCRE since he lacked
the courage to attack Perl's regular expression internals. He too
raised a couple of questions about behaviour in borderline areas.
Yves followed up with another version of the patch that added user
documentation and more tests. He has also been reading Jeffrey
Friedl's *Mastering Regular Expressions*, and has taken up the
challenge of addressing as many as possible of the areas in which Jeffrey
finds Perl's regular expressions wanting. He has laid out his roadmap in
"perltodo".
All applied.
And the crowd goes wild
http://xrl.us/r4u7