[Phoenix-pm] Parsing XML with one regexp (heh, heh, heh)

Scott Walters scott at illogics.org
Fri Oct 6 16:45:14 PDT 2006


Excerpt from the perl5-porters digest:

  On to the main course, Yves then delivered a patch that introduced a
  new assertion that allows the writing of elegant expressions to match
  arbitrarily deeply nested pairs of tokens. For instance, one could
  parse XML documents with:

    /^(<(?:[^<>]+|(?1))*>)$/

  (along with the appropriate amount of hand waving). Yves was critical
  of the implementation, in that it requires capturing "(...)" to be
  used, rather than grouping "(?:...)", but on the other hand it
  conforms with what Python and PCRE already do.

  Dave Mitchell was most impressed, and quizzed Yves about the
  behavioural semantics regarding backtracking. Rafael wanted to know if
  it was possible to extend the patch to allow case sensitivity changes
  by way of something like "(?i(?1))". (Alas, no).

  Robin Houston also expressed his delight at the patch and admitted to
  being the party who added this functionality to PCRE since he lacked
  the courage to attack Perl's regular expression internals. He too
  raised a couple of questions about behaviour in borderline areas.

  Yves followed up with another version of the patch that added user
  documentation and more tests. He has also been reading Jeffrey
  Friedl's *Mastering Regular Expressions*, and has taken up the
  challenge to resolve as much as possible the areas in which Jeffrey
  finds Perl's regular expressions wanting. He laid out his roadmap in
  "perltodo".

  All applied.

    And the crowd goes wild
    http://xrl.us/r4u7
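
For anyone who wants to poke at it, here is a minimal sketch of the recursive pattern from the excerpt. It assumes perl 5.10 or later, where the (?1) construct is available. The is_balanced helper and the sample strings are made up for illustration. This only checks that angle brackets nest properly, so it is nowhere near a real XML parser; that is the hand waving part.

```perl
use strict;
use warnings;

# The pattern quoted in the digest. (?1) recurses into capture group 1,
# so a bracketed group may contain further bracketed groups to any depth.
my $nested = qr/^(<(?:[^<>]+|(?1))*>)$/;

# Hypothetical helper: returns 1 if the whole string is a single
# properly nested <...> group, 0 otherwise.
sub is_balanced {
    my ($text) = @_;
    return $text =~ $nested ? 1 : 0;
}

for my $sample ('<a<b>c>', '<a<b>', '<>', 'a<b>') {
    printf "%-10s %s\n", $sample,
        is_balanced($sample) ? 'balanced' : 'not balanced';
}
```

The group has to be a capturing one, as the digest notes, because (?1) refers to it by number.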



