SPUG: not_quite_XML::Parser

Joshua ben Jore twists at gmail.com
Sat Feb 10 20:51:31 PST 2007


On 2/8/07, Michael R. Wolf <MichaelRWolf at att.net> wrote:
> I've got some almost_XML code.  That is, it is not well-formed.  Almost well
> formed, but "almost" is "not".  It appears to be line-oriented enough that a
> simple-minded line processing could clean it up, but I don't want to rely on
> simple-minded if there's a TagSoup::Parser that I could use to clean it up.
> Suggestions?

XML::LibXML has an HTML parsing mode (parse_html_string and
parse_html_file) which lets it handle badly formed input. I've even
used it to scrape web sites. Works nicely.
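
A minimal sketch of that approach, assuming a reasonably recent
XML::LibXML that provides the load_html convenience constructor (older
releases would use XML::LibXML->new->parse_html_string instead). The
sample input and the record/item element names are made up for
illustration. The HTML parser wraps whatever it recovers in html/body
elements and may nest unclosed tags in ways you don't expect, so check
the resulting tree before relying on it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical almost-XML input: an unquoted attribute value,
# an <item> that is never closed, and a bare '&'.
my $almost_xml = <<'END';
<record id=42>
<item>first
<item>second & third</item>
</record>
END

# Parse with libxml2's HTML parser. recover => 2 asks the parser to
# keep going past errors without printing them to STDERR.
my $doc = XML::LibXML->load_html(
    string  => $almost_xml,
    recover => 2,
);

# Query the recovered tree with XPath like any other document.
my @items = $doc->findnodes('//item');
for my $item (@items) {
    my $text = $item->textContent;
    $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    print "item: $text\n";
}

# Serialize the cleaned-up tree to see what the parser made of the input.
print $doc->documentElement->toString(1), "\n";
```

From there the recovered elements can be copied into a fresh
XML::LibXML::Document if well-formed XML output is what you need.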

Josh

