SPUG: not_quite_XML::Parser

Michael R. Wolf MichaelRWolf at att.net
Thu Feb 8 18:38:38 PST 2007


I've got some almost_XML code.  That is, it is not well-formed.  It is almost
well-formed, but "almost" is still "not".  It appears to be line-oriented
enough that simple-minded line processing could clean it up, but I don't want
to rely on simple-minded if there's a TagSoup::Parser that I could use instead.
Suggestions?

Michael

P.S.  Right now, these are the only non-conformant lines, but I want to have
an architecture that will scale to future problems:

1.  This needs to be self-closing...

<meta name="keywords" content="XX Xxxxxxxxx Xxxxxxx Xxxxxxx,Xxxxxxxxxx
Xxxxxxxxx Xxxxxxx xxxxxx,Xxxxxxxxxx Xxxxxxxxx Xxxxxxx Xxxxxxx,XXX XXXXXX
Xxxxxxxxx Xxxxxxx Xxxxxxx,XX Xxxxxxxxx Xxxxxxx xxxxxx,Xxxxxxxxxx Xxxxxxxxx
Xxxxxxx Xxxxxxx xxxxxx xxxxx,Xxxxxxxxxx xxxxxxxxx xxxxxxx xxxxxxx
xxxxxxxxx,Xxxxxxxxxx xxxxxx xxxxx,Xxxxxxxxxx xxxxxxxx xxxxxxxxx,XXX XXXXXX
xxxxxxxx xxxxxxxxx">

2. And this has a problem at the "=".

    <PROP
name="trackurl">http://xxxxxxxxx.xxx/x/xxxxx.xxx?xxxxxx=XXX&xxxx=999999&xxx=
9999999&xxx=9xXXxXXXxxxXxx9XXxXX/XXxxxXxXXxxXXXxX/x99xxXXXxXxxXXXXXxxxxxxxxX
XXxxxxxXxxxXX-xxxx9xxXX9xXXXxXxxXxxx9Xx9xx9XXXx9xxxX9xxxxXXXxxxxxX9xX99x-XXX
x9x9xxxX</PROP>
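
For illustration, here is a minimal sketch of the "simple-minded" cleanup for
the two cases above, written in Python rather than Perl purely as an example.
It rests on assumptions that are not confirmed by the data: that the complaint
at the "=" in item 2 comes from the unescaped "&" before it (a parser would
read "&xxxx" as the start of an entity reference and stop when it finds "="
instead of ";"), and that "meta" is the only element that needs to be made
self-closing. The names VOID_TAGS, BARE_AMP, OPEN_VOID, and clean are
hypothetical, and the sample input is a shortened stand-in for the real lines.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: "meta" is the only element that must become self-closing.
VOID_TAGS = ("meta",)

# An "&" that does not begin a named, decimal, or hex entity reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

# An opening tag for one of VOID_TAGS whose ">" is not already preceded by "/".
# Quoted attribute values are skipped as units, so they may span several lines.
OPEN_VOID = re.compile(
    r"<(%s)\b((?:[^>\"']|\"[^\"]*\"|'[^']*')*?)\s*(?<!/)>" % "|".join(VOID_TAGS),
    re.IGNORECASE,
)


def clean(text):
    """Escape bare ampersands, then self-close the listed empty elements."""
    text = BARE_AMP.sub("&amp;", text)
    text = OPEN_VOID.sub(lambda m: "<%s%s />" % (m.group(1), m.group(2)), text)
    return text


# Shortened stand-in for the two problem cases; the wrapper <doc> is invented.
sample = '''<doc>
<meta name="keywords" content="Aaa Bbb,
Ccc Ddd">
<PROP name="trackurl">http://example.invalid/x/y.zzz?a=AAA&b=1&c=
2</PROP>
</doc>'''

# The cleaned text should now be accepted by a strict XML parser.
root = ET.fromstring(clean(sample))
print(root.find("meta").get("name"))
print(root.find("PROP").text)
```

This works on the whole text rather than line by line, so wrapped attribute
values are handled, but it knows nothing about comments or CDATA sections and
would escape ampersands inside them too. That limitation is the usual argument
for a tolerant parser over regular expressions once the input gets messier.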


-- 
Michael R. Wolf
    All mammals learn by playing!
        MichaelRWolf at att.net



