[tpm] Solutions and kibitzers
Liam R E Quin
liam at holoweb.net
Mon Nov 16 14:43:58 PST 2015
On Tue, 22 Oct 2013 12:45:20 -0400
arocker at Vex.Net wrote:
> It seemed to be a simple problem, parsing some sort of *ML stream, and
> wc's output on the script was 25 88 526. (6 of those 25 lines do the
> actual work.)
>
> To my surprise, I've received all sorts of abuse for not using an XML
> parser module. (To which the poster may or may not have had easy access.)
If they had Perl, they had an XML parser.
The problem with handling XML as text is that people often don't account for what seem like corner cases.
Some examples:
1. These are all the same in XML:
<boy socks='black'></boy>
<boy
socks="black"
/>
<boy socks = "black"></boy
>
<boy socks="black" />
The following variant may or may not be the same, but is still legal:
<boy socks='black'><!-- . . . . --></boy>
Did you account for all of them?
2. Text entities:
<!DOCTYPE boy [
<!ENTITY socks "black">
]>
<boy socks="&socks;"/>
3. UTF-8 is common on Unix systems, but other encodings are legal and are signaled with an XML encoding declaration; did you handle them?
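A minimal sketch of the three cases above, using Python's standard-library xml.etree.ElementTree rather than a Perl module. The choice of Python, and the regex standing in for a quick text-based hack (`NAIVE`, `naive_socks`, `parsed_socks` are hypothetical names), are illustrative assumptions, not what either poster actually ran. The regex finds only one of the four spellings from item 1, returns the unexpanded entity reference from item 2, and a script that assumes UTF-8 cannot even decode the ISO-8859-1 bytes from item 3. The parser handles all three.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical text-based hack: grab socks="..." with a regular expression.
NAIVE = re.compile(r'<boy socks="([^"]*)"')

def naive_socks(text):
    m = NAIVE.search(text)
    return m.group(1) if m else None

def parsed_socks(data):
    # ElementTree accepts str or bytes; given bytes, it honors the
    # encoding declaration when decoding.
    return ET.fromstring(data).get("socks")

# 1. Four spellings of the same element (including "</boy" newline ">").
variants = [
    "<boy socks='black'></boy>",
    '<boy\nsocks="black"\n/>',
    '<boy socks = "black"></boy\n>',
    '<boy socks="black" />',
]
print([naive_socks(v) for v in variants])   # [None, None, None, 'black']
print([parsed_socks(v) for v in variants])  # ['black', 'black', 'black', 'black']

# 2. An internal entity: the parser expands it, the regex does not.
entity_doc = '<!DOCTYPE boy [\n<!ENTITY socks "black">\n]>\n<boy socks="&socks;"/>'
print(naive_socks(entity_doc))   # &socks;
print(parsed_socks(entity_doc))  # black

# 3. ISO-8859-1 bytes, signaled by the encoding declaration.
latin1 = ('<?xml version="1.0" encoding="ISO-8859-1"?>\n'
          '<boy socks="cr\u00e8me"/>').encode("iso-8859-1")
print(parsed_socks(latin1) == "cr\u00e8me")  # True
try:
    latin1.decode("utf-8")  # what a script assuming UTF-8 would do
except UnicodeDecodeError:
    print("not valid UTF-8")
```

The parser version is no longer than the regex version, which is rather the point: the corner cases come for free once something that actually implements the XML grammar is doing the reading.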
A five-minute hack that isn't for production is one thing; a program for production is another.
Many (not all) things you might use Perl for with XML are better done with XSLT and/or XQuery.
Liam
--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/