[Melbourne-pm] XML parsing

leif.eriksen at hpa.com.au
Mon Mar 21 22:07:42 PST 2005


(The following assumes a *nix-style environment.)

Another way that is a LOT more complicated is to use XML::Xerces, which 
can validate using a schema.

However, you have to build Xerces-C++, then the Perl binding, and then
pretty much cut-and-paste the validation example into your script, because
the documentation is minimal. In its defence, it does work, and it is fast.
We use this solution here for validation, but it was difficult to get all
the wrinkles out.

Also, to get more speed from XML::Simple, you can use XML::Parser as the
backend parser rather than the default pure-Perl parser that XML::SAX
provides. I benchmarked a range of SAX generators at
<http://perlmonks.org/?node_id=409517>, and based on that I recommend
XML::Parser. Just set $XML::Simple::PREFERRED_PARSER, or the environment
variable XML_SIMPLE_PREFERRED_PARSER, to the name of the parser module you
prefer. XML::Parser speeds up all XML::Simple operations *significantly*;
for example, from 13 seconds to 3 seconds.
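A minimal sketch of the package-variable approach, assuming XML::Parser is installed. The document and its `host`/`port` attributes are made up for illustration.

```perl
use strict;
use warnings;
use XML::Simple;

# Ask XML::Simple to use XML::Parser (expat) rather than whatever SAX
# parser XML::SAX would pick. If both are set, this package variable
# takes precedence over the XML_SIMPLE_PREFERRED_PARSER environment
# variable. Assumes XML::Parser is installed.
$XML::Simple::PREFERRED_PARSER = 'XML::Parser';

# Hypothetical document. A string containing '<' is treated by XMLin
# as literal XML rather than as a filename.
my $xml  = '<config><server host="alpha" port="8080"/></config>';
my $data = XMLin($xml);

# The root element is dropped by default, so the attributes of <server>
# appear directly under $data->{server}.
print "$data->{server}{host}:$data->{server}{port}\n";   # prints "alpha:8080"
```

Running the unchanged script as `XML_SIMPLE_PREFERRED_PARSER=XML::Parser perl script.pl`, with the assignment line removed, should have the same effect without touching the code.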

Note: I don't think XML::Parser will tell you the XML is invalid, but it
will tell you if the XML is not well-formed; I can't find any reference
to XML::Parser being a validating parser.
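Since XML::Parser dies on input that is not well-formed, one way to get that check without a separate parse might be to wrap XMLin in an eval once XML::Parser is the backend. This is a sketch assuming XML::Parser is installed; `parse_or_error` is a hypothetical helper name. It catches well-formedness errors only, not validity errors.

```perl
use strict;
use warnings;
use XML::Simple;

# Use XML::Parser as the backend; it dies on input that is not
# well-formed. Assumes XML::Parser is installed.
$XML::Simple::PREFERRED_PARSER = 'XML::Parser';

# Hypothetical helper: returns (data, undef) on success, or
# (undef, error message) if the parser died. Expat checks
# well-formedness only; it does not validate against a DTD or schema.
sub parse_or_error {
    my ($xml) = @_;
    my $data;
    my $ok = eval { $data = XMLin($xml); 1 };
    return $ok ? ($data, undef) : (undef, $@);
}

# One well-formed and one malformed (mismatched tag) example.
for my $xml ('<doc><item>one</item></doc>', '<doc><item>one</doc>') {
    my ($data, $err) = parse_or_error($xml);
    if ($err) { print "not well-formed: $err" }
    else      { print "parsed OK, item is $data->{item}\n" }
}
```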

Leif

alfiejohn at gmail.com wrote:

> Hi guys,
>
> Just a quickie...
>
> I have some XML (quite small) to parse but want it done elegantly, so 
> I thought that XML::Simple would be the way to go. However the XML 
> might be a bit dodgy, so I also wanted to validate it.
>
> The problem is that XML::Simple gives nice data structures, but if the
> XML is invalid it will hang or take a very long time to parse (very
> bad!). On the flip side, XML::Parser will let me know if it's invalid
> by dying (which is what I want instead of hanging), but gives ugly
> data structures compared to XML::Simple.
>
> So the way I'm thinking of combining the best of both worlds is to first
> let XML::Parser parse it and, if $@ doesn't get set, keep going with
> XML::Simple.
>
> Does anyone have any suggestions on a better way of doing this?
>
> Thanks.
>
> int 20h;
> Alfie John
> _______________________________________________
> Melbourne-pm mailing list
> Melbourne-pm at pm.org
> http://mail.pm.org/mailman/listinfo/melbourne-pm
>
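The two-pass approach described in the quoted message (check with XML::Parser first, then hand the same string to XML::Simple only if $@ was not set) can be sketched as follows. This assumes both modules are installed; `is_well_formed` is a hypothetical helper and the document is made up.

```perl
use strict;
use warnings;
use XML::Parser;
use XML::Simple;

# Pass 1: hypothetical helper that lets XML::Parser read the document
# purely to detect syntax errors. With no Style or handlers set, nothing
# is built; the parse dies if the XML is not well-formed.
sub is_well_formed {
    my ($xml) = @_;
    my $ok = eval { XML::Parser->new->parse($xml); 1 };
    return ($ok, $@);
}

my $xml = '<config><server host="alpha" port="8080"/></config>';

my ($ok, $err) = is_well_formed($xml);
die "rejected, not well-formed: $err" unless $ok;

# Pass 2: only reached when $@ was not set, so XML::Simple can build
# its friendlier nested-hash structure.
my $data = XMLin($xml);
print "host is $data->{server}{host}\n";
```

The trade-off is that every document is parsed twice, which should matter little for small inputs like the ones described.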

-- 
Leif Eriksen
Snr Developer
http://www.hpa.com.au/
phone: +61 3 9217 5545
email: leif.eriksen at hpa.com.au

