[sf-perl] XML Parsing question
Bill Moseley
moseley at hank.org
Thu Mar 3 15:41:31 PST 2011
I have a collection of XML files that have one or more <result> elements in
each file. The goal is to have a script to which I can pass one or more files
on the command line, and which gathers up all the <result> elements from each
file and combines them into a single XML output file.
I pulled XML::TreeBuilder out (I find TreeBuilder pretty easy for quick
scripts) and did the following. I seem to not work with XML that much
(which I think is a bit lucky), so there may be easier ways to do this.
#!/usr/bin/perl
use strict;
use warnings;
use XML::TreeBuilder;
use XML::Element;
use Encode;

my $doc = XML::Element->new( 'testResults' );

for my $path ( @ARGV ) {
    my $tree = XML::TreeBuilder->new;
    $tree->parse_file( $path );
    $doc->push_content( $tree->look_down( '_tag', 'result' ) );
    $tree->delete;
}

print join "\n",
    '<?xml version="1.0" encoding="UTF-8"?>',
    encode_utf8( $doc->as_XML );
That seems to work ok. But, then I ended up with a file that had a CDATA
section (which happened to hold a snippet of HTML). That's fine, but
$doc->as_XML then encoded the entities.
With this source file:
$ cat test.xml
<?xml version="1.0" encoding="UTF-8"?>
<testResults>
<result>
<content><![CDATA[<strong>this is strong</strong>]]></content>
</result>
</testResults>
When I run it through the script I get:
$ cat new.xml
<?xml version="1.0" encoding="UTF-8"?>
<testResults><result>
<content>&lt;strong&gt;this
is strong&lt;/strong&gt;</content>
</result></testResults>
And if I then run *that* file back through the script I get:
undefined entity at line 3, column 40, byte 101 at
/usr/lib/perl5/XML/Parser.pm line 187
which is choking at the undefined entity.
My questions are:
1) Is there a better approach to doing this that preserves the CDATA
sections?
2) Is there a way to define the entity? I tried adding a DTD to
define it, but wasn't able to make the parser happy in my attempts.
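
For question 1, one possible direction is sketched below. It swaps in XML::LibXML for XML::TreeBuilder, so it assumes that module is installed. As far as I know, XML::LibXML keeps CDATA sections as distinct nodes unless the no_cdata parser option is set, so they should survive being copied into a new document. The merge_results name is made up for illustration and is not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# merge_results() is a hypothetical helper name, not part of XML::LibXML.
# It collects every <result> element from the given files under one
# <testResults> root and returns the serialized document.
sub merge_results {
    my @paths = @_;

    my $out  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
    my $root = $out->createElement( 'testResults' );
    $out->setDocumentElement( $root );

    for my $path ( @paths ) {
        my $in = XML::LibXML->load_xml( location => $path );
        for my $result ( $in->findnodes( '//result' ) ) {
            # importNode makes a deep copy owned by $out, CDATA nodes included.
            $root->appendChild( $out->importNode( $result ) );
        }
    }

    # toString on a document returns bytes in the document's own encoding,
    # so no extra encode_utf8 step is applied here.
    return $out->toString;
}

print merge_results( @ARGV ) if @ARGV;
```

Usage would look like "perl merge.pl test.xml other.xml > new.xml", where merge.pl and other.xml are placeholder names. Because the CDATA section is written back out as CDATA, the output should be parseable again by the same script.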
--
Bill Moseley
moseley at hank.org