[Buffalo-pm] XML File Parsing And Manipulation...

Andrew Bruno aebruno2 at cse.Buffalo.EDU
Wed Dec 28 22:42:17 PST 2005


Hi Dan,

I would check out XSLT. I know this is the un-Perl way to do it, but XSLT
is a language built for transforming XML documents, and XPath/XSLT provide
some nice ways of manipulating XML. Attached is a quick XSLT stylesheet that
seems to do the same thing as your Perl program. You will need an XSLT
processor to run it; from Perl you can use the XML::LibXSLT module. There is
also a great command line utility called xsltproc (part of libxslt), which is
included on most Linux distros. If you have it, just run:

  $ xsltproc remove-ds1.xslt example.xml

This will print the new XML, with the elements removed, to stdout.
It may be overkill for your problem, but it might be worth looking into if
you're going to be doing quite a bit of XML processing.

I would also check out XML::LibXML and XML::LibXSLT, which are the Perl
bindings to the GNOME libxml2 and libxslt libraries. It's always a good idea
to use some kind of XML processing library rather than regexes when it comes
to parsing XML. The libraries will usually help with well-formedness,
entity escaping, encodings, and all the other subtleties of XML. Here is some
example Perl code that will transform your XML file using the attached XSLT:

use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Parse the XML file
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file("example.xml");

# Parse the stylesheet and apply it to the document
my $xslt = XML::LibXSLT->new();
my $style_doc = $parser->parse_file("remove-ds1.xslt");
my $stylesheet = $xslt->parse_stylesheet($style_doc);
my $results = $stylesheet->transform($doc);

# Write the result to a new file, leaving the original dump untouched
$stylesheet->output_file($results, "example.xml.new");

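If you would rather stay in Perl without a stylesheet, the same removal can
probably be done directly on the DOM with XML::LibXML's XPath support. This
is only a sketch, not something taken from your program or the attachment:
the inline XML is a made-up, cut-down stand-in for the RRD dump, and the
XPath expressions simply mirror the selections made in the attached XSLT.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical, cut-down stand-in for the RRD dump (the real file is larger).
my $xml = <<'XML';
<rrd>
  <ds><name> la </name></ds>
  <ds><name> ds1 </name></ds>
  <rra>
    <cdp_prep>
      <ds><value> NaN </value></ds>
      <ds><value> 7.0 </value></ds>
    </cdp_prep>
    <database>
      <row><v> 1.0 </v><v> 2.0 </v></row>
      <row><v> 3.0 </v><v> 4.0 </v></row>
    </database>
  </rra>
</rrd>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Same selections as the stylesheet: the ds whose name contains 'ds1',
# every cdp_prep/ds after the first, and every row/v after the first.
my @doomed = $doc->findnodes(
    q{//ds[contains(name, 'ds1')]}
  . q{ | //cdp_prep/ds[position() > 1]}
  . q{ | //row/v[position() > 1]}
);

# Detach each selected node (and everything inside it) from the tree.
$_->unbindNode for @doomed;

print $doc->toString;
```

For the real dump you would parse the file instead of a string and write the
result out with $doc->toFile("example.xml.new").
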
Hope this helps.

Cheers,

--Andy


On Tue, 27 Dec 2005, DANIEL MAGNUSZEWSKI wrote:

> Mongers,
>
> I apologize in advance for the long email...
>
> I have an optimization/"is this the best way to do it" question.
> Hopefully someone can offer a better solution.
>
> Problem:
>
> I need to take the following XML file, and essentially remove a few tags
> and their contents (including tags and data contained within those
> tags). Here is a link to the file:
> http://www.acsu.buffalo.edu/~dkm/example.xml
>
> The XML file is a dump of a Round Robin Database (RRD -
> www.rrdtool.org). This RRD has two data sources (named "la" and "ds1").
> I need to remove the "ds1" datasource, and the only way to remove it is
> by dumping the RRD to XML, modifying the XML file, then restoring the
> RRD from the XML file.
>
> What I need to do is remove the datasource information and the actual
> numerical data for "ds1". The numerical data for the two datasources is
> contained in the <v> tags, which is within the <row> tags. I will need
> to remove the second set of <v> tags and its data.
>
> The following sections need to be removed from the file (in addition to
> the second <v> tags and data):
>
> <ds>
>      <name> ds1 </name>
>      <type> GAUGE </type>
>      <minimal_heartbeat> 600 </minimal_heartbeat>
>      <min> 0.0000000000e+00 </min>
>      <max> 2.0000000000e+05 </max>
>      <last_ds> UNKN </last_ds>
>      <value> 9.0000000000e+00 </value>
>      <unknown_sec> 0 </unknown_sec>
> </ds>
> ....
> <ds><value> NaN </value>  <unknown_datapoints> 0
> </unknown_datapoints></ds>
> ...
> <ds><value> 7.0166666667e+00 </value>  <unknown_datapoints> 0
> </unknown_datapoints></ds>
-------------- next part --------------
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

	<xsl:template match="ds">
		<!-- Keep only data sources whose name does not contain 'ds1'
		     (names in the dump are padded with spaces, hence contains()) -->
		<xsl:if test="not(contains(name, 'ds1'))">
			<ds>
				<xsl:apply-templates select="./*"/>
			</ds>
		</xsl:if>
	</xsl:template>

	<xsl:template match="cdp_prep">
		<cdp_prep>
			<!-- Only select the first ds element -->
			<xsl:apply-templates select="ds[1]"/>
		</cdp_prep>
	</xsl:template>

	<xsl:template match="row">
		<row>
			<!-- Only select the first v element -->
			<xsl:apply-templates select="v[1]"/>
		</row>
	</xsl:template>

	<!-- Just copy all other nodes and attributes -->
	<xsl:template match="@* | node()">
		<xsl:copy>
			<xsl:apply-templates select="@* | node()"/>
		</xsl:copy>
	</xsl:template>

</xsl:stylesheet>

