[Melbourne-pm] Using perl libraries to apply XSL to XML

Ben Marsh blmarsh at gmail.com
Tue Mar 24 05:27:59 PDT 2009


Hi,

I am writing a web bot that fetches XML from a website. The XML references an
XSL stylesheet that is meant to be applied to it to produce HTML. Browsers do
this for us when we browse the site, but WWW::Mechanize does not seem to. I
wrote some code but hit a conundrum: how do I get the URL of the XSL stylesheet
from the XML content, fetch it via HTTP and apply it to the XML using
XML::LibXSLT?

Hi, thanks for the reply. I realized that I need libxslt, but unless I am
missing something I don't see how to pull the XSL URI out of the XML and
feed it to libxslt (XML::LibXSLT). That is my problem.

The XML starts with:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/something.xsl"?>

<page>

...

</page>

Maybe I should just grep through the XML to find the stylesheet? Maybe I
feed XML::LibXSLT a URL? Maybe I just feed the XML to XML::LibXSLT and it
fetches the XSL stylesheet automagically? I don't know. I have not been
able to figure out more than what I have below from the docs and examples.

Can you help me?  Thanks,  Ben Marsh

Here is my code:
<code>
use lib qw|/home/blm/perl/lib|;

use strict;
use warnings;

use WWW::Mechanize;
use XML::LibXML;
use XML::LibXSLT;

# The agent string has to stay on one line; a wrapped string literal
# would put a newline into the User-Agent header.
my $mech = WWW::Mechanize->new(
    agent => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1',
);

my $url = 'https://some.url.here/';
$mech->delete_header('accept-encoding');
$mech->get($url);

$mech->update_html($mech->content());

print $mech->content;
my $parser = XML::LibXML->new();
my $style_parser = XML::LibXML->new();
my $xslt = XML::LibXSLT->new();

my $doc = $parser->parse_string($mech->content());
print $doc->toString();
my $stylesheet_location;    # *** Here is my problem ***
$mech->get($stylesheet_location);
my $stylesheet_string = $mech->content();
my $styledoc = $style_parser->parse_string($stylesheet_string);
my $stylesheet = $xslt->parse_stylesheet($styledoc);
# transform() is a method of the compiled stylesheet, not of XML::LibXSLT
my $results = $stylesheet->transform($doc);
# output_string() serialises the result document according to xsl:output
print $stylesheet->output_string($results);
</code>
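
One possible way to fill the gap marked in the code above, offered as a
sketch rather than a definitive answer: the xml-stylesheet line is an
ordinary processing-instruction node, so XPath can find it and a small regex
can pull the href out of its data. The href is then resolved against the URL
the XML came from, fetched, and compiled with XML::LibXSLT. The helper names
stylesheet_href and apply_linked_stylesheet, the $fetch callback, and the
sample page and stylesheet are all hypothetical and exist only so the sketch
runs without a network connection. It also assumes the URI module is
installed (it normally comes along with LWP).

```perl
use strict;
use warnings;

use URI;
use XML::LibXML;
use XML::LibXSLT;

# Return the href of the first <?xml-stylesheet ...?> processing
# instruction in the document prolog, or undef if there is none.
sub stylesheet_href {
    my ($doc) = @_;
    for my $pi ($doc->findnodes('/processing-instruction("xml-stylesheet")')) {
        # The PI data is plain text such as: type="text/xsl" href="/something.xsl"
        my $data = $pi->nodeValue;
        return $2 if defined $data && $data =~ /\bhref\s*=\s*(["'])(.*?)\1/;
    }
    return;
}

# Transform $xml_string with the stylesheet it links to.
# $base_url is where the XML came from; $fetch is a code reference that
# takes an absolute URL and returns the body found there (hypothetical
# interface, chosen so the network part can be swapped out).
sub apply_linked_stylesheet {
    my ($xml_string, $base_url, $fetch) = @_;

    my $parser = XML::LibXML->new;
    my $doc    = $parser->parse_string($xml_string);

    my $href = stylesheet_href($doc);
    die "no xml-stylesheet processing instruction found\n" unless defined $href;

    # Resolve a relative href like "/something.xsl" against the XML's URL.
    my $xsl_url = URI->new_abs($href, $base_url)->as_string;

    my $style_doc  = $parser->parse_string($fetch->($xsl_url));
    my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);

    # transform() is called on the compiled stylesheet; output_string()
    # serialises the result the way xsl:output asks for.
    my $result = $stylesheet->transform($doc);
    return $stylesheet->output_string($result);
}

# Made-up stand-ins for the real page and its stylesheet.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/something.xsl"?>
<page><title>Hello</title></page>
XML

my $xsl = <<'XSL';
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/page">
    <html><body><h1><xsl:value-of select="title"/></h1></body></html>
  </xsl:template>
</xsl:stylesheet>
XSL

my @fetched;    # records which URLs the stub fetcher was asked for
my $html = apply_linked_stylesheet(
    $xml,
    'https://some.url.here/',
    sub { push @fetched, $_[0]; return $xsl },
);
print $html;

# In the bot, the stub would presumably be replaced by WWW::Mechanize:
#   my $base = $mech->uri;    # remember this before fetching the XSL
#   my $html = apply_linked_stylesheet($mech->content, $base,
#       sub { $mech->get($_[0]); return $mech->content });
```

As far as I can tell, XML::LibXSLT does not act on the xml-stylesheet
instruction by itself when handed only the XML, which is why the sketch
looks the href up explicitly.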