[Melbourne-pm] Using perl libraries to apply XSL to XML
Ben Marsh
blmarsh at gmail.com
Tue Mar 24 05:27:59 PDT 2009
Hi,
I am writing a web bot that fetches XML from a website. The XML references
an XSL stylesheet that turns it into HTML. Browsers seem to apply the
stylesheet for us when browsing the site; WWW::Mechanize does not seem to.
I wrote some code but hit a conundrum: how do I get the URL of the XSL from
the XML content, fetch it via HTTP, and apply it to the XML using
XML::LibXSLT?
Hi, thanks for the reply. I realized that I need libxslt. But unless I am
missing something, I don't see how to pull the XSL URI out of the XML and
feed it to libxslt (XML::LibXSLT). That is my problem.
The XML starts with:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/something.xsl"?>
<page>
...
</page>
Maybe I should just grep through the XML to find the stylesheet? Maybe I
can feed XML::LibXSLT a URL? Maybe I just feed the XML to XML::LibXSLT and
it fetches the XSL stylesheet automagically? I don't know. I have not been
able to figure out more than what I have below from the docs and examples.
Can you help me? Thanks, Ben Marsh
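One possible way to pull the stylesheet location out of the XML is to ask
XML::LibXML for the xml-stylesheet processing instruction and read its href
by hand. The following is only a minimal sketch under some assumptions: the
sample document and $base_url are stand-ins (in the real script the base
would presumably come from $mech->uri), and the regex only handles a simple
quoted href.

```perl
use strict;
use warnings;
use XML::LibXML;
use URI;

# Sample document mirroring the one in the post. $base_url is a
# hypothetical stand-in for the URL the XML was fetched from.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/something.xsl"?>
<page>
</page>
XML
my $base_url = 'https://some.url.here/';

my $doc = XML::LibXML->load_xml(string => $xml);

# Processing instructions that precede the root element are children of
# the document node, so XPath can select them by target name.
my ($pi) = $doc->findnodes('/processing-instruction("xml-stylesheet")');
die "no xml-stylesheet processing instruction found\n" unless $pi;

# The PI content is plain text that only looks like attributes, so the
# href value has to be picked out by hand.
my $data = $pi->nodeValue;
my ($href) = $data =~ /\bhref\s*=\s*["']([^"']*)["']/;
die "no href in xml-stylesheet processing instruction\n"
    unless defined $href;

# Resolve the (possibly relative) href against the document's own URL.
my $abs = URI->new_abs($href, $base_url)->as_string;
print "$abs\n";
```

The resolved URL is what would then be handed to $mech->get() to fetch the
stylesheet itself.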
Here is my code:
<code>
use lib qw|/home/blm/perl/lib|;
use strict;
use warnings;
use WWW::Mechanize;
use XML::LibXML;
use XML::LibXSLT;

# Keep the user-agent string on one line; a line break inside the quotes
# would end up in the User-Agent header.
my $mech = WWW::Mechanize->new(
    agent => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1',
);
my $url = 'https://some.url.here/';
$mech->delete_header('accept-encoding');
$mech->get($url);
$mech->update_html($mech->content());
print $mech->content;

my $parser       = XML::LibXML->new();
my $style_parser = XML::LibXML->new();
my $xslt         = XML::LibXSLT->new();

# Parse the fetched XML into a DOM document.
my $doc = $parser->parse_string($mech->content());
print $doc->toString();

my $stylesheet_location = ***Here is my problem***

# Fetch the stylesheet, parse it, and compile it.
$mech->get($stylesheet_location);
my $stylesheet_string = $mech->content();
my $styledoc   = $style_parser->parse_string($stylesheet_string);
my $stylesheet = $xslt->parse_stylesheet($styledoc);

# transform() and output_string() are methods of the compiled stylesheet,
# not of the XML::LibXSLT object.
my $results = $stylesheet->transform($doc);
print $stylesheet->output_string($results);
</code>
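For the transform step at the end of the code above, here is a small
offline sketch of the parse / compile / transform sequence in
XML::LibXSLT. The XML and the stylesheet are made-up inline samples (not
the real site's content), so nothing is fetched over HTTP; in the real
script both strings would come from $mech->content().

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical sample input standing in for the fetched XML.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/something.xsl"?>
<page><title>Hello</title></page>
XML

# Hypothetical sample stylesheet standing in for /something.xsl.
my $xsl = <<'XSL';
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/page">
    <html><body><h1><xsl:value-of select="title"/></h1></body></html>
  </xsl:template>
</xsl:stylesheet>
XSL

my $doc      = XML::LibXML->load_xml(string => $xml);
my $styledoc = XML::LibXML->load_xml(string => $xsl);

# parse_stylesheet() compiles the stylesheet DOM into an object that
# carries the transform() and output_string() methods.
my $xslt       = XML::LibXSLT->new();
my $stylesheet = $xslt->parse_stylesheet($styledoc);

# transform() returns a result document; output_string() serializes it
# according to the stylesheet's xsl:output settings.
my $results = $stylesheet->transform($doc);
my $html    = $stylesheet->output_string($results);
print $html;
```

Serializing through output_string() rather than printing the result
document directly means the xsl:output method (html here) is respected.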