SPUG: XPath on (less-than-perfect) HTML
Michael R. Wolf
MichaelRWolf at att.net
Thu Dec 31 13:41:51 PST 2009
On Dec 31, 2009, at 1:15 PM, Joshua ben Jore wrote:
> On Tue, Nov 17, 2009 at 1:33 PM, Michael R. Wolf
> <MichaelRWolf at att.net> wrote:
>> Yes, I know that XPath can only be applied to well-formed XML.
>>
>> That's the theoretical, pure, absolute truth.
>
> I've happily used XML::LibXML per Randal Schwartz in Linux Magazine
> (Jun 2003) at http://www.stonehenge.com/merlyn/LinuxMag/col49.html
Thanks. Randal's article(s) were among my motivations for using
XPath. I got my code working after fixing two version problems on my
Mac. Both updates seemed worthwhile, though in hindsight I don't
think change #1 was strictly necessary. Without a deep analysis of
the changes, I went with my gut (and the expertise of the authors)
and updated the CPAN module.
1. Updated XML::LibXML to version 1.70 from CPAN
2. Updated libxml2 to version 2.7.6 from MacPorts
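To see which versions are actually in play on a given machine, something like the following sketch may help. It assumes an XML::LibXML recent enough to provide LIBXML_RUNTIME_VERSION; the variable names are only illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;

# Version of the Perl binding (the CPAN module).
my $module_version = XML::LibXML->VERSION;

# Version of the libxml2 C library the module was compiled against.
my $compiled_against = XML::LibXML::LIBXML_DOTTED_VERSION();

# Version of the libxml2 C library loaded at run time,
# reported in libxml2's own numeric format.
my $runtime = XML::LibXML::LIBXML_RUNTIME_VERSION();

print "XML::LibXML module:        $module_version\n";
print "libxml2 (compiled against): $compiled_against\n";
print "libxml2 (run time):         $runtime\n";
```

If the compiled-against and run-time libxml2 versions differ, that mismatch is one plausible source of platform-specific behavior.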
I've appended a fragment of the code I got working. It's not yet
perfect (for some[1] definition of perfect), but it works. That is, I
did the elegant "growth" phase but haven't completed the elegant
"prune" phase.
Enjoy,
Michael
Notes:
1. For *this* definition of perfection...
Perfection is achieved not when you have nothing more to add,
but when you have nothing left to take away.
-- Antoine de Saint-Exupery
-- as quoted on http://perlgolf.sourceforge.net
================================================================
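use XML::LibXML;

# $content (the HTML text) and $xpath (the query) are set in code
# that has been snipped from this fragment.

my %parse_options = (
    #suppress_warnings => 1,
    suppress_errors => 1,
    recover         => 1,    # keep parsing past malformed markup
    # validation    => 0,
);

# Use load_html() where the installed XML::LibXML provides it;
# otherwise fall back to the older parser-object interface.
my $dom;
if (XML::LibXML->can('load_html')) {
    # Works on Mac at v1.70, but not on PC at v1.65
    # my $dom = $parser->load_html(string=>$content, \%parse_options);
    $dom = XML::LibXML->load_html(string=>$content, \%parse_options);
}
else {
    # Works on PC at v1.65
    my $parser = XML::LibXML->new(\%parse_options);
    my $doc = $parser->parse_html_string($content, \%parse_options);
    $dom = $doc;
}

#... snip, snip...

# The method is findnodes() (plural); it returns a list in list context.
my @nodes = $dom->findnodes($xpath);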
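
For anyone who wants to try the idea without the snipped parts, here is a minimal self-contained sketch. The HTML sample and the XPath expression are made up for illustration, and it assumes an XML::LibXML new enough to have load_html() (1.70 or later on my machines); it is not the code from my actual script.

```perl
use strict;
use warnings;
use XML::LibXML;

# Deliberately sloppy HTML (hypothetical sample): unclosed <p> and <li>
# tags, and no <html> or <body> wrapper.
my $content = <<'HTML';
<p>First paragraph
<ul>
<li><a href="/one">One</a>
<li><a href="/two">Two</a>
</ul>
<p>Second paragraph
HTML

# Parse in recovery mode so malformed markup does not abort the parse,
# and keep the parser quiet about whatever it had to repair.
my $dom = XML::LibXML->load_html(
    string            => $content,
    recover           => 1,
    suppress_errors   => 1,
    suppress_warnings => 1,
);

# Run XPath against the repaired tree: collect the href of every
# link that sits directly inside a list item.
my @hrefs = map { $_->getAttribute('href') } $dom->findnodes('//li/a');

print "$_\n" for @hrefs;
```

The point is that the XPath query runs against the tree libxml2 builds after repairing the markup, not against the raw text, which is why well-formedness of the input stops being a hard requirement.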
--
Michael R. Wolf
All mammals learn by playing!
MichaelRWolf at att.net