SPUG: XPath on (less-than-perfect) HTML
Michael R. Wolf
MichaelRWolf at att.net
Tue Nov 17 21:44:39 PST 2009
On Nov 17, 2009, at 1:33 PM, Michael R. Wolf wrote:
> Yes, I know that XPath can only be applied to well-formed XML.
...that is, unless it's told to recover from (and be quiet about)
errors...
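XML::LibXML::Parser documents the recover() method and the (obsolete)
recover_silently() method.
Here's code that I got to work.
Line 3 puts the parser in recovery mode, so it continues past parse errors.
Line 4 switches to silent recovery (mode 2), which also suppresses the
warnings. Since mode 2 includes recovery, line 4 alone would be enough.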
1. use XML::LibXML;
2. my $parser = XML::LibXML->new();
3. $parser->recover(1);
4. $parser->recover(2);
5. my $doc = $parser->parse_html_string($scraped_content);
6. my ($first_node, @nodes) = $doc->findnodes('/html/head/title');
7. ok($first_node, 'HTML Title: Found one node...');
8. ok(@nodes == 0, '... and no more nodes.');
9. my $title = $first_node->textContent();
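For anyone who wants to try this without a scraped page on hand, here is a minimal self-contained sketch of the same approach. The sample HTML is a made-up stand-in for $scraped_content (an unclosed <p> and a stray </b>), and it uses plain die instead of Test::More's ok(), so treat it as an illustration under those assumptions rather than the exact code from my test file.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical stand-in for $scraped_content: deliberately sloppy HTML
# with an unclosed <p> and a stray </b>.
my $scraped_content = <<'HTML';
<html>
<head><title>Example Page</title></head>
<body>
<p>First paragraph
<p>Second paragraph</b>
</body>
</html>
HTML

my $parser = XML::LibXML->new();

# recover(2): keep parsing past errors and do not warn about them.
$parser->recover(2);

my $doc = $parser->parse_html_string($scraped_content);

# Expect exactly one <title> node under /html/head.
my ($first_node, @nodes) = $doc->findnodes('/html/head/title');
die "no <title> node found\n" unless $first_node;
die "more than one <title> node found\n" if @nodes;

my $title = $first_node->textContent();
print "$title\n";
```

Setting recover(1) instead of recover(2) should give the same result, with the parser's complaints about the markup printed as warnings.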
--
Michael R. Wolf
All mammals learn by playing!
MichaelRWolf at att.net