SPUG: XPath on (less-than-perfect) HTML

Colin Meyer cmeyer at helvella.org
Tue Dec 8 10:15:56 PST 2009


Just came across this blog post on XPath web scraping (via Perlbuzz):

  http://ssscripting.blogspot.com/2009/12/using-perl-to-scrape-web.html

It agrees with C.J.'s suggestion of using HTML::TreeBuilder::XPath.
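
For the archive, here is a rough sketch of what that approach can look
like. It is not taken from the blog post; the sample markup and the
helper name extract_links are made up for illustration. It assumes
HTML::TreeBuilder::XPath is installed from CPAN, and relies on the
parser building a usable tree from sloppy markup (unclosed <li> and <p>
tags, an unquoted attribute) so that XPath can be run against it.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: return the href of every link found inside a
# list item, even when the markup is not well-formed.
sub extract_links {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($html);
    $tree->eof;

    # findnodes() returns the matching elements in list context;
    # attr() reads an attribute from each element.
    my @hrefs = map { $_->attr('href') } $tree->findnodes('//li/a');

    $tree->delete;    # free the tree explicitly
    return @hrefs;
}

# Made-up sample page: unclosed <p> and <li>, unquoted class attribute.
my $sloppy_html = <<'HTML';
<html><body>
<p class=intro>Some links
<ul>
<li><a href="/one">First</a>
<li><a href="/two">Second</a>
</ul>
</body></html>
HTML

print "$_\n" for extract_links($sloppy_html);
```

For a real page you would fetch the content first (with LWP or
similar) and pass it to the same helper.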

-Colin.

On Tue, Nov 17, 2009 at 07:28:41PM -0800, C.J. Adams-Collier wrote:
> HTML::TreeBuilder::XPath
> 
> On Tue, 2009-11-17 at 13:33 -0800, Michael R. Wolf wrote:
> 
> > Yes, I know that XPath can only be applied to well-formed XML.
> > 
> > That's the theoretical, pure, absolute truth.
> > 
> > I'm working in the real world where I can't find a well-formed page.   
> > (For instance, http://validator.w3c.org does not validate such biggies  
> > as amazon.com, ask.com, google.com, or msn.com).  For (my) practical  
> > purposes, there are no valid pages.

