SPUG: XPath on (less-than-perfect) HTML
Colin Meyer
cmeyer at helvella.org
Tue Dec 8 10:15:56 PST 2009
Just came across this blog post on XPath web scraping (via Perlbuzz):
http://ssscripting.blogspot.com/2009/12/using-perl-to-scrape-web.html
It agrees with C.J.'s suggestion of using HTML::TreeBuilder::XPath.
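For anyone who wants to try that module, here is a minimal sketch of the idea. It is not taken from the blog post; the HTML snippet and the XPath expression are made up for illustration, and it assumes HTML::TreeBuilder::XPath is installed from CPAN. The point is that the tree builder tolerates markup that is not well-formed (here: no html/body wrapper and unclosed li tags) and still lets you query the result with XPath.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical, deliberately sloppy markup: no <html>/<body> wrapper
# and <li> elements that are never closed.
my $html = <<'HTML';
<title>Demo</title>
<ul>
  <li><a href="/one">One</a>
  <li><a href="/two">Two</a>
</ul>
HTML

# Build a tree from the tag soup; parse() and eof() are inherited
# from HTML::TreeBuilder.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# findvalues() returns the string value of every node the
# XPath expression matches, here the href attributes.
my @hrefs = $tree->findvalues('//li/a/@href');
print "$_\n" for @hrefs;

# HTML::Element trees should be deleted explicitly when done.
$tree->delete;
```

In real use the string would come from a fetched page (for example via LWP), and findnodes() can be used instead of findvalues() when you need the elements themselves rather than their text.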
-Colin.
On Tue, Nov 17, 2009 at 07:28:41PM -0800, C.J. Adams-Collier wrote:
> HTML::TreeBuilder::XPath
>
> On Tue, 2009-11-17 at 13:33 -0800, Michael R. Wolf wrote:
>
> > Yes, I know that XPath can only be applied to well-formed XML.
> >
> > That's the theoretical, pure, absolute truth.
> >
> > I'm working in the real world where I can't find a well-formed page.
> > (For instance, http://validator.w3c.org does not validate such biggies
> > as amazon.com, ask.com, google.com, or msn.com). For (my) practical
> > purposes, there are no valid pages.