[Edinburgh-pm] seeking collaboration for a scalable Perl web-scraping library

Murray perl at minty.org
Sun Mar 13 09:17:08 PDT 2011


On Sun, Mar 13, 2011 at 01:57:19PM +0000, Antonio Bonifati wrote:
> I just discovered your group and will take part in the meeting. 24th this
> month, right?

Right.  http://cumberlandbar.co.uk/

> Anyone interested in web harvesting and data extraction in Perl, all done the
> right way with proper JavaScript support? I cannot pay anything; it's all
> open source, but considering the high demand for scraping data from this
> unstructured Wild Wild Web, it won't be difficult to find companies that will
> want to use this technology.

I didn't know you could use Inline::Java that way to wrap HtmlUnit and
thus get yourself a Perl-driven JavaScript browser.  Quite nifty.

That said, I'd still prefer something that could embed or wrap WebKit or Gecko,
with enough glue to make them accessible from Perl and enough hooks to be able
to (ab)use the parsers and rendering/JS engines, especially the HTML5 parser.

Murray.

