[Chicago-talk] How would you do this?

Brian Tatnall btatnall at gmail.com
Mon Nov 12 10:12:47 PST 2007


Take a look at Selenium.  I haven't used it for scraping myself, but Björn
Mårtensson has used it to scrape dynamically built JavaScript pages.


http://www.openqa.org/selenium-ide/
http://www.jroller.com/bjornmartensson/entry/web_scraping_in_early_2007
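
For a rough idea of the approach, here is a minimal sketch. It assumes the
Selenium WebDriver bindings for Python (a different interface from the
Selenium IDE linked above) and a locally installed Firefox with its driver.
The <span class="rating"> markup and the names fetch_rendered_html,
RatingParser and summarize_ratings are made up for illustration; a real
site's markup will differ.

```python
from html.parser import HTMLParser
from statistics import mean


def fetch_rendered_html(url):
    """Load url in a real browser so its JavaScript runs, then return the HTML."""
    # Imported here so the parsing code below works without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are installed
    try:
        driver.get(url)
        # Content that loads after the initial page load may need an explicit wait.
        return driver.page_source
    finally:
        driver.quit()


class RatingParser(HTMLParser):
    """Collect numbers from <span class="rating"> elements (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.ratings = []
        self._in_rating = False

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("class") == "rating":
            self._in_rating = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_rating = False

    def handle_data(self, data):
        if self._in_rating and data.strip():
            self.ratings.append(float(data))


def summarize_ratings(html):
    """Return the number of ratings found in html and their average."""
    parser = RatingParser()
    parser.feed(html)
    parser.close()
    if not parser.ratings:
        return {"count": 0, "average": None}
    return {"count": len(parser.ratings), "average": mean(parser.ratings)}
```

The browser step is kept separate from the parsing step, so the summary
logic can be tried on saved HTML without launching a browser; something like
summarize_ratings(fetch_rendered_html(url)) would tie the two together.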

All the best,
Brian Tatnall

On 11/12/07, David Young <davidy at nationalcycle.com> wrote:
>
> Hi,
>
> I'm on another mailing list of colleagues, and this question came up.
> What would you all recommend?
>
> Ydy
>
>
> -----Original Message-----
> Sent: Monday, November 12, 2007 10:32 AM
> To: techies at xxxxxxxxxxxxxxxxxxxx
> Subject: How would you do this?
>
> More and more, I'm finding the need to do some page-scraping of web pages
> that are Web 2.0-ey.
>
> My normal scraping tool is Perl's WWW::Mechanize and such.  But it is
> completely JavaScript brain-dead.
>
> For example...I'd like to write something that goes to YouTube, looks up
> a set of videos, analyzes the comments and ratings, and presents a
> summary report.  That's just an example of something that is conceptually
> simple (easily done manually) but can't be done with WWW::Mechanize.
>
> So if you wanted to do something like that, what would you use?
>
> I'm not averse to using something other than Perl...something
> Windows-based would be my last resort.  I'd consider a commercial
> product, though preferably not a multi-hundred-dollar one...
>
> Obviously, something that submits as well as scrapes would be ideal.
>
> _______________________________________________
> Chicago-talk mailing list
> Chicago-talk at pm.org
> http://mail.pm.org/mailman/listinfo/chicago-talk
>