[ABE.pm] Screen-scraping

Faber Fedor faber at linuxnj.com
Mon Nov 14 17:07:19 PST 2005


Guys,

I've got to pull data off of two websites. I plan on using WWW::Mechanize
and HTML::TokeParser.  The one website seems to be easy enough, but the
HTML from the second website is a lot of tags, embedded tables, etc. with
no identifying tags on the data.  My Qs are:

1) Is there a better tool for this than HTML::TokeParser?
2) Does anyone know of a tool that will parse HTML and build a
DOM-like object? It would be easier to walk through that than the actual
text/HTML.


-- 
 
Regards,
 
Faber Fedor
President
Linux New Jersey, Inc.
908-320-0357
800-706-0701

http://www.linuxnj.com




