[Chicago-talk] HTML Parsing

Shlomi Fish shlomif at iglu.org.il
Sat Dec 25 02:30:49 PST 2010


Hi Darren,

On Friday 24 December 2010 19:39:34 Young, Darren wrote:
> I have HTML that contains a table that I need to extract fields from. In
> the end I want to take this data and shove it in a MySQL table but CSV in
> the interim would suffice. HTML::Parser and HTML::TreeBuilder appear like
> they can do this but does anyone know any "simpler" modules for this? It's
> been a long while since I tried this type of thing.
> 

Well, http://search.cpan.org/dist/HTML-TableExtract/ has a good reputation and 
good reviews on CPAN (and quite a few open bugs, which indicates that people 
have actually tried to use it).
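
Something along these lines might be a starting point. It is only a minimal
sketch: the sample HTML, the column names and the html_table_to_csv() and
csv_line() helpers are all made up for illustration, and it assumes the table
has a header row whose titles you know. In your script, $html would be
$res->content.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TableExtract;

# Hypothetical sample input; the real data would come from $res->content.
my $html = <<'EOF';
<html><body>
<table>
<tr><th>Name</th><th>Email</th><th>Phone</th></tr>
<tr><td>Alice</td><td>alice@example.com</td><td>555-0100</td></tr>
<tr><td>Bob</td><td>bob@example.com</td><td>555-0101</td></tr>
</table>
</body></html>
EOF

# Hypothetical helper: quote every field and double any embedded quotes.
sub csv_line {
    my @fields = @_;
    return join ',', map {
        my $f = defined $_ ? $_ : '';    # empty cells may come back as undef
        $f =~ s/^\s+|\s+$//g;            # trim surrounding whitespace
        $f =~ s/"/""/g;
        qq{"$f"};
    } @fields;
}

# Hypothetical helper: return one CSV line per data row of every table
# that contains the given column headers.
sub html_table_to_csv {
    my ( $html, @headers ) = @_;

    my $te = HTML::TableExtract->new( headers => \@headers );
    $te->parse($html);

    my @lines;
    foreach my $table ( $te->tables ) {
        foreach my $row ( $table->rows ) {
            push @lines, csv_line(@$row);
        }
    }
    return @lines;
}

print "$_\n" for html_table_to_csv( $html, 'Name', 'Email' );
```

Selecting by headers means only the named columns are returned, in the order
you list them, so the "Phone" column is skipped here. For anything beyond a
quick interim dump, a proper CSV module is probably a safer choice than a
hand-rolled csv_line().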

If that fails, you should try 
http://search.cpan.org/dist/HTML-TreeBuilder-LibXML/ . While it is not 
"simpler" than plain HTML::TreeBuilder, it is more powerful and also gives you 
XPath and other nice features.
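
As a rough sketch of the XPath approach (the sample HTML, the XPath
expressions and the table_rows_via_xpath() helper are assumptions of mine, not
something taken from your page), it could look like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder::LibXML;

# Hypothetical sample input; the real data would come from $res->content.
my $html = <<'EOF';
<html><body>
<table>
<tr><th>Name</th><th>Email</th></tr>
<tr><td>Alice</td><td>alice@example.com</td></tr>
<tr><td>Bob</td><td>bob@example.com</td></tr>
</table>
</body></html>
EOF

# Hypothetical helper: return an array reference of cell texts for every
# table row that contains at least one <td> (so the header row is skipped).
sub table_rows_via_xpath {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder::LibXML->new;
    $tree->parse($html);
    $tree->eof;

    my @rows;
    foreach my $tr ( $tree->findnodes('//table//tr[td]') ) {
        my @cells = map { $_->as_text } $tr->findnodes('./td');
        push @rows, \@cells;
    }

    $tree->delete;
    return @rows;
}

foreach my $row ( table_rows_via_xpath($html) ) {
    print join( ' | ', @$row ), "\n";
}
```

The XPath expression is where you would narrow things down for a real page,
for example by matching on an id or class attribute of the table.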

> Oh, content is coming from LWP's $res->content.
> 

I think both modules should be able to handle that fine, as both can parse 
HTML passed in as a string.

Regards,

	Shlomi Fish


