[Chicago-talk] parsing HTML

Andy Lester andy at petdance.com
Fri Feb 23 13:35:36 PST 2007


On Feb 23, 2007, at 3:18 PM, Jay Strauss wrote:

> Would you suggest using a regex (that I can't get to work) or some
> module (like HTML::Parser)?

If all you want is the text, look at WWW::Mechanize's  ->content()  
method.

http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize.pm#%24mech-%3Econtent(...)

$mech->content( format => "text" )

     Returns a text-only version of the page, with all HTML markup  
stripped. This feature requires HTML::TreeBuilder to be installed, or  
a fatal error will be thrown.
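
Something along these lines should do it. This is only a rough sketch: the temporary file and the sample page in it are made up for illustration so the example runs without a network connection, and in practice you would pass your real URL to get(). It assumes WWW::Mechanize and HTML::TreeBuilder are both installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Hypothetical sample page, written to a temp file so no network is needed.
# The .html suffix lets LWP report the content type as text/html.
my ($fh, $path) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} '<html><head><title>Demo</title></head>'
          . '<body><h1>Hello</h1><p>Some <b>bold</b> text.</p></body></html>';
close $fh or die "Cannot close $path: $!";

# autocheck => 1 makes failed requests die instead of failing silently.
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get( URI::file->new_abs($path)->as_string );

# format => 'text' returns the page with the markup stripped;
# it dies if HTML::TreeBuilder is not installed.
my $text = $mech->content( format => 'text' );
print "$text\n";
```

Calling $mech->content() with no arguments still gives you the raw HTML, so you can keep both around if you need them.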


--
Andy Lester => andy at petdance.com => www.petdance.com => AIM:petdance





