[Chicago-talk] parsing HTML
Andy Lester
andy at petdance.com
Fri Feb 23 13:35:36 PST 2007
On Feb 23, 2007, at 3:18 PM, Jay Strauss wrote:
> Would you suggest using a regex (that I can't get to work) or some
> module (like HTML::Parser)?
If all you want is the text, look at WWW::Mechanize's ->content()
method.
http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize.pm#%24mech-%3Econtent(...)
$mech->content( format => "text" )
Returns a text-only version of the page, with all HTML markup
stripped. This feature requires HTML::TreeBuilder to be installed, or
a fatal error will be thrown.
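
For illustration, here is a minimal sketch of how that call might be used. The helper name page_text and the idea of passing the URL on the command line are my own assumptions for the example, not anything from Jay's code, and it assumes both WWW::Mechanize and HTML::TreeBuilder are installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# page_text() is a hypothetical helper for this sketch: fetch a URL and
# return the page with the HTML markup stripped.
sub page_text {
    my ($url) = @_;

    # autocheck => 1 makes failed requests die instead of failing silently.
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($url);

    # format => 'text' needs HTML::TreeBuilder and only applies to HTML pages.
    return $mech->content( format => 'text' );
}

# Only fetch when a URL is given on the command line.
print page_text( $ARGV[0] ), "\n" if @ARGV;
```

Run it with the page you want as the only argument. If you later need more than the bare text (links, specific elements), that is probably the point to reach for a real parser rather than a regex.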
--
Andy Lester => andy at petdance.com => www.petdance.com => AIM:petdance