[Pdx-pm] HTML::Parser help
Michael G Schwern
schwern at pobox.com
Fri Mar 4 13:47:15 PST 2005
On Fri, Mar 04, 2005 at 01:14:02PM -0800, Thomas J Keller wrote:
> I've been away from Perl for a couple of months (grant due). But now
> I'm back to tasks that are way more fun.
> I find I have to parse an html file to extract some data. I installed
> HTML::Parser today, but I'm having trouble understanding how to write
> the subs that get me what I want. Does anyone know of a good tutorial,
> or some well commented examples?
First off I'd recommend Sean Burke's HTML::Tree. He also wrote quite a good
book on Perl and the Web, "Perl & LWP", with several chapters on HTML
parsing.
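To give a feel for the HTML::Tree approach, here is a minimal sketch using HTML::TreeBuilder (part of the HTML-Tree distribution). The sample HTML, the class names "name" and "value", and the %data hash are made up for illustration; your file will look different, so treat this as a starting point rather than the way to do it.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input; a real script would read the HTML from a file.
my $html = <<'HTML';
<html><body>
<table>
<tr><td class="name">alpha</td><td class="value">1</td></tr>
<tr><td class="name">beta</td><td class="value">2</td></tr>
</table>
</body></html>
HTML

# Build the whole document into a tree of HTML::Element objects.
my $tree = HTML::TreeBuilder->new_from_content($html);

my %data;
# In list context, look_down returns every element matching the criteria.
for my $row ($tree->look_down(_tag => 'tr')) {
    # In scalar context, look_down returns the first match (or undef).
    my $name  = $row->look_down(_tag => 'td', class => 'name');
    my $value = $row->look_down(_tag => 'td', class => 'value');
    next unless $name && $value;
    $data{ $name->as_text } = $value->as_text;
}

# Trees hold circular references, so free them explicitly when done.
$tree->delete;

print "$_ = $data{$_}\n" for sort keys %data;
```

The point is that you search the parsed tree for the elements you want instead of writing event handlers, which is usually easier when you are after a few specific pieces of data.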
Otherwise it depends on what you want. There are plenty of existing packages
to convert HTML into text, HTML-Format being one. HTML::LinkExtor and
HTML::LinkExtractor can both pull links out of HTML.
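As a rough sketch of the link-extraction case, HTML::LinkExtor takes a callback that is called once per link-bearing tag. The sample HTML and the @hrefs array below are assumptions for illustration only.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical sample input; a real script would read the HTML from a file.
my $html = <<'HTML';
<html><body>
<a href="http://example.com/one">One</a>
<img src="logo.png">
<a href="/two">Two</a>
</body></html>
HTML

my @hrefs;

# The callback receives the tag name followed by attribute name/value pairs
# for the link attributes found on that tag.
my $extor = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    push @hrefs, $attr{href} if $tag eq 'a' && defined $attr{href};
});

$extor->parse($html);
$extor->eof;    # signal end of document so any buffered text is flushed

print "$_\n" for @hrefs;
```

HTML::LinkExtor is a subclass of HTML::Parser, so this is also a small example of the callback style the parser uses. Passing a base URL as the second argument to new() should make it return absolute URLs instead of the raw attribute values.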