[Pdx-pm] HTML::Parser help

Michael G Schwern schwern at pobox.com
Fri Mar 4 13:47:15 PST 2005


On Fri, Mar 04, 2005 at 01:14:02PM -0800, Thomas J Keller wrote:
> I've been away from Perl for a couple of months (grant due). But now 
> I'm back to tasks that are way more fun. 
> I find I have to parse an html file to extract some data. I installed 
> HTML::Parser today, but I'm having trouble understanding how to write 
> the subs that get me what I want. Does anyone know of a good tutorial, 
> or some well commented examples? 

First off I'd recommend Sean Burke's HTML::Tree.  He also wrote quite a
good book on Perl and the Web, with several chapters on HTML parsing:
"Perl & LWP"
http://www.oreilly.com/catalog/perllwp/
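
To give a feel for the HTML::Tree approach, here is a minimal sketch using
HTML::TreeBuilder (part of the HTML-Tree distribution).  The sample table
and the idea that the data lives in <td> cells are assumptions made up for
illustration, since the actual file wasn't posted.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical input; substitute the contents of the real file.
my $html = <<'END_HTML';
<html><body>
<table>
<tr><td>alpha</td><td>1</td></tr>
<tr><td>beta</td><td>2</td></tr>
</table>
</body></html>
END_HTML

# Build the whole parse tree in memory.
my $tree = HTML::TreeBuilder->new_from_content($html);

# Collect the text of every <td>, grouped by its <tr>.
my @rows;
for my $tr ($tree->look_down(_tag => 'tr')) {
    my @cells = map { $_->as_text } $tr->look_down(_tag => 'td');
    push @rows, \@cells;
}

print join("\t", @$_), "\n" for @rows;

# Free the tree explicitly when done with it.
$tree->delete;
```

The point is that you ask the tree for the elements you want rather than
writing start/end/text handlers yourself.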

Otherwise it depends on what you want.  There are plenty of existing packages
to convert HTML into text, HTML-Format being one.  HTML::LinkExtor and
HTML::LinkExtractor can both pull links out of HTML.
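
As a rough sketch of the link-extraction case, this is how HTML::LinkExtor
can be driven with a callback.  The HTML snippet and the choice to keep only
<a href> links are assumptions for the example.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical input; in practice this would come from a file or LWP.
my $html = '<p><a href="http://example.com/a">A</a> <img src="pic.png"></p>';

my @links;

# The callback is invoked once per link-bearing tag, with the tag name
# followed by attribute name/value pairs for its link attributes.
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    push @links, $attr{href} if $tag eq 'a' && defined $attr{href};
});

$parser->parse($html);
$parser->eof;

print "$_\n" for @links;
```

Without a base URL passed to new(), the links come back as plain strings
exactly as written in the document.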

