[Pdx-pm] HTML::Parser help

Ovid publiustemp-pdxpm at yahoo.com
Fri Mar 4 13:46:13 PST 2005

Hi Thomas,

HTML::Parser is great, but not everyone can wrap their head around it
that easily.  Many prefer a procedural approach as this better maps to
how we're used to getting things done.  If that is appealing at all,
you may find HTML::TokeParser::Simple of use.  It's ridiculously easy
to use.  For example, to print only the "visible" text in an HTML
document:

  use HTML::TokeParser::Simple;

  my $parser = HTML::TokeParser::Simple->new($html_file);
  while (my $token = $parser->get_token) {
    print $token->as_is if $token->is_text;
  }
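
If the data you need sits in particular tags rather than in the plain
text, the same token loop can be extended.  Here is a minimal sketch,
assuming you want the URL and text of each link.  The sample $html
string and the @links structure are made up for illustration, and it
parses from a string reference instead of a file only to stay
self-contained:

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical sample input; in practice you would pass a filename.
my $html = '<p>See <a href="http://pdx.pm.org/">PDX.pm</a> and '
         . '<a href="http://www.perl.org/">perl.org</a>.</p>';

# A scalar reference tells the parser to read the string itself.
my $parser = HTML::TokeParser::Simple->new(\$html);

my @links;      # collected { href => ..., text => ... } records
my $current;    # the link we are inside, if any

while (my $token = $parser->get_token) {
    if ($token->is_start_tag('a')) {
        # Start a new record when an <a> tag opens.
        my $href = $token->get_attr('href');
        $current = { href => defined $href ? $href : '', text => '' };
    }
    elsif ($token->is_end_tag('a') && $current) {
        # The link is complete once </a> is seen.
        push @links, $current;
        undef $current;
    }
    elsif ($token->is_text && $current) {
        # Accumulate text that appears between <a> and </a>.
        $current->{text} .= $token->as_is;
    }
}

print "$_->{text}\t$_->{href}\n" for @links;
```

Swapping 'a' and 'href' for whatever tag and attribute hold your data
should be enough for many simple extraction jobs.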

There are several more comprehensive examples in the distribution.  Of
course, while I admit to being biased, I do find it easier to use than

--- Thomas J Keller <kellert at ohsu.edu> wrote:
> Greetings all.
> The HTML::Parser module provides methods for, literally, parsing
> HTML. It can handle HTML text from a string or file and can separate
> out the syntactic structures and data. You shouldn't use HTML::Parser
> directly, however, since its interface hasn't been designed to make
> your life easy when you parse HTML. It's merely a base class from
> which you can build your own parser to deal with HTML in any way you
> want.
> I've been away from Perl for a couple of months (grant due). But now
> I'm back to tasks that are way more fun.
> I find I have to parse an HTML file to extract some data. I installed
> HTML::Parser today, but I'm having trouble understanding how to write
> the subs that get me what I want. Does anyone know of a good tutorial,
> or some well-commented examples?
> muchas gracias,
> Tom K.
> Thomas J. Keller, Ph.D.
> Director, MMI Core Facility
> Oregon Health & Science University
> 3181 SW Sam Jackson Park Rd.
> Portland, OR, USA,   97239
> http://www.ohsu.edu/research/core
> Pdx-pm-list mailing list
> Pdx-pm-list at pm.org
> http://mail.pm.org/mailman/listinfo/pdx-pm-list

If this message is a response to a question on a mailing list, please send
follow up questions to the list.

Web Programming with Perl -- http://users.easystreet.com/ovid/cgi_course/
