[Pdx-pm] NEWBIE question - Am i making this too complex parsing HTML
Ovid
publiustemp-pdxpm at yahoo.com
Sat May 7 16:00:09 PDT 2005
Hi Pete,
That seems like some fairly nice code for a newbie. (Are you sure
you're a newbie?)
> use LWP;
> use HTML::TokeParser::Simple;
>
> # using LWP instead of Simple for future needs
> my $browser = LWP::UserAgent->new;
> my $url = "http://www.undeerc.org/wind/winddb";
>
> my $response = $browser->get( $url );
> die "Can't get $url -- ", $response->status_line
> unless $response->is_success;
>
> my $content = $response->content;
>
> $content =~ s/\r//g;
>
> my $p=HTML::TokeParser::Simple->new(\$content);
I noticed you mention that you need LWP. If you can explain what
features you need beyond LWP::Simple, I can see what I can do about
expanding HTML::TokeParser::Simple to incorporate those needs, perhaps
by allowing you to pass a callback that will fetch the HTML for you.
In the meantime, if LWP::Simple were sufficient (though it sounds like
it might not be), the following will accomplish what you have:
use HTML::TokeParser::Simple 3.13;
my $p = HTML::TokeParser::Simple->new(url => $url) or die $!;
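If you do end up needing a full LWP::UserAgent, you can keep doing the fetch yourself and hand the result to the parser. Here is a rough sketch of that, not a finished implementation: the helper names parser_for_url and parser_for_html are made up for illustration, and I'm assuming the "string => ..." form of the constructor, which should be available in the same releases that accept "url => ...".

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TokeParser::Simple 3.13;

# Hypothetical helper: build a parser from HTML we already have in memory.
sub parser_for_html {
    my ($html) = @_;
    $html =~ s/\r//g;    # same carriage-return cleanup as the original code
    return HTML::TokeParser::Simple->new( string => $html );
}

# Hypothetical helper: fetch with a caller-supplied (or default) user agent,
# so cookies, proxies, timeouts and so on stay under your control.
sub parser_for_url {
    my ( $url, $browser ) = @_;
    $browser ||= LWP::UserAgent->new;
    my $response = $browser->get($url);
    die "Can't get $url -- ", $response->status_line, "\n"
        unless $response->is_success;
    return parser_for_html( $response->content );
}
```

With that in place, something like "my $p = parser_for_url($url, $browser);" would replace the fetch-and-construct lines in your script.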
> my ($href, $token);
>
> while ( $token = $p->get_token ) {
>     if ( $token->is_start_tag('map') && ( $token->get_attr('name') eq
> 'region' ) ) {
>         until ($token->is_end_tag('map') ) {
>             $token = $p->get_token;
>             if ($token->is_start_tag('area') ) {
>                 $href = $token->get_attr('href');
>                 print "HREF:$href\n";
>             }
>         }
>         last;
>     }
> }
Remember that HTML::TokeParser::Simple is a subclass of
HTML::TokeParser, so the methods in the latter still work. In
particular, you can call "get_tag" with a tag name to jump straight to
it (though you need to be careful not to overshoot other tags that are
important.) Here's how I might write that, though I'm not sure it's
much better.
    while (my $token = $p->get_tag('map')) {
        until ($token->is_end_tag('map')) {
            $token = $p->get_tag or last;  # out of HTML
            next unless $token->is_start_tag('area');
            my $href = $token->get_attr('href') or next;
            print "HREF: $href\n";
        }
    }
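Note that the get_tag loop no longer checks that the map is the one named 'region', which your version did. If that matters, here is a sketch that keeps the check and can be run without touching the network. The sample HTML and the region_hrefs name are invented for illustration, and I'm assuming the "string => ..." constructor form. It also passes several tag names to get_tag (end tags are written with a leading slash, as in HTML::TokeParser) so the inner loop only ever sees <area> and </map>.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple 3.13;

# Made-up sample page: two image maps, only one of them named "region".
my $html = <<'END_HTML';
<map name="other"><area href="/skip.html"></map>
<map name="region">
  <area href="/nd.html">
  <area href="/sd.html">
  <area alt="no href here">
</map>
END_HTML

# Hypothetical helper: collect the hrefs of every <area> in the named map.
sub region_hrefs {
    my ( $html, $map_name ) = @_;
    my $p = HTML::TokeParser::Simple->new( string => $html );
    my @hrefs;
    while ( my $token = $p->get_tag('map') ) {
        my $name = $token->get_attr('name');
        next unless defined $name && $name eq $map_name;

        # Skip ahead to the next <area> or </map>, ignoring everything else.
        while ( my $tag = $p->get_tag( 'area', '/map' ) ) {
            last if $tag->is_end_tag('map');
            my $href = $tag->get_attr('href') or next;    # no href, skip it
            push @hrefs, $href;
        }
        last;    # only the first matching map is wanted
    }
    return @hrefs;
}

print "HREF: $_\n" for region_hrefs( $html, 'region' );
```

Returning a list instead of printing inside the loop is just a convenience so the hrefs can be reused later. Whether this is clearer than the token-by-token version is a matter of taste.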
Cheers,
Ovid
--
If this message is a response to a question on a mailing list, please send
follow up questions to the list.
Web Programming with Perl -- http://users.easystreet.com/ovid/cgi_course/