[Pdx-pm] NEWBIE question - Am I making this too complex parsing HTML?

Pete Lancashire nix at petelancashire.com
Sat May 7 14:06:12 PDT 2005


I have a URL that returns a page containing two MAPs. I need to extract
all the HREFs from one of them, the one named 'region'. What I came up
with is:

#!/usr/local/bin/perl

use warnings;
use strict; $|++;
my $VERSION = "0.01";

use LWP;
use HTML::TokeParser::Simple;

# Using LWP::UserAgent rather than LWP::Simple for future needs
my $browser = LWP::UserAgent->new;
my $url = "http://www.undeerc.org/wind/winddb";

my $response = $browser->get( $url );
  die "Can't get $url -- ", $response->status_line
   unless $response->is_success;

my $content = $response->content;

$content =~ s/\r//g;

my $p=HTML::TokeParser::Simple->new(\$content);

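while ( my $token = $p->get_token ) {
  # Only <map name="region"> is wanted; other maps may have no name,
  # so check for undef to avoid "uninitialized value" warnings.
  next unless $token->is_start_tag('map');
  my $name = $token->get_attr('name');
  next unless defined $name && $name eq 'region';

  # Walk the tokens inside this map; stop at </map> or at end of input
  # (the old until-loop would die calling a method on undef at EOF).
  while ( my $inner = $p->get_token ) {
    last if $inner->is_end_tag('map');
    if ( $inner->is_start_tag('area') ) {
      my $href = $inner->get_attr('href');
      print "HREF:$href\n" if defined $href;
    }
  }
  last;    # the 'region' map has been handled
}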
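A possibly simpler variant is sketched below. It assumes that get_tag() in HTML::TokeParser::Simple (inherited from HTML::TokeParser, but returning token objects) skips everything except the named tags, as documented, so the loop never has to look at text or comment tokens. The map_hrefs() helper and the sample HTML are hypothetical, for illustration only; the real page at the URL above may be structured differently.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: return the HREFs of every <area> inside the
# first <map> whose name attribute equals $map_name.
sub map_hrefs {
    my ( $html, $map_name ) = @_;
    my $p = HTML::TokeParser::Simple->new( \$html );
    my @hrefs;

    # get_tag('map') skips ahead to the next <map> start tag
    while ( my $map = $p->get_tag('map') ) {
        my $name = $map->get_attr('name');
        next unless defined $name && $name eq $map_name;

        # Only <area> start tags and the closing </map> are returned here
        while ( my $tag = $p->get_tag( 'area', '/map' ) ) {
            last if $tag->is_end_tag('map');
            my $href = $tag->get_attr('href');
            push @hrefs, $href if defined $href;
        }
        last;    # only one map is wanted
    }
    return @hrefs;
}

# Hypothetical sample page with two maps
my $html = <<'HTML';
<map name="other"><area href="/skip.html"></map>
<map name="region">
  <area shape="rect" coords="0,0,10,10" href="/wind/a.html">
  <area shape="rect" coords="10,0,20,10" href="/wind/b.html">
</map>
HTML

print "HREF:$_\n" for map_hrefs( $html, 'region' );
```

In the real script, $content from the LWP response would take the place of the sample $html.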

TIA 

-pete





More information about the Pdx-pm-list mailing list