SPUG: extracting text between <a> and </a>

Thu Oct 5 12:12:28 CDT 2000

You could try this, which strips the tags and leaves only the text:

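my $link = "<a href=\"http://text.the.recipe.gets\"> this is the text I actually want</a>";
# strip the opening tag; [^>]* also copes with "href = ..." and extra attributes
$link =~ s/<a\b[^>]*>//i;
# strip the closing tag
$link =~ s/<\/a\s*>//i;
print $link . "\n";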
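
If you want to keep each URL together with its text, rather than stripping the tags from one string at a time, a capturing match may be closer to what you need. The following is only a rough sketch using core Perl: the name anchor_pairs is made up for illustration, and the pattern assumes quoted href values and anchors that are not nested.

```perl
use strict;
use warnings;

# Hypothetical helper: return [href, text] pairs for every
# <a ... href="...">...</a> found in $html.
sub anchor_pairs {
    my ($html) = @_;
    my @pairs;

    # /g walks through all anchors, /i ignores tag case,
    # /s lets the link text span line breaks.
    while ($html =~ m{<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1[^>]*>(.*?)</a\s*>}gis) {
        my ($href, $text) = ($2, $3);
        $text =~ s/<[^>]+>//g;   # drop nested tags such as <b> or <img>
        $text =~ s/\s+/ /g;      # collapse runs of whitespace, including newlines
        $text =~ s/^ | $//g;     # trim one leading and one trailing space
        push @pairs, [$href, $text];
    }
    return @pairs;
}

# Example using the anchor from the original question.
my $html = '<a href = "http://thislink"> this text here I want </a>';
for my $pair (anchor_pairs($html)) {
    my ($href, $text) = @$pair;
    print "$href => $text\n";
}
```

Regexes like this are fragile on real-world HTML. Note that get_trimmed_text is a method of HTML::TokeParser, not of HTML::LinkExtor, so a token-based parser is probably the sturdier route if the pages are messy.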

Todd Wells wrote:

> I looked in the Cookbook, it has a recipe to extract the actual links (which
> you'll see I'm doing in my code below), but I can't tell how to get the text
> between the tags -- unless I'm looking at it incorrectly.
>
> <a href="http://text.the.recipe.gets"> this is the text I actually want</a>
>
> -----Original Message-----
> From: Rush Family [mailto:rush at citylinq.com]
> Sent: Thursday, October 05, 2000 9:30 AM
> To: Todd Wells; 'SPUG'
> Subject: RE: SPUG: extracting text between <a> and </a>
>
> Although I do not have it in front of me to check, I believe this exact
> problem is solved in the Perl Cookbook from O'Reilly.
>
> -----Original Message-----
> From: owner-spug-list at pm.org [mailto:owner-spug-list at pm.org]On Behalf Of
> Todd Wells
> Sent: Thursday, October 05, 2000 8:55 AM
> To: 'SPUG'
> Subject: SPUG: extracting text between <a> and </a>
>
> I'm working on a little web automation routine and I've used HTML::LinkExtor
> to extract the links from a web page, then I'm processing each of those
> links.
>
> What I'd like to know is if there's some easy way that I could get the
> original text that accompanied that link.  e.g., <a href =
> "http://thislink"> this text here I want </a>.
>
> sub link_scan
> {
>     # input is $url, output is a list of links found at that URL
>
>     my $url = shift;
>     my @linklist; my @ziplist;
>
>     # retrieve HTML doc at URL
>     my $ua = new LWP::UserAgent;
>     my $request = new HTTP::Request('GET', $url);
>     my $response = $ua->request($request);
>     my $body = $response->content;
>     my $base = $response->base;
>
>     # scan HTML doc for other URLS
>     my $link_parser = HTML::LinkExtor->new();
>     $link_parser->parse($body);
>     my @parsed = $link_parser->links;
>
>     foreach my $link (@parsed)
>     {
>         my $tag = $link->[0];
>
>         if (($tag eq "a") or ($tag eq "A"))
>         {
>             my $text = $link_parser->get_trimmed_text;
>             my $new_url = new URI::URL $link->[2];
>             my $full_url = $new_url->abs($url);
>             chomp $full_url;
>             unless (already_processed($full_url)) {
>                 push @linklist, $full_url;
>             }
>         }
>     }
>     return @linklist;
> }
>
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>      POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
>       Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
>   Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
>  For daily traffic, use spug-list for LIST ;  for weekly, spug-list-digest
>   Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/
