SPUG: extracting text between <a> and </a>
Ice Demon
admin at qdq.net
Thu Oct 5 12:12:28 CDT 2000
You could try this:
$link = "<a href=\"http://text.the.recipe.gets\"> this is the text I actually want</a>";
$link =~ s/<a\s+href=.*?>//i;   # strip the opening <a href=...> tag
$link =~ s/<\/a>//i;            # strip the closing </a> tag
print $link . "\n";
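If you need both the URL and the link text for every anchor on a page, the same idea can be stretched into one capturing regex run in a loop. This is only a rough sketch: extract_links is a made-up helper name, and it assumes simple anchors with a quoted href and no ">" inside attribute values. For real-world HTML a proper parser is safer. HTML::TokeParser, for instance, is the module that provides get_trimmed_text.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): returns a list of
# [href, text] pairs, one per simple <a href="...">...</a> element.
sub extract_links {
    my ($html) = @_;
    my @links;
    # $1 = quote character, $2 = href value, $3 = everything up to </a>
    while ($html =~ m{<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1[^>]*>(.*?)</a\s*>}gis) {
        my ($href, $text) = ($2, $3);
        $text =~ s/<[^>]+>//g;      # drop any tags nested inside the link
        $text =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        $text =~ s/\s+/ /g;         # collapse runs of whitespace
        push @links, [$href, $text];
    }
    return @links;
}

my $html = '<p><a href="http://text.the.recipe.gets"> this is the text I actually want</a></p>';
for my $link (extract_links($html)) {
    print "$link->[0]\t$link->[1]\n";
}
```

Each pair holds the raw href first and the cleaned-up text second, so inside the foreach loop in link_scan you could match on the href to pick up the text that goes with each link.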
Todd Wells wrote:
> I looked in the Cookbook, it has a recipe to extract the actual links (which
> you'll see I'm doing in my code below), but I can't tell how to get the text
> between the tags -- unless I'm looking at it incorrectly.
>
> <a href="http://text.the.recipe.gets"> this is the text I actually want</a>
>
> -----Original Message-----
> From: Rush Family [mailto:rush at citylinq.com]
> Sent: Thursday, October 05, 2000 9:30 AM
> To: Todd Wells; 'SPUG'
> Subject: RE: SPUG: extracting text between <a> and </a>
>
> Although I do not have it in front of me to check, I believe this exact
> problem is solved in the Perl Cookbook from O'Reilly.
>
> -----Original Message-----
> From: owner-spug-list at pm.org [mailto:owner-spug-list at pm.org]On Behalf Of
> Todd Wells
> Sent: Thursday, October 05, 2000 8:55 AM
> To: 'SPUG'
> Subject: SPUG: extracting text between <a> and </a>
>
> I'm working on a little web automation routine and I've used HTML::LinkExtor
> to extract the links from a web page, then I'm processing each of those
> links.
>
> What I'd like to know is if there's some easy way that I could get the
> original text that accompanied that link. e.g., <a href =
> "http://thislink"> this text here I want </a>.
>
> sub link_scan
> {
> # input is $url, output is a list of links found at that URL
>
> my $url = shift;
> my @linklist; my @ziplist;
>
> # retrieve HTML doc at URL
> my $ua = new LWP::UserAgent;
> my $request = new HTTP::Request('GET', $url);
> my $response = $ua->request($request);
> my $body = $response->content;
> my $base = $response->base;
>
> # scan HTML doc for other URLS
> my $link_parser = HTML::LinkExtor->new();
> $link_parser->parse($body);
> my @parsed = $link_parser->links;
>
> foreach my $link (@parsed)
> {
> my $tag = $link->[0];
>
> if (($tag eq "a") or ($tag eq "A"))
> {
> my $text = $link_parser->get_trimmed_text;
> my $new_url = new URI::URL $link->[2];
> my $full_url = $new_url->abs($url);
> chomp $full_url;
> unless (already_processed($full_url)) { push @linklist, $full_url; }
> }
> }
> return @linklist;
> }
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> POST TO: spug-list at pm.org PROBLEMS: owner-spug-list at pm.org
> Subscriptions; Email to majordomo at pm.org: ACTION LIST EMAIL
> Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
> For daily traffic, use spug-list for LIST ; for weekly, spug-list-digest
> Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/