SPUG: Re: Trapping LWP exceptions
Christopher Cavnor
christopher at cavnor.com
Mon Sep 11 12:15:52 CDT 2000
Maybe something like:
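while ( sleep(900) ) {
    # eval traps a die() raised anywhere inside the request
    my $resp = eval { $ua->request($req) };
    next if $@;
    # LWP reports most connection failures as an error response, not a die
    next unless $resp->is_success;
    process_page( $resp->as_string );
}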
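On the eval versus __DIE__ part of the question: a __DIE__ handler gets called when something dies, but the die still goes ahead once the handler returns, so on its own it won't keep the loop alive. The eval is what actually stops the exception. A minimal sketch of the difference, using a made-up flaky_fetch() in place of $ua->request($req) (it is not an LWP function, and the sleep is left out so the demo runs instantly):

```perl
use strict;
use warnings;

# Hypothetical stand-in for $ua->request($req): dies on the first two
# calls, then returns a page. Not part of LWP.
my $calls = 0;
sub flaky_fetch {
    $calls++;
    die "Can't connect (attempt $calls)\n" if $calls < 3;
    return "page body";
}

my @seen;      # messages observed by the __DIE__ hook
my @trapped;   # messages trapped by eval
my $page;

# The hook only observes the exception; die carries on after it returns.
local $SIG{__DIE__} = sub { push @seen, $_[0] };

until ( defined $page ) {
    # eval returns undef and sets $@ if flaky_fetch() dies
    $page = eval { flaky_fetch() };
    if ($@) {
        push @trapped, $@;
        # a real poller would sleep here before trying again
    }
}

print "got '$page' after $calls attempts\n";
print "hook saw ", scalar(@seen), " errors; eval trapped ", scalar(@trapped), "\n";
```

So a __DIE__ handler is handy for logging, but the retry behaviour comes from the eval.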
Alternatively, if this is a perpetual function, you might want to use a
Robot User Agent, which can handle its own timing (default is one minute
between requests to a server) and obeys the robot exclusion standard (as
well as exposing your bot's info for niceness' sake).
Warning: untested code approaching :-)
use LWP::RobotUA;
use HTTP::Request;
sub patch_bot {
    # pass in args: the URL to fetch, this bot's name, a contact email,
    # the delay between requests in minutes, a proxy address (if one exists)
    my ($source, $botname, $email, $delay, $proxy) = @_;
    # Create a Robot User Agent object, give it a name and a contact address
    my $rua = LWP::RobotUA->new($botname, $email);
    # If proxy server specified, define it in the User Agent object
    if (defined $proxy) {
        $rua->proxy('http', $proxy);
    }
    # set delay between requests, in minutes (the default is 1 minute)
    $rua->delay(defined $delay ? $delay : 1);
    # bypass unless this doc is reachable - testing the HEAD is fastest
    my $check_request = HTTP::Request->new('HEAD', $source);
    # this loops infinitely, sleeping only when the URL cannot be reached;
    # otherwise it calls process_page after each fetch, with the RobotUA
    # itself spacing requests $delay minutes apart
    while ( sleep(1000) ) {
        # request() normally returns an error response instead of dying,
        # so check the status as well as $@
        my $check = eval { $rua->request($check_request) };
        next if $@ or !$check->is_success;   # next if bad header
        # otherwise grab the doc
        my $request  = HTTP::Request->new('GET', $source);
        my $response = $rua->request($request);
        next unless $response->is_success;
        my $data = $response->content;
        process_page( $data );
        redo;
    }
}
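In case the next/redo combination in the loop above looks odd: next goes back to the while condition, so the sleep runs again, whereas redo restarts the loop body without re-checking the condition, so there is no sleep. A small sketch of just that control flow, with a hypothetical nap() standing in for sleep() and a canned list of outcomes standing in for the requests (both are made up for the demo):

```perl
use strict;
use warnings;

# Hypothetical stand-in for sleep(): counts how often the loop condition
# succeeds and ends the demo after three "naps".
my $naps = 0;
sub nap {
    return 0 if $naps >= 3;
    $naps++;
    return 1;
}

# Simulated results of successive fetch attempts (1 = ok, 0 = failed).
my @outcomes = (0, 1, 1, 0, 1, 0);
my @log;

while ( nap() ) {
    my $ok = shift @outcomes;
    last unless defined $ok;      # demo data exhausted
    unless ($ok) {
        push @log, "fail";
        next;                     # back to the condition: nap() runs again
    }
    push @log, "ok";
    redo;                         # restart the body: nap() is skipped
}

print "naps=$naps log=@log\n";
```

There is one nap to enter the loop and one more after each of the first two failures, and none after a success, which is the "sleep only when the URL cannot be reached" behaviour.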
> I need to trap certain LWP exceptions. Is it more appropriate to
> handle the __DIE__ signal or use an eval block on $ua->request($req)?
>
> I'm using LWP and a user agent to repeatedly poll a web page looking
> for updates:
>
> while (sleep 900) {
> $resp = $ua->request($req);
> &process_page( $resp->as_string );
>
> Occasionally the script dies with errors like
>
> 500 (Internal Server Error) Can't connect to www.yahoo.com:80
> (Transport endpoint is not connected) Client-Date: Mon, 11 Sep 2000
> 14:43:28 GMT
>
> I think these are server hiccups, going offline for seconds or a few
> minutes, so that the socket fails to connect. The user agent is
> die'ing deep within LWP, maybe LWP::Protocol::http or IO::Socket.
>
> I'd like to trap these exceptions, to go to sleep and try again later,
> instead of die'ing. What's the best way to do that? Shall I handle the
> __DIE__ signal or use an eval block on $ua->request($req)? It's not
> clear to me what the difference is between these or which is more
> appropriate in this instance.
>
> Thx,
> Sandy Morton
>
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> POST TO: spug-list at pm.org PROBLEMS: owner-spug-list at pm.org
> Subscriptions; Email to majordomo at pm.org: ACTION LIST EMAIL
> Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
> For daily traffic, use spug-list for LIST ; for weekly, spug-list-digest
> Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/
>
>