SPUG: Re: Trapping LWP exceptions

Christopher Cavnor christopher at cavnor.com
Mon Sep 11 12:15:52 CDT 2000


Maybe something like:

while ( sleep(900) ) {
    my $resp = eval { $ua->request($req) };
    # skip this round if the request died or came back as an error response
    next if $@ or !$resp->is_success;
    process_page( $resp->as_string );
}
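
On the eval vs. __DIE__ question: as far as I know, a $SIG{__DIE__} handler
only gets to look at the error on its way past (it is called even inside an
eval, and the die carries on after it returns), while eval is what actually
stops the die and lets the loop continue. Here is a minimal sketch of that
difference. flaky_request() and try_request() are made-up names for
illustration; flaky_request() stands in for $ua->request($req) so this runs
without a network.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $ua->request($req): dies on the first two
# calls, then returns a fake page body.
my $calls = 0;
sub flaky_request {
    $calls++;
    die "Can't connect (simulated)\n" if $calls < 3;
    return "page body";
}

# A __DIE__ handler sees each error but does not stop the die.
my @seen;
$SIG{__DIE__} = sub { push @seen, $_[0] };

# Run $code inside eval; return its result, or undef if it died.
sub try_request {
    my ($code) = @_;
    my $result;
    my $ok = eval { $result = $code->(); 1 };
    return $ok ? $result : undef;
}

my @log;
for my $attempt (1 .. 5) {
    my $page = try_request(\&flaky_request);
    if (!defined $page) {
        push @log, "attempt $attempt failed";
        next;    # in the real polling loop, sleep here and try again later
    }
    push @log, "attempt $attempt got: $page";
    last;
}
print "$_\n" for @log;
```

The handler collects both simulated errors in @seen, but it is the eval in
try_request() that keeps the script alive through them.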

Alternatively, if this is a perpetual function, you might want to use a
Robot User Agent (LWP::RobotUA), which handles its own timing (the default
is one minute between requests to the same server) and obeys the robot
exclusion standard. It also identifies your bot to the server, for
niceness' sake.

Warning: untested code approaching :-)

use LWP::RobotUA;
use HTTP::Request;

sub patch_bot {
    # Args: the URL to fetch, this bot's name, a contact email, the desired
    # wait between requests (in minutes), a proxy address (if one exists)
    my ($source, $botname, $email, $delay, $proxy) = @_;

    # Create a Robot User Agent object, give it a name and contact address
    my $rua = LWP::RobotUA->new($botname, $email);

    # If a proxy server is specified, define it in the User Agent object
    if (defined $proxy) {
        $rua->proxy('http', $proxy);
    }

    # Set the delay between requests, in minutes (default is 1 minute)
    $rua->delay(defined $delay ? $delay : 1);

    # Bypass unless this doc is reachable - testing the HEAD is fastest
    my $check_request = HTTP::Request->new('HEAD', $source);

    # This loops forever. The RobotUA itself waits $delay minutes between
    # requests; the sleep(1000) runs once at startup and again only after
    # a failed check, because redo restarts the block without sleeping.
    while ( sleep(1000) ) {
        my $check = eval { $rua->request($check_request) };
        next if $@ or !$check->is_success;    # next if the HEAD check failed
        # otherwise grab the doc
        my $request  = HTTP::Request->new('GET', $source);
        my $response = $rua->request($request);
        process_page( $response->content ) if $response->is_success;
        redo;
    }
}






> I need to trap certain LWP exceptions. Is it more appropriate to
> handle the __DIE__ signal or use an eval block on $ua->request($req)?
> 
> I'm using LWP and a user agent to repeatedly poll a web page looking
> for updates:
> 
>     while (sleep 900) {
>       $resp = $ua->request($req);
>       &process_page( $resp->as_string );  
> 
> Occasionally the script dies with errors like
>
>    500 (Internal Server Error) Can't connect to www.yahoo.com:80
>    (Transport endpoint is not connected) Client-Date: Mon, 11 Sep 2000
>    14:43:28 GMT
>
> I think these are server hiccups, going offline for seconds or a few
> minutes, so that the socket fails to connect. The user agent is
> die'ing deep within LWP, maybe LWP::Protocol::http or IO::Socket.
>
> I'd like to trap these exceptions, to go to sleep and try again later,
> instead of die'ing. What's the best way to do that? Shall I handle the
> __DIE__ signal or use an eval block on $ua->request($req)? It's not
> clear to me what the difference is between these or which is more
> appropriate in this instance.
>
> Thx,
> Sandy Morton
>
>
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>      POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
>       Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
>   Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
>  For daily traffic, use spug-list for LIST ;  for weekly, spug-list-digest
>   Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/
>
>






