where to look?

Fri Jul 26 11:34:55 CDT 2002

At 05:17 PM 7/25/2002 -0700, nkuipers wrote:
>Hello all,
>
>Here's the situation.
>
>I work as an analyst for a genomics lab.  We have a dedicated local BLAST
>(http://www.ncbi.nih.gov/) server.  Whether from the summer heat, or a failing
>drive, or an OS bug, or some combination of various factors, our server is
>crapping out too frequently.  This is really annoying because we are often
>hammering it with really large queries that take up to days to complete.  We
>replaced our drive, and currently we simply resort to running our big runs
>overnight when it's cooler and traffic from other labs is light, or constantly
>monitoring the status of the server with a ssh window running top.  But this
>is tedious.  And what if the server dies in the middle of the day when I'm at
>a seminar for 3 hours, and I come back to three hours wasted when a new
>process could have been running (we are pretty quick rebooting during the
>day;)?  What I am therefore interested in is information on finding or writing
>a perl script that does the following:
>
>-monitors the BLAST run once fired up in the ssh terminal
>-if process has bad exit status, reconnect as soon as possible and
>-repeat the call to the original process, with same parameters

Maybe I'm missing something, but it seems that all you need is to wrap the
code that runs the queries in a loop that checks for failure and restarts
the query.  Why not just:

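# do_query() and query_succeeded() are placeholders for your own code
my $done = 0;
until ($done) {
    do_query();                        # launch the BLAST run
    $done = 1 if query_succeeded();    # stop once a run finishes cleanly
}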
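
A slightly fuller sketch of the same idea, assuming the query is started
as an external command and that an exit status of 0 means it worked.  The
sub name run_until_success, the retry limit, the delay, and the wrapper
script name are all made up for illustration; substitute your real BLAST
command line.  Note this only helps if the loop runs on a machine that
stays up while the query is retried.

```perl
use strict;
use warnings;

# Run an external command until it exits with status 0.
# Returns the number of the successful attempt, or 0 if every attempt failed.
sub run_until_success {
    my ($max_tries, $delay, @cmd) = @_;
    for my $try (1 .. $max_tries) {
        my $status = system(@cmd);      # wait status, or -1 if it never started
        return $try if $status == 0;
        if ($status == -1) {
            warn "attempt $try: could not start '$cmd[0]': $!\n";
        }
        elsif ($status & 127) {
            warn sprintf "attempt %d: killed by signal %d\n", $try, $status & 127;
        }
        else {
            warn sprintf "attempt %d: exit code %d\n", $try, $status >> 8;
        }
        sleep $delay if $try < $max_tries;   # pause before the next attempt
    }
    return 0;
}

# Example call; './run_blast.sh' is a hypothetical wrapper script:
# run_until_success(5, 60, './run_blast.sh') or die "BLAST run never succeeded\n";
```

Capping the number of attempts keeps a query that can never succeed (bad
input, missing database) from looping forever.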

Peter Scott
peter at psdt.com
http://www.perldebugged.com