where to look?

Thu Jul 25 19:17:38 CDT 2002

Hello all,

Here's the situation.

I work as an analyst for a genomics lab.  We have a dedicated local BLAST 
(http://www.ncbi.nih.gov/) server.  Whether from the summer heat, or a failing 
drive, or an OS bug, or some combination of various factors, our server is 
crapping out far too often.  This is really annoying because we are often 
hammering it with very large queries that can take days to complete.  We 
replaced the drive, and for now we simply resort to running our big jobs 
overnight, when it's cooler and traffic from other labs is light, or to 
constantly monitoring the server's status from an ssh window running top.  But 
this is tedious.  And if the server dies in the middle of the day while I'm at 
a three-hour seminar, I come back to three wasted hours during which a new 
process could have been running (we are pretty quick at rebooting during the 
day ;).  What I am therefore looking for is information on finding or writing 
a Perl script that does the following:

-monitors the BLAST run once it has been started from the ssh terminal
-if the process exits with a bad status, reconnects as soon as possible, and
-reruns the original command with the same parameters
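A minimal sketch of that loop follows. It is written in Python rather than Perl; in Perl the same idea is system() followed by a check of $?. It assumes the monitor runs on a separate workstation rather than on the BLAST server, since anything running on the server dies with it, and that the job is launched over ssh. ssh exits with the remote command's exit status, or 255 if ssh itself fails, so a crashed job and a dropped connection both show up as a non-zero return code. The function name run_with_retries and its parameters are hypothetical.

```python
import subprocess
import time


def run_with_retries(cmd, max_attempts=5, wait_seconds=60, runner=subprocess.run):
    """Run cmd (a list of arguments) and rerun it, unchanged, on a non-zero exit.

    Returns the number of the attempt that succeeded; raises RuntimeError
    if every attempt fails. `runner` is injectable so the loop can be
    tested without a real ssh connection.
    """
    for attempt in range(1, max_attempts + 1):
        result = runner(cmd)  # same argument list on every attempt
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} exited with status {result.returncode}")
        if attempt < max_attempts:
            time.sleep(wait_seconds)  # give the server time to come back up
    raise RuntimeError(f"command still failing after {max_attempts} attempts")
```

For example, run_with_retries(["ssh", "blastserver", "blastall", "-p", "blastn", "-i", "query.fa", "-o", "query.out"], wait_seconds=600) would retry a job every ten minutes; the host name and file names there are made up. Two caveats: a rerun starts the search over rather than resuming it, and if the server vanishes without closing the TCP connection, ssh may hang instead of exiting. OpenSSH's ServerAliveInterval option is one way to make it notice a dead peer.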

I looked at the interprocess communication chapter of the Camel (3rd ed.), 
but most of what I saw there seemed to deal with sending and handling 
termination signals in case of an error, rather than with a total server 
shutdown.  Is IO::Socket::INET what I want?
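IO::Socket::INET can help with the "as soon as possible" part. A call such as IO::Socket::INET->new(PeerAddr => $host, PeerPort => 22, Proto => 'tcp', Timeout => 5) returns undef until the machine is accepting connections on the ssh port again. Below is a minimal Python sketch of the same polling idea, using socket.create_connection. The function name wait_for_port and its defaults are hypothetical. Also, a successful connection only shows that something is listening on the port, not that BLAST and its databases are ready.

```python
import socket
import time


def wait_for_port(host, port=22, poll_seconds=30, timeout=5.0, max_polls=None):
    """Poll host:port until a TCP connection succeeds.

    Returns the number of polls it took, or None if max_polls is reached
    first. max_polls=None means keep polling forever.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            # A completed TCP handshake means a listener (e.g. sshd) is up.
            with socket.create_connection((host, port), timeout=timeout):
                return polls
        except OSError:  # refused, unreachable, or timed out
            if max_polls is not None and polls >= max_polls:
                break
            time.sleep(poll_seconds)
    return None
```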

Thanks in advance,

nathanael