[sf-perl] timing out wedged ssh processes [was: not understanding program behavior with alarm()]

Sun Feb 5 15:56:55 PST 2012

fyi, the problem i was trying to solve was to time out "wedged" ssh
connections.[1]  what i call a wedged connection is when the ssh
connection is successfully established but hangs.  an interactive
session never gives you a prompt.  a non-interactive session never
runs your command(s).

i wanted to set an alarm to time out the hang.  turns out it works
quite well for wedged ssh invocations.  but when i added the alarm()
functionality to the library subroutine we use to ssh and tested it, i
noticed that the commands issued in non-interactive ssh sessions that
*don't* wedge are not stopped when perl stops a timed out ssh
invocation.
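
for reference, a minimal sketch of the alarm() approach, not our actual
library code.  the name run_with_timeout and the choice of SIGTERM are
assumptions made for illustration.  it forks the command, waits with an
alarm set, and kills the local child explicitly on timeout, since a
die() in the parent does not by itself stop a child process.  note that
this only stops the local process (e.g. ssh); it does nothing about a
command already running on the remote host.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# hypothetical helper: run @cmd, giving up after $timeout seconds.
# returns the child's wait status ($?), or undef if it timed out.
sub run_with_timeout {
    my ($timeout, @cmd) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # child: become the command; exit quietly if exec fails
        exec(@cmd) or POSIX::_exit(127);
    }

    # parent: wait for the child, but let SIGALRM interrupt the wait
    my $finished = eval {
        local $SIG{ALRM} = sub { die "timed out\n" };
        alarm $timeout;
        waitpid($pid, 0);
        alarm 0;
        1;
    };
    return $? if $finished;

    die $@ unless $@ eq "timed out\n";   # some other error: re-throw
    kill 'TERM', $pid;                   # stop the local child ourselves
    waitpid($pid, 0);                    # and reap it
    return;
}
```

something like run_with_timeout(20, 'ssh', $remote_host, 'hostname')
would then return undef for a wedged connection.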

which made me wonder if the same thing was happening with my previous strategy:

our programs do something like:

my $now = time;
my $started_string = "_STARTED=$now";

my $timeout = ... ;    # seconds
my $timeout_string = "_TIMEOUT=$timeout";

my $remote_command = ... ;

my $ssh_command =
    qq{ssh $remote_host "$started_string ; $timeout_string ; $remote_command"};

... execute ssh command ...

the variable assignments aren't used on the remote host, but they show
up in the process table on the local host, on which a reaper program
is invoked by cron every minute and checks the process table for ssh
processes which have both _STARTED and _TIMEOUT in the command.  if the
current time is more than _TIMEOUT seconds past _STARTED for any such
process, the reaper kills it (wedged ssh processes would otherwise hang
forever).  for instance, say for simplicity that
the remote command is "hostname".  we expect this command to complete
in a second or less, so we may set _TIMEOUT to 20.
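
a rough sketch of what such a reaper could look like, not the actual
program.  the sub names is_expired and reap_expired, the ps invocation,
and the use of SIGTERM are all assumptions for illustration.  the check
itself is just "now minus _STARTED is greater than _TIMEOUT" for ssh
command lines carrying both tags.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical check: given one command line from the process table and
# the current epoch time, return 1 if it is a tagged ssh process that
# has outlived its _TIMEOUT, else 0.  untagged processes never expire.
sub is_expired {
    my ($command, $now) = @_;
    return 0 unless $command =~ m{^(?:\S*/)?ssh\s};
    my ($started) = $command =~ /\b_STARTED=(\d+)/ or return 0;
    my ($timeout) = $command =~ /\b_TIMEOUT=(\d+)/ or return 0;
    return $now - $started > $timeout ? 1 : 0;
}

# hypothetical reaper pass: scan the process table once and send
# SIGTERM to every expired ssh process.
sub reap_expired {
    my ($now) = @_;
    open(my $ps, '-|', 'ps', '-e', '-o', 'pid=', '-o', 'args=')
        or die "can't run ps: $!";
    while (my $line = <$ps>) {
        my ($pid, $command) = $line =~ /^\s*(\d+)\s+(.*)$/ or next;
        kill 'TERM', $pid if is_expired($command, $now);
    }
    close $ps;
}
```

the cron job would call reap_expired(time) once per run.  with
_STARTED=1000 and _TIMEOUT=20, a process is left alone at time 1020 and
reaped at 1021.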

i found that with this reaper strategy, as with using alarm(), it
works well on wedged ssh processes, but doesn't stop the remote
command(s) of non-wedged ssh processes.

we decided that if we need remote commands to time out, the timeout
functionality must be part of the remote commands themselves.  perhaps
alarm() in them if they're perl.
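
if a remote command is perl, a self-imposed timeout might look roughly
like this.  it's only a sketch; with_timeout is a made-up name, and it
covers work done inside the perl process itself, not any children that
work may spawn.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical helper: run $work->() but give up after $timeout seconds.
# returns 1 if the work finished, 0 if the alarm fired first.
sub with_timeout {
    my ($timeout, $work) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timed out\n" };
        alarm $timeout;
        $work->();
        alarm 0;
        1;
    };
    alarm 0;                             # make sure no alarm is left pending
    return 1 if $ok;
    die $@ unless $@ eq "timed out\n";   # some other error: re-throw
    return 0;
}
```

the remote script would wrap its real work, e.g.
with_timeout(20, \&main) or exit 1, so that it stops on its own even
when nothing on the local side can reach it.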

[1] ssh's -o option argument ConnectTimeout will time out ssh
processes waiting for connections, not "wedged" ones in which a
connection has been made successfully and then hangs.

On Fri, Feb 3, 2012 at 2:43 PM, David Alban <extasia at extasia.org> wrote:
> $ perl junk.test.alarm
> 1
> 2
> 3
> 4
> 5
> timed out!
>
> dalban at srwd00reg008             Fri Feb 03 22:41:37
> ~
> $ 6
> 7
> 8
> 9
> 10
> 11
> 12
> 13
>
> with junk.sh continuing until i kill it from the command line.
>
> what am i missing?
>
> should the die() stop junk.sh?

-- 
Live in a world of your own, but always welcome visitors.
***
Rule of law is for the little people.
http://www.amazon.com/Liberty-Justice-Some-Equality-Powerful/dp/0805092056