[tpm] my perl program seg faults

Mon May 14 06:28:02 PDT 2007

On 5/4/07, Fulko Hew <fulko.hew at gmail.com> wrote:
> On 5/4/07, Indy Singh <indy at indigostar.com> wrote:
> > You could try running perl under the debugger.  Then when the process crashes, run the backtrace command to see where it crashed.
>
> OK, so I took this approach:
>
> ulimit -c unlimited      # so that core files are produced
> while ( true ) do ./myprog; done
>
> Then when it died and core dumped, I did:
>
> gdb perl corefile
> bt
>
> then it complains about not having symbols for a lot of things, but
> it did tell me it was in a signal handler.  (I didn't copy the exact text.)
>
> So now I'm trying to figure out (from a Perl point of view) which
> process is receiving which signal.

OK guys... here's the analysis...

I took the suggestion of looking at what the core dump could provide.
A gdb stack trace told me that it was dying during signal handling.
(Unfortunately I didn't write down the exact routine name.) So,
without attempting to understand and diagnose Perl itself, I looked at
where my code dealt with signals. I narrowed it down to 'death of a
child' in my forking TCP server code.

My server is based on the standard skeleton from 'the cookbook'. What
happens in my code is that on occasion, the main listener may choose
to shut down itself and all its children (for example, when it catches
a CTRL-C). The mainline would go through and kill off all forked
children, and then die itself.

What was happening was (or at least _my impression_ was) that the
children would be killed off, but not dead yet. Then the mainline
would die, but the perl interpreter would still be around. Then the
interpreter for the main-line would receive the 'death of a child'
signal for (one or more of) the children, but could no longer handle
it, because the mainline was (just about...!) dead. So it would
segfault and dump core.

My solution/workaround for this situation was to signal the children
to die, and then have the mainline actually wait around (perhaps
forever) for the children to die, and then (and only then) exit/die
itself, i.e.:

use POSIX ":sys_wait_h";   # provides WNOHANG

# spins until waitpid() returns -1, i.e. until no children are left
sub reaper { 1 until (-1 == waitpid(-1, WNOHANG)); }

$SIG{CHLD} = \&reaper;
kill 9, $childPid;         # signal the child to die
reaper();                  # IMPORTANT! ...wait for the child to go away
                           # (else we might get perl seg faulting on exit)
exit;                      # and die ourselves
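
For anyone who wants to try the same shutdown sequence without the
busy loop, here is a minimal self-contained sketch. It is not my
production code: the %children hash, the three sleeping children, and
the shutdown_children() name are all made up for illustration, and it
sends TERM rather than 9. The handler reaps without blocking during
normal operation, and the shutdown path blocks in waitpid() until
there are no children left.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # provides WNOHANG

my %children;               # pid => 1 for each live child (hypothetical bookkeeping)

# Non-blocking SIGCHLD handler: reap only what has already exited.
sub reaper {
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        delete $children{$pid};
    }
}
$SIG{CHLD} = \&reaper;

# Stand-in for the forking server: start three children that just sleep.
for (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) { sleep 60; exit 0; }
    $children{$pid} = 1;
}

# Shutdown: signal every child, then block until none are left.
sub shutdown_children {
    kill 'TERM', keys %children;
    while (1) {
        my $pid = waitpid(-1, 0);   # blocks; the handler may reap some too
        last if $pid == -1;         # -1 means no children remain
        delete $children{$pid};
    }
    return scalar keys %children;   # number of children still tracked
}

my $left = shutdown_children();
print "children left: $left\n";
```

Only after shutdown_children() returns would the mainline call exit,
so no 'death of a child' signal should still be outstanding by then.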

So in the end, I was seeing this in a number of my apps that use the
same shutdown philosophy for forking servers, and the reason the
failures/warnings were intermittent was the random timing of the
parent/child dying/exiting relationship.

...Sometimes the signal catcher would get invoked... but sometimes the
interpreter seemed to have been shut down far enough that the catcher
was no longer there when the signal arrived, so it would core dump on
shutdown.
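
Another variation that might close the same window (an assumption on
my part, I have not verified that it cures the segfault) is to put
SIGCHLD back to its default disposition before starting the shutdown.
No Perl-level catcher is registered from that point on, and the
mainline collects the child explicitly with a blocking waitpid(). The
single sleeping child below is only a stand-in for a real server
child.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # provides WNOHANG

# Normal operation: non-blocking reaper, as in the cookbook skeleton.
$SIG{CHLD} = sub { 1 while waitpid(-1, WNOHANG) > 0 };

# Stand-in child that would otherwise run for a while.
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) { sleep 60; exit 0; }

# Shutdown: drop the Perl handler first, then kill and wait explicitly.
$SIG{CHLD} = 'DEFAULT';             # no Perl code runs on SIGCHLD from here on
kill 'TERM', $pid;
my $reaped = waitpid($pid, 0);      # blocks until this child has gone away

print "reaped the expected child\n" if $reaped == $pid;
```

With the default disposition the child still has to be waited for,
which is why the blocking waitpid() stays in place before the exit.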