SPUG: Forked Children talking back to the parent?

Colin Meyer cmeyer at helvella.org
Wed Jan 31 14:53:03 CST 2001


On Wed, Jan 17, 2001 at 09:19:06AM -0800, Richard Anderson wrote:
> It might be simpler to use shared memory and a semaphore as opposed to
> managing 10 sockets.

I have never attempted to use shared memory from Perl, and decided to
give it a go. I started with the IPC::Shareable module, as per Cookbook
recipe 16.2. I realize that hosting considerations ruled out the use of
shared mem for the original poster, but I did this anyway for fun.

I learned a few things.

1. Don't try to share deeply nested data structures. There is a note in
   the IPC::Shareable manpage about this. The module creates a new
   shared memory segment for each nested reference. Something with lots
   of references will quickly use up all available shm segments.

2. You need to be aware of the size of data you are sharing. This could
   be considered a poetic constraint, but it really ran against my perl
   programmers' fur. One of my favorite things about perl is being able
   to use memory willy nilly, with no concerns of malloc, free, or
   buffer overruns.

3. You need to use exclusive locking. This may be a limitation of
   IPC::Shareable, or my lack of understanding of unix shared mem.

4. It is unstable. At least in my experience. My program (pasted below)
   is fairly simple. It greps out 5 letter words from the dictionary,
   turns them into internet domains, and forks off children to look them
   up via gethostbyname(). This program runs for about twenty minutes
   and then segfaults. I have tried this on linux 2.2.14, perl 5.00503,
   perl 5.6.1-trial, perl 5.7.0. Same results always.

I'm not sure why it segfaults. At first I wasn't locking the shared
variable often enough, and it would segfault much earlier.

It could also be the signal handlers causing the segfault instead of
IPC::Shareable, as signal handling has always been slightly unstable
with perl. However, I think that the chance of that is small, as I used
tiny signal routines copied pretty much verbatim from the cookbook.

It would probably be a much better idea to fork $MAXCHILDREN and let
each on look up names repeatedly until all of the work is done, rather
than fork a new child for each and every name. I'll code up a version
like that when I have a few more moments.

Any ideas of how to make this work better would be appreciated.
-C.


=========================8<===================================

#!/usr/local/bin/perl -w

use strict;
use Socket;               # for inet_ntoa
use IPC::Shareable;
use POSIX;
use Data::Dumper;

# this program mostly consists of code from recipes 16.12, 17.11 and 
# 17.12 of the Perl Cookbook.

my $MAXCHILDREN = 5;
my $CHILDREN    = 0;
my %CHILDREN    = ();

my $dict = '/usr/dict/words';

# global vars
my (@domains, %domains, %results);
my @data_share = ();


# signal handlers

# this code comes from recipe 17.11.  I originally tried the snippet
# from 17.12, but that caused zombie children that caused the end
# of the parent to hang forever.
sub REAPER {
  while (my $pid = waitpid(-1, WNOHANG)) {
    last if $pid == -1;
    delete $CHILDREN{$pid};
    $CHILDREN--;
  }
  $SIG{CHLD} = \&REAPER; 
}
$SIG{CHLD} = \&REAPER;

# this handler is meant to be run only by the parent process
sub HUNTER {
  local($SIG{CHLD}) = 'IGNORE';
  kill 'INT', keys %CHILDREN;

  # clean up my shared mem when exitting unexpectedly!
  IPC::Shareable::clean_up(); 
  exit;
}
$SIG{INT} = \&HUNTER;


# get 5 letter words from dictionary
open DICT, "<$dict" or die;
my @dict = grep length == 5, map {chomp;lc} <DICT>;
close DICT;


# @data_share is the shared data repository.  
# Do not store complex datastructures
# in shared memory, because IPC::Shareable creates a new shared segment for
# each nested hashref or arrayref - can easily use up all available shared
# segments.
my $handle = tie @data_share, 'IPC::Shareable', 
                 '5_letter_domain', {create => 1};
# work around for a minor IPC::Shareable bug that doesn't let you 
# check the size of an unused array.
push @data_share, 'dummy';
pop @data_share;


# @domains holds all of the work to be done.  If children could add work
# then @domains could be made shared ala @data_share.  But then it would
# have to be managed for size, so that the shared mem segment won't be
# overrun.
for my $word (@dict) {
  for (qw/_.com www._.com _.org www._.org _.net www._.net/) {
    my $domain = $_;
    $domain =~ s/_/$word/;
    $domains{$word}{$domain} = 1;
  }
}


# start the processing
for my $word (keys %domains) {  
  for my $domain (keys %{$domains{$word}}) {
    status();
  
    while ($CHILDREN >= $MAXCHILDREN) {
      sleep 2;
      harvest();
      status();
    }
  
    # postpone any SIGINT handling until after the fork.
    # this is so that the child can reset its SIGINT handler 
    # before any impending SIGINTs get processed.  Otherwise
    # segfaults will result.
    my $sigset = POSIX::SigSet->new(SIGINT);
    sigprocmask(SIG_BLOCK, $sigset) 
      or die "Couldn't block SIGINT for fork: $!\n";
  
    my $pid; 
    defined ($pid = fork) or die "Couldn't fork: $!";
    
    if ($pid) {                    # parent
      # unblock SIGINT:
      sigprocmask(SIG_UNBLOCK, $sigset)
        or die "Couldn't unblock SIGINT after fork(parent): $!\n";
  
      print STDERR "Just forked child $pid. \n";
      $CHILDREN{$pid} = { word => $word, domain => $domain };
      $CHILDREN++;
    }
  
    else {                         # child
      $SIG{INT} = 'DEFAULT'; # we don't want to kill children 
                             # if we are a child!
      # unblock SIGINT
      sigprocmask(SIG_UNBLOCK, $sigset)
        or die "Couldn't unblock SIGINT after fork(child): $!\n";
  
      print STDERR "I am child $$, working with $domain.\n";
      my $addr = lookup ($domain);

      $handle->shlock;
      push @data_share, [$word, $domain, $addr];
      $handle->shunlock;
  
      exit; # you must never forget the exit, oh best beloved!
    }
  }
}

print STDERR "Parent done doleing out data to kids.\n";
status(), sleep 2 while $CHILDREN;
harvest(); # harvest the remaining results

# clean up shared mem stuff
$handle->remove();
IPC::Shareable::clean_up();

print Data::Dumper::Dumper(\%results);


###############
# support subs

sub status {
  print STDERR "children: $CHILDREN\t\tdata_share: ", 
    scalar @data_share, "\n";
}

# of course it is possible to register a domain without actually 
# assigning ip addresses to any hostnames, but I don't really 
# care about that
sub lookup {
  my $name = shift;
  # I am letting the os time out gethostbyname, rather than harranging 
  # with an alarm and a SIGALRM handler
  my $addr = gethostbyname $name;
  return defined $addr ? inet_ntoa $addr : 'n/a';
}

# harvest any results available on the data_share 
# and store them in global %domains
sub harvest {
  return unless scalar @data_share;
  $handle->shlock;
  $results{$_->[0]}{$_->[1]}=$_->[2] while ($_ = pop @data_share);
  $handle->shunlock;
}

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
      Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
  Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
 For daily traffic, use spug-list for LIST ;  for weekly, spug-list-digest
  Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/





More information about the spug-list mailing list