[sf-perl] parallelize system tasks, collecting statuses and output

Fri Mar 14 23:39:23 PDT 2008

This sounds like a fun project, but I'm pretty sure there are
industry-standard tools that already solve this problem.

I don't know what they use, but our guys do this all the time on  
hundreds of boxes.  Just FYI.  I'd still do it in perl. :-)

  -- frosty

(via iPhone)

On Mar 14, 2008, at 10:12 PM, Michael Friedman <friedman at highwire.stanford.edu> wrote:

> David,
>
> I think the sequentialism is in your code. You fork() in a loop, and
> each child does its thing, but the parent process -- the one that is
> about to fork the next child -- is sitting there waiting on the
> child's output:
>
>    chomp( my $line = <CHILD_READER> );
>
> Until the child gives it something, you've got blocking IO there. It
> won't start the next child until after it's received the output from
> the previous child. To make it actually parallel, you must not wait
> on output from any child until after you've created all the children.
> You can do that via a double-fork server, some sort of signaling, or...
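A minimal Perl sketch of that idea: fork every child first, and only then read. It assumes hypothetical host names and uses `qx{ sleep 1; echo $host }` as a local stand-in for `ssh $host "sleep 5; hostname"` so it runs anywhere. It also gives each child its own lexical pipe; with the bareword CHILD_READER and PARENT_WRITER handles, each new pipe() call would reuse (and close) the previous child's handles, so the parent could not keep several pipes open at once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical host names; each child simulates the remote command locally.
my @hosts = qw( hostname_1 hostname_2 hostname_3 );

my $start = time;
my ( @readers, @pids );

# Phase 1: fork every child before reading from any of them.
for my $index ( 0 .. $#hosts ) {
    # A fresh lexical pipe per child, so no child's handle is reused.
    pipe( my $reader, my $writer ) or die "can't pipe: $!\n";

    my $pid = fork();
    die "can't fork: $!\n" unless defined $pid;

    if ( $pid == 0 ) {    # child
        close $reader;
        my $host = $hosts[$index];
        # Stand-in for: qx{ ssh $host "sleep 5; hostname" }
        my $result = qx{ sleep 1; echo $host };
        print {$writer} "instance ", $index + 1, "; status $?; results => $result";
        close $writer;
        exit 0;
    }

    # parent: keep only the read end of this child's pipe
    close $writer;
    push @readers, $reader;
    push @pids,    $pid;
}

# Phase 2: every child is already running; collect output in start order.
my @output;
for my $index ( 0 .. $#readers ) {
    my $fh   = $readers[$index];
    my $text = do { local $/; <$fh> };    # slurp until the child closes its end
    close $fh;
    chomp $text if defined $text;
    $output[$index] = $text;
}

# Reap the children so none linger as zombies.
waitpid( $_, 0 ) for @pids;
my $elapsed = time - $start;

print "$_\n" for @output;
print "elapsed: about $elapsed s\n";
```

With three one-second children, the elapsed time should be about one second rather than three. Reading the pipes in start order costs little here: a child that finishes early just leaves its output in the pipe (as long as it fits in the pipe buffer) until the parent gets to it.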
>
> Have you considered using threads? Perl now comes with real live
> working thread support (if you compiled it in). The threads API is
> pretty brilliant, IMO, as it makes it almost as easy as in Java to
> write multi-threaded code.
>
> http://search.cpan.org/~jdhedden/threads-1.69/threads.pm
>
> If you didn't happen to compile in thread support, or are running
> someone else's perl that was built without it, there's a CPAN module
> that emulates the thread API using fork(). It's much slower than real
> threads, but lets you write code that works without threads now and
> with them later when you can update perl.
>
> http://search.cpan.org/~rybskej/forks-0.27/lib/forks.pm
>
> In either case, it lets you start threads, let them run at their own
> pace, and gather their results later with join(). Your driving script
> can only pick up the results in single file, but threads may finish in
> a different order than you started them. I've used this a couple of
> times and it's pretty cool. ;-)
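A minimal sketch of the create/join pattern with the core threads module. It assumes a perl built with ithreads and simply skips the work when `$Config{useithreads}` is false; the host names are hypothetical, and `qx{ sleep 1; echo $host }` stands in for the real ssh command. Each thread's return value comes back through join(), which waits for that particular thread, so joining in start order still lets the threads themselves finish in any order.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Load threads only if this perl was built with ithreads support.
use if $Config{useithreads}, 'threads';

my @hosts = qw( hostname_1 hostname_2 hostname_3 );   # hypothetical names
my @output;

if ( $Config{useithreads} ) {
    # Start one thread per host; they all run concurrently.
    my @workers = map {
        my $host = $_;
        threads->create( sub {
            # Stand-in for: qx{ ssh $host "sleep 5; hostname" }
            my $out    = qx{ sleep 1; echo $host };
            my $status = $?;
            chomp $out;
            return "$host: status $status; output $out";
        } );
    } @hosts;

    # join() blocks until that thread finishes and returns its value,
    # so results are gathered one at a time, in start order.
    @output = map { $_->join() } @workers;
}
else {
    warn "no ithreads in this perl; the forks module emulates the same API\n";
}

print "$_\n" for @output;
```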
>
> I think the threads API would be a good solution to your problem, if
> it's available to you.
>
> -- Mike
>
>
> On Mar 14, 2008, at 7:38 PM, David Alban wrote:
>
>> greetings,
>>
>> our sysadmins regularly need to restart a service on, say, thirty
>> machines.  the service takes two and a half minutes to restart.  i'd
>> like to write a perl program they can use to parallelize the restart,
>> so that the whole operation takes three minutes rather than an hour or
>> more.  i want to collect the statuses and any output of the service
>> restart commands.
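One way to collect both the output and the exit status of each command, sketched under assumptions: the host names are hypothetical, and `sh -c "sleep 1; echo restarted $host"` stands in for the real remote restart (something like `ssh $host` plus your restart command). It uses Perl's list-form piped open, which forks a child and returns immediately, so every command is started before any output is read; closing a piped handle then waits for that child and leaves its status in `$?`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical host names; the command below is a local stand-in.
my @hosts = qw( hostname_1 hostname_2 hostname_3 );

# Phase 1: start every command.  A list-form piped open forks a child,
# runs the command in it, and returns at once without waiting.
my %fh_for;
for my $host (@hosts) {
    # Stand-in for something like ( 'ssh', $host, 'your restart command' )
    open( my $fh, '-|', 'sh', '-c', "sleep 1; echo restarted $host" )
        or die "can't start command for $host: $!\n";
    $fh_for{$host} = $fh;
}

# Phase 2: read each command's output, then close() its handle.
# close() on a piped handle waits for that child and sets $?.
my ( %output, %status );
for my $host (@hosts) {
    my $fh = $fh_for{$host};
    $output{$host} = do { local $/; <$fh> };   # slurp all output
    close $fh;                                 # false if exit status != 0
    $status{$host} = $? >> 8;                  # the command's exit code
}

for my $host (@hosts) {
    printf "%s: exit %d; output: %s", $host, $status{$host}, $output{$host};
}
```

A command that writes more than a pipe buffer's worth of output will pause until the parent reaches its handle; that only delays that command, it does not deadlock.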
>>
>> i found the Bidirectional Communication with Yourself section of the
>> perlipc man page.  i'm trying to hack their example so that only the
>> child writer writes to the parent reader.  my hacked version forks two
>> child processes, which i want to run in parallel.  each child process
>> ssh's to a host, sleeps a small amount of time, and then runs the
>> hostname command.  but the child procs seem to be running serially.
>>
>> #!/usr/bin/perl
>>
>> use warnings;
>> use strict;
>>
>>    # log_timestamp() below comes from this module
>> use <LOCAL LOGGING MODULE>;
>>
>> use IO::Handle;
>>
>> my @hosts = qw( hostname_1 hostname_2 );
>> my $numprocs = @hosts;
>>
>> my @output;
>>
>> for my $instance ( 1..$numprocs ) {
>>   my $index = $instance - 1;
>>
>>   pipe( CHILD_READER, PARENT_WRITER )
>>     or die "can't pipe( CHILD_READER, PARENT_WRITER ): $!\n";
>>
>>   PARENT_WRITER->autoflush( 1 );
>>
>>   my $pid;
>>
>>        # parent
>>   if ( $pid = fork() ) {
>>     close PARENT_WRITER;
>>
>>     chomp( my $line = <CHILD_READER> );
>>     $output[ $index ] = $line;
>>
>>     close CHILD_READER;
>>   } # if
>>
>>        # child
>>   else {
>>     not defined $pid and die "can't fork: $!\n";
>>
>>     close CHILD_READER;
>>
>>     my $host = $hosts[ $index ];
>>     my @results = qx{ ssh $host "sleep 5; hostname" };
>>
>>     print PARENT_WRITER
>>           log_timestamp(),
>>           " child pid $$; instance $instance; results => ",
>>           join( '', @results );
>>
>>     close PARENT_WRITER;
>>
>>     exit;
>>   } # else
>> } # for
>>
>> for my $instance ( 1..$numprocs ) {
>>   my $index = $instance - 1;
>>   print $output[ $index ], "\n";
>> } # for
>>
>>
>>
>> --------------------------------------------------------------------
>>
>> i execute this as:
>>
>> $ date; perl junk; date
>>
>>
>> and get:
>>
>> Sat Mar 15 02:30:22 UTC 2008
>> 2008-03-15 02:30:27 +0000 child pid 5469; instance 1; results => hostname_1
>> 2008-03-15 02:30:32 +0000 child pid 5471; instance 2; results => hostname_2
>> Sat Mar 15 02:30:32 UTC 2008
>>
>>
>> it's running the second child process only after the first one
>> finishes, which defeats my goal of parallelizing.  what am i missing?
>>
>> there's so much stuff out there that promises to help with this and i
>> don't know whether i'm going down the wrong path.  surely you fine
>> folks must have done stuff like this before.  do indeed tell me to
>> rtfm, but please tell me which fm (or other doc) to r.
>>
>> please also feel free to tell me i'm going about it totally the wrong
>> way, perhaps with a pointer in the general (better) direction.
>>
>> thanks,
>> david
>> -- 
>> Live in a world of your own, but always welcome visitors.
>> _______________________________________________
>> SanFrancisco-pm mailing list
>> SanFrancisco-pm at pm.org
>> http://mail.pm.org/mailman/listinfo/sanfrancisco-pm
>
> ---------------------------------------------------------------------
> Michael Friedman                     HighWire Press
> Phone: 650-725-1974                  Stanford University
> FAX:   270-721-8034                  <friedman at highwire.stanford.edu>
> ---------------------------------------------------------------------