[sf-perl] parallelize system tasks, collecting statuses and output
Kevin Frost
biztos at mac.com
Fri Mar 14 23:39:23 PDT 2008
This sounds like a fun project, but I'm pretty sure there are industry-
standard tools that already solve this problem.
I don't know what they use, but our guys do this all the time on
hundreds of boxes. Just FYI. I'd still do it in perl. :-)
-- frosty
(via iPhone)
On Mar 14, 2008, at 10:12 PM, Michael Friedman <friedman at highwire.stanford.edu> wrote:
> David,
>
> I think the serial behavior comes from your code. You fork() in a
> loop, and each child does its thing, but the parent process (the one
> that is about to fork the next child) is sitting there waiting on the
> child's output:
>
> chomp( my $line = <CHILD_READER> );
>
> Until the child gives it something, that read blocks, so the parent
> won't start the next child until it has received the output from the
> previous one. To make it actually parallel, don't read output from
> any child until after you've created all the children. You can do
> that via a double-fork server, some sort of signaling,
> or...
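A minimal sketch of the fix described above, keeping the fork()/pipe() style of David's script: the first loop only forks, and a second loop reads each child's output and reaps it with waitpid() to get its exit status. The local `sh -c "sleep 2; echo job N"` commands are hypothetical stand-ins so the sketch runs anywhere; for the real task each entry would be something like `[ 'ssh', $host, 'sleep 5; hostname' ]`. Only stdout is captured.

```perl
#!/usr/bin/perl
# Sketch: fork every child first, then collect output and exit status.
use strict;
use warnings;

# Stand-in commands (hypothetical); for the real job each entry might be
# [ 'ssh', $host, 'sleep 5; hostname' ].
my @commands = map { [ 'sh', '-c', "sleep 2; echo job $_" ] } 1 .. 3;

my $started = time;
my @children;    # one { pid, reader } record per child, in start order

# Phase 1: fork all children. The parent does no reading here, so it
# moves straight on to fork the next child.
for my $cmd (@commands) {
    pipe( my $reader, my $writer ) or die "can't pipe: $!\n";

    my $pid = fork();
    die "can't fork: $!\n" unless defined $pid;

    if ( $pid == 0 ) {    # child: send stdout into the pipe, then exec
        close $reader;
        open( STDOUT, '>&', $writer ) or die "can't dup stdout: $!\n";
        exec { $cmd->[0] } @$cmd or die "can't exec $cmd->[0]: $!\n";
    }

    close $writer;        # parent keeps only the read end
    push @children, { pid => $pid, reader => $reader };
}

# Phase 2: all children are already running; read each one to EOF and
# reap it. Blocking on one child no longer delays starting the others.
my @results;
for my $child (@children) {
    my $fh     = $child->{reader};
    my $output = do { local $/; <$fh> };    # slurp until EOF
    $output = '' unless defined $output;
    close $fh;

    waitpid( $child->{pid}, 0 );            # sets $? for this child
    push @results, { output => $output, status => $? >> 8 };
}

printf "exit %d: %s", $_->{status}, $_->{output} for @results;
printf "elapsed: about %d seconds\n", time - $started;
```

Run one after another, three 2-second jobs would take about 6 seconds; here the elapsed time should be close to 2, because the read that sat inside the fork loop has moved into its own loop. Reading in start order is fine for small outputs like these: an early finisher's output simply waits in its pipe until the parent gets to it.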
>
> Have you considered using threads? Perl now comes with real, working
> thread support (if your perl was built with it). The threads API is
> pretty brilliant, IMO: it makes writing multi-threaded code almost as
> easy as it is in Java.
>
> http://search.cpan.org/~jdhedden/threads-1.69/threads.pm
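A sketch of the same job using the threads module linked above, assuming a perl built with ithreads (`perl -V:useithreads` should report `define`). Each thread runs one command with qx// and returns its output and exit status; the main thread starts all of them before calling join() on any. The `sleep 2; echo job N` commands are hypothetical local stand-ins for the real ssh commands.

```perl
#!/usr/bin/perl
# Sketch: one thread per command; join() afterwards to collect results.
# Assumes a perl built with ithreads.
use strict;
use warnings;
use threads;

# Stand-in commands (hypothetical); for the real job each might be
# qq{ssh $host 'sleep 5; hostname'}.
my @commands = map { "sleep 2; echo job $_" } 1 .. 3;

my $started = time;

# Start every thread before joining any of them. Each thread has its
# own copy of $?, so the status it reads belongs to its own command.
my @threads = map {
    my $cmd = $_;
    threads->create( { 'context' => 'list' }, sub {
        my $output = qx{$cmd};
        return ( $output, $? >> 8 );
    } );
} @commands;

# join() blocks until that particular thread is done and hands back
# whatever its sub returned.
my @results;
for my $thr (@threads) {
    my ( $output, $status ) = $thr->join;
    push @results, { output => $output, status => $status };
}

printf "exit %d: %s", $_->{status}, $_->{output} for @results;
printf "elapsed: about %d seconds\n", time - $started;
```

As with the fork() version, three 2-second commands should finish in roughly 2 seconds rather than 6.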
>
> If your perl wasn't built with thread support, or you're running
> someone else's copy of perl that wasn't, there's a CPAN module that
> emulates the threads API using fork(). It's much slower than real
> threads, but it lets you write code that works without threads now
> and with them later, when you can update perl.
>
> http://search.cpan.org/~rybskej/forks-0.27/lib/forks.pm
>
> In either case, the API lets you start threads, let them run at
> their own pace, and gather their results later with join(). Your
> driving script can only pick up the results one at a time, but
> threads may finish in a different order than you started them. I've
> used this a couple of times and it's pretty cool. ;-)
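Since threads may finish in a different order than they were started, the main thread does not have to join them in start order. A sketch, assuming a threads release that provides the `threads::joinable` and `threads::all` filters for `threads->list()`: it polls for finished threads and joins each one as soon as it is done. The job names and sleep times are made up so that the finishing order is predictable.

```perl
#!/usr/bin/perl
# Sketch: join threads in the order they finish, not the order started.
use strict;
use warnings;
use threads;

# Hypothetical jobs, started slowest first: [ name, seconds to sleep ].
my @jobs = ( [ 'slow', 3 ], [ 'medium', 2 ], [ 'fast', 1 ] );

for my $job (@jobs) {
    my ( $name, $secs ) = @$job;
    threads->create( { 'context' => 'list' }, sub {
        my $output = qx{sleep $secs; echo $name};
        return ( $name, $output, $? >> 8 );
    } );
}

# threads::all lists threads that are neither joined nor detached;
# threads::joinable lists the ones that have finished but not been joined.
my @finish_order;
while ( threads->list(threads::all) ) {
    for my $thr ( threads->list(threads::joinable) ) {
        my ( $name, $output, $status ) = $thr->join;
        push @finish_order, $name;
        print "$name done (exit $status): $output";
    }
    select( undef, undef, undef, 0.1 );    # sleep 0.1 s between polls
}
```

A thread that finishes early is collected within about a tenth of a second instead of waiting behind the slow one. In a fork()-based script, `waitpid(-1, 0)`, which returns whichever child exits next, plays a similar role.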
>
> I think the threads API would be a good solution to your problem, if
> it's available to you.
>
> -- Mike
>
>
> On Mar 14, 2008, at 7:38 PM, David Alban wrote:
>
>> greetings,
>>
>> our sysadmins regularly need to restart a service on, say, thirty
>> machines. the service takes two and a half minutes to restart. i'd
>> like to write a perl program they can use to parallelize the restart,
>> so that the whole operation takes three minutes rather than an hour
>> or more. i want to collect the statuses and any output of the service
>> restart commands.
>>
>> i found the Bidirectional Communication with Yourself section of the
>> perlipc man page. i'm trying to hack their example so that only the
>> child writer writes to the parent reader. my hacked version forks two
>> child processes, which i want to run in parallel. each child process
>> ssh's to a host, sleeps a small amount of time, and then runs the
>> hostname command. but the child procs seem to be running serially.
>>
>> #!/usr/bin/perl
>>
>> use warnings;
>> use strict;
>>
>> # log_timestamp() below comes from this module
>> use <LOCAL LOGGING MODULE>;
>>
>> use IO::Handle;
>>
>> my @hosts = qw( hostname_1 hostname_2 );
>> my $numprocs = @hosts;
>>
>> my @output;
>>
>> for my $instance ( 1..$numprocs ) {
>>     my $index = $instance - 1;
>>
>>     pipe( CHILD_READER, PARENT_WRITER )
>>         or die "can't pipe( CHILD_READER, PARENT_WRITER ): $!\n";
>>
>>     PARENT_WRITER->autoflush( 1 );
>>
>>     my $pid;
>>
>>     # parent
>>     if ( $pid = fork() ) {
>>         close PARENT_WRITER;
>>
>>         chomp( my $line = <CHILD_READER> );
>>         $output[ $index ] = $line;
>>
>>         close CHILD_READER;
>>     } # if
>>
>>     # child
>>     else {
>>         not defined $pid and die "can't fork: $!\n";
>>
>>         close CHILD_READER;
>>
>>         my $host = $hosts[ $index ];
>>         my @results = qx{ ssh $host "sleep 5; hostname" };
>>
>>         print PARENT_WRITER
>>             log_timestamp(),
>>             " child pid $$; instance $instance; results => ",
>>             join( '', @results );
>>
>>         close PARENT_WRITER;
>>
>>         exit;
>>     } # if
>> } # for
>>
>> for my $instance ( 1..$numprocs ) {
>>     my $index = $instance - 1;
>>     print $output[ $index ], "\n";
>> } # for
>>
>>
>>
>> --------------------------------------------------------------------
>>
>> i execute this as:
>>
>> $ date; perl junk; date
>>
>>
>> and get:
>>
>> Sat Mar 15 02:30:22 UTC 2008
>> 2008-03-15 02:30:27 +0000 child pid 5469; instance 1; results => hostname_1
>> 2008-03-15 02:30:32 +0000 child pid 5471; instance 2; results => hostname_2
>> Sat Mar 15 02:30:32 UTC 2008
>>
>>
>> it's running the second child process only after the first one
>> finishes, which defeats my goal of parallelizing. what am i missing?
>>
>> there's so much stuff out there that promises to help with this and i
>> don't know whether i'm going down the wrong path. surely you fine
>> folks must have done stuff like this before. do indeed tell me to
>> rtfm, but please tell me which fm (or other doc) to r.
>>
>> please also feel free to tell me i'm going about it totally the wrong
>> way, perhaps with a pointer in the general (better) direction.
>>
>> thanks,
>> david
>> --
>> Live in a world of your own, but always welcome visitors.
>> _______________________________________________
>> SanFrancisco-pm mailing list
>> SanFrancisco-pm at pm.org
>> http://mail.pm.org/mailman/listinfo/sanfrancisco-pm
>
> ---------------------------------------------------------------------
> Michael Friedman HighWire Press
> Phone: 650-725-1974 Stanford University
> FAX: 270-721-8034 <friedman at highwire.stanford.edu>
> ---------------------------------------------------------------------
>
>