[LA.pm] Parallel::Simple

Ofer Nave ofer at netapt.com
Fri Feb 25 21:31:25 PST 2005


Hey,

I just finished coding, testing, and documenting a module that I've 
written for CPAN.  I wanted to get some feedback here on it before I 
upload it.

This is my first CPAN module ever, so please feel free to tear it apart.  
I'll take feedback on anything - design, code style, pod style, doc 
effectiveness, and especially bugs.

I'm not sure what you all prefer, so I'm attaching it as a file, including 
it in the body of the message below, and providing a link to download it:

	http://ofernave.com/pm/Simple.pm

Choose your access method.  :)

After some polishing up, I'll write some tests for it and package it up in 
the standard CPAN module format before uploading it.

Thanks!

-ofer
-------------- next part --------------
package Parallel::Simple;
use strict;

require Exporter;
use vars qw( $VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS );
$VERSION = '0.01';

@ISA       = qw(Exporter);
@EXPORT    = qw();
@EXPORT_OK = qw(prun);

my $error;
my $return_values;

sub prun { run( @_ ) }
sub err { $error }
sub errplus {
    "$error\n" . ( ref($return_values) =~ /HASH/o ?
        join( '', map { "\t$_ => $return_values->{$_}\n" } sort keys %$return_values ) :
        join( '', map { "\t$_ => $return_values->[$_]\n" } 0..$#$return_values )
    );
}
sub rv { $return_values }

sub run {
    # allow the documented class-method form, Parallel::Simple->run( ... )
    shift @_ if ( @_ and !ref($_[0]) and $_[0] eq __PACKAGE__ );
    # grab options, if specified (avoids the unreliable "my ... if" construct)
    my %options = ref($_[0]) eq 'HASH' ? %{ shift @_ } : ();

    # normalize named and unnamed blocks into similar structure to simplify main loop
    my $named  = ref($_[0]) ? 0 : 1;  # if first element is a subref, they're not named
    my $i      = 0;                   # used to turn array into hash with array-like keys
    my %blocks = $named ? @_ : map { $i++ => $_ } @_;
    my %child_registry;               # pid => [ name, return value ]

    # fork children
    while ( my ( $name, $block ) = each %blocks ) {
        my $child = fork();
        unless ( defined $child ) {
            $error = "$!";
            last;  # something's wrong; stop trying to fork
        }
        if ( $child == 0 ) {  # child
            my $return_value = eval { &$block };
            exit( $@ ? 255 : $options{use_return} ? $return_value : 0 );
        }
        $child_registry{$child} = [ $name, undef ];
    }

    # wait for children to finish
    my $successes = 0;
    my $child;
    do {
        $child = waitpid( -1, 0 );
        if ( $child > 0 and exists $child_registry{$child} ) {
            $child_registry{$child}[1] = $? >> 8;
            $successes++ if ( $? == 0 );
        }
    } while ( $child > 0 );

    # store return values using appropriate data type
    $return_values = $named ?
        { map { $_->[0] => $_->[1] } values %child_registry } :
        [ map { $_->[1] } sort { $a->[0] <=> $b->[0] } values %child_registry ];

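    my $num_blocks = keys %blocks;
    return 1 if ( $successes == $num_blocks );  # all good!

    # if fork failed above, $error already holds $!; only report the count otherwise
    $error = "only $successes of $num_blocks blocks completed successfully"
        if ( keys(%child_registry) == $num_blocks );
    return 0;  # sorry... better luck next time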
}

1;

__END__

=head1 NAME

Parallel::Simple - the simplest way to run code blocks in parallel

=head1 SYNOPSIS

 use Parallel::Simple qw( prun );

 # Style 1 - Simple List of Code Blocks
 prun(
     sub { print "$$ foo\n" },
     sub { print "$$ bar\n" },
 ) or die( Parallel::Simple::errplus() );

 # Style 1 with options
 prun(
     { use_return => 1 },
     sub { print "$$ foo\n" },
     sub { print "$$ bar\n" },
 ) or die( Parallel::Simple::errplus() );

 # Style 2 - Named Code Blocks (like the Benchmark module)
 prun(
     foo => sub { print "$$ foo\n" },
     bar => sub { print "$$ bar\n" },
 ) or die( Parallel::Simple::errplus() );

=head1 DESCRIPTION

I generally write my scripts in a linear fashion.  Do A, then B, then C.
However, I often have parts that don't depend on each other, and therefore
don't have to run in any particular order, or even linearly.  I could save time
by running them in parallel - but I'm too lazy to deal with forking, and
reaping zombie processes, and other nastiness.

The goal of this module is to make it so mind-numbingly simple to run blocks of
code in parallel that there is no longer any excuse not to do it, and in the
process, drastically cut down the runtimes of many of our applications,
especially when running on multi-processor servers (which are pretty darn
common these days).

Parallel code execution is now as simple as calling B<prun> and passing it a
list of code blocks to run, followed by testing the return value for truth
using the common "or die" perl idiom.

=head1 EXPORTS

By default, Parallel::Simple does not export any symbols, in which case you
would generally do this:

    use Parallel::Simple;
    Parallel::Simple::run( ... );

You can choose to export the B<prun> subroutine, which is a synonym for B<run>:

    use Parallel::Simple qw(prun);
    prun( ... );

I recommend the second.  It will let you be lazier, thereby increasing the
probability of you taking advantage of this module.  Plus, I think 'prun'
sounds cooler than just 'run'.

=head1 METHODS

All of the following may be called as class methods:

    Parallel::Simple->run()

Or as normal subroutines:

    Parallel::Simple::run()

=over

=item B<prun>

Synonym for B<run>.

=item B<run>

Runs multiple code blocks in parallel by forking a process for each one and
returns when all processes have exited.

=over

=head2 Style 1 - Simple List of Code Blocks

In its simplest form (which is what we're all about here), B<run> takes a list
of code blocks and then forks a process to run each block.  It returns true if
all processes exited with exit value 0, false otherwise.  Example:

    prun(
        sub { print "$$ foo\n" },
        sub { print "$$ bar\n" },
    ) or die( Parallel::Simple::errplus() );

By default, the exit value will be 255 if the code block dies or throws any
exceptions, or 0 if it doesn't.  You can exercise more control over this by
using the B<use_return> option (documented below) and returning values from
your code block.

If B<run> returns false and you want to see what went wrong, try the B<err>,
B<errplus>, and B<rv> methods documented below - especially B<rv> which will
tell you the exit values of the processes that ran the code blocks.
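
To make the exit-value convention above concrete, here is a minimal sketch of the underlying mechanism using only core Perl. It is not the module's code: C<run_blocks_sketch> is a hypothetical helper, and it uses C<POSIX::_exit> rather than C<exit> so that the child skips END blocks.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper, not part of Parallel::Simple: run each code block in
# its own forked process and return the exit values in the order given.
sub run_blocks_sketch {
    my @blocks = @_;
    my @pids;

    for my $block (@blocks) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {    # child
            # 0 if the block ran to completion, 255 if it died
            my $ok = eval { $block->(); 1 };
            POSIX::_exit( $ok ? 0 : 255 );
        }
        push @pids, $pid;     # parent remembers the child
    }

    # Reap each child; the high byte of $? holds its exit value.
    my @exit_values;
    for my $pid (@pids) {
        waitpid( $pid, 0 );
        push @exit_values, $? >> 8;
    }
    return @exit_values;
}

my @exit_values = run_blocks_sketch(
    sub { 1 },               # finishes normally
    sub { die "oops\n" },    # throws an exception
);
print "exit values: @exit_values\n";    # prints "exit values: 0 255"
```

The sketch waits on each pid in turn, so the results come back in argument order without a registry.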

=back

=over

=head2 Style 2 - Named Code Blocks

Alternatively, you can specify names for all of your code blocks by using the
common "named params" perl idiom.  The only benefit you get from this currently
is an improved lookup method for code block return values (see B<rv> for more
details).  Example:

    prun(
        foo => sub { print "$$ foo\n" },
        bar => sub { print "$$ bar\n" },
    ) or die( Parallel::Simple::errplus() );

Other than looking nicer, this behaves identically to the Style 1 example.

=back

=over

=head2 Options

You can optionally pass a reference to a hash containing additional options as
the first argument.  Example:

    prun(
        { use_return => 1 },
        sub { print "$$ foo\n" },
        sub { print "$$ bar\n" },
    ) or die( Parallel::Simple::errplus() );

There is currently only one option:

=over

=item B<use_return>

By default, the return values for the code blocks, which are retrieved using
the B<rv> method, will be 0 if the code block executed normally or 255 if the
code block died or threw any exceptions.  By default, any value the code block
returns is ignored.

If you use the B<use_return> option, then the return value of the code block is
used as the return value (unless the code block dies or throws an exception, in
which case the return value will still be 255).  This value is passed to the
exit function, so please please please use only numbers between 0 and 255!
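
The reason for the 0 to 255 limit is that a process exit value is a single byte on typical POSIX systems. The following sketch, which assumes such a system and uses a hypothetical helper named C<observed_exit_value>, shows what the parent sees when a child exits with various values.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: fork a child that exits with $value and return the
# exit value the parent actually observes after reaping it.
sub observed_exit_value {
    my ($value) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    POSIX::_exit($value) if $pid == 0;    # child: exit immediately
    waitpid( $pid, 0 );
    return $? >> 8;                       # high byte of $? is the exit value
}

# Values above 255 wrap around, so 256 is seen as 0 (apparent success).
printf "%3d => %3d\n", $_, observed_exit_value($_) for ( 0, 3, 255, 256, 300 );
```

On such a system a block that returns 256 under B<use_return> would look like a success, which is why staying within the range matters.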

=back

=back

=item B<err>

Returns a string describing the last error that occurred, or undef if there
have not yet been any errors.

Currently, only two error messages are possible:

=over

=item *

if the call to fork fails, B<err> returns the contents of $!

=item *

if any blocks fail, B<err> returns a message describing how many blocks
completed successfully out of the total

=back

=item B<rv>

Returns different value types depending on whether or not you used named code
blocks:

=over

=item Style 1 (not using named code blocks)

returns a reference to an array containing the return values of the code blocks
in the order they were passed to B<run>

=item Style 2 (using named code blocks)

returns a reference to a hash, where keys are the code block names, and values
are the return values of the respective code block

=back

See the B<use_return> option for the B<run> method for more details on how to
control return values.
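
Because B<rv> can return either an array reference or a hash reference, calling code may want to handle both shapes. The sketch below uses hand-written sample data in place of a real B<rv> result, and C<failed_blocks> is a hypothetical helper, not something this module provides.

```perl
use strict;
use warnings;

# Hypothetical helper: given a structure shaped like the result of rv
# (array ref for Style 1, hash ref for Style 2), return the indexes or
# names of the blocks whose value is not 0.
sub failed_blocks {
    my ($rv) = @_;
    if ( ref($rv) eq 'HASH' ) {
        return grep { $rv->{$_} != 0 } sort keys %$rv;
    }
    return grep { $rv->[$_] != 0 } 0 .. $#$rv;
}

# Sample data standing in for rv results (assumed values, not real output).
my @by_index = failed_blocks( [ 0, 255, 0 ] );
my @by_name  = failed_blocks( { foo => 0, bar => 255 } );

print "Style 1 failures: @by_index\n";    # prints "Style 1 failures: 1"
print "Style 2 failures: @by_name\n";     # prints "Style 2 failures: bar"
```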

=item B<errplus>

Returns a string containing the return value of B<err> plus a nicely formatted
version of the return value of B<rv>.

=back

=head1 PLATFORM SUPPORT

This module was developed and tested on Red Hat Linux 9.0, kernel 2.6.11, and
perl v5.8.4 built for i686-linux-thread-multi.  I have not tested it anywhere
else.  This module is obviously limited to platforms that have a working fork
implementation.

I would appreciate any feedback from people using this module in different
environments, whether it worked or not, so I can note it here.

=head1 FUTURE

The world could probably use a thread-based version of the B<prun> function,
but I have never done threads.

=head1 SEE ALSO

L<Parallel::ForkControl>, L<Parallel::ForkManager>, and L<Parallel::Jobs>
are all similarly themed, and offer different interfaces and advanced features.
I suggest you skim the docs on all three (in addition to mine) before choosing
the right one for you.

=head1 AUTHORS

Written by Ofer Nave E<lt>onave at shopzilla.comE<gt>.
Sponsored by Shopzilla, Inc. (formerly BizRate.com).

=head1 COPYRIGHT

Copyright 2005 by Shopzilla, Inc.

This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

See F<http://www.perl.com/perl/misc/Artistic.html>

=cut

