[Melbourne-pm] pmap execution time (was I <3 map & grep too)
Toby Corkindale
toby.corkindale at strategicdata.com.au
Thu Nov 10 21:03:11 PST 2011
On 27/10/11 20:26, Anneli Cuss wrote:
>> And indeed, I picked up this style of programming from Scala, which does do
>> those operations in parallel - in both senses.
>> It can start running the later operations before the early ones have
>> finished - and it can use multiple CPUs to process each stage in parallel.
>
> I misunderstood the point you were making, but I might as well share
> what I've got, as it could serve as a springboard for others. It
> shouldn't be too hard to feed results into subsequent operations early
> (though you might need to ugly up the syntax or use something like
> iterators to make it happen).
>
> I'm sure there's already this on the CPAN (or ten of them), but maybe
> it'll be as fun for others to read as it was for me to write. A simple
> parallel map:
>
> use threads;
> use threads::shared;
>
> our $PARALLEL = 4;
>
> sub pmap (&@) {
> my $fun = shift;
> my @args :shared = @_;
>
> my $each = @args / $PARALLEL;
> my @threads;
>
> for ($i = 0; $i< $PARALLEL; ++$i) {
> my $from = $each * $i;
> my $to = ($i< $PARALLEL - 1) ? ($each * ($i + 1) - 1) : $#args;
> print STDERR "from $from to $to\n";
>
> my ($thr) = threads->create(sub {
> map&$fun, @args[$from..$to]
> });
> push @threads, $thr;
> }
>
> print STDERR "joining\n";
> my @results;
> push @results, $_->join for (@threads);
> @results;
> }
>
> Adjust $PARALLEL to the number of cores your CPU has.
>
> SERVING SUGGESTION:
>
> sub dumb_fib {
> my $n = shift;
> $n<= 2 ? 1 : dumb_fib($n-1) + dumb_fib($n-2)
> }
>
> my @a = (39) x 8;
> my @r = pmap { dumb_fib $_ } @a;
> print join ', ', @r;
>
> With $PARALLEL = 8 on a suitable server:
>
> $ time perl pmap.pl
> from 0 to 0
> from 1 to 1
> from 2 to 2
> from 3 to 3
> from 4 to 4
> from 5 to 5
> from 6 to 6
> from 7 to 7
> joining
> 63245986, 63245986, 63245986, 63245986, 63245986, 63245986, 63245986, 63245986
> real 1m20.436s
> user 10m36.575s
> sys 0m0.188s
> $
>
> If I replace 'pmap' with 'map':
>
> $ time perl map.pl
> 63245986, 63245986, 63245986, 63245986, 63245986, 63245986, 63245986, 63245986
> real 5m56.631s
> user 5m56.570s
> sys 0m0.006s
> $
>
> So, half the time spent in the CPU (not managing context switches),
> but 5 times longer on the wall!
So, I thought I'd give this a shot in Scala..
The code is:
object pmaptest extends App {
// This is like my @foo = (39) x 8;
// Surely there's a better way..
val foo = (1 to 8).toList.par.map(_ => 39)
foo map dumb_fib foreach println
// or you can write:
// foo.map(dumb_fib).foreach(println)
def dumb_fib(x: Int) :Int = x match {
case 1 => 1
case 2 => 1
case _ => dumb_fib(x-1) + dumb_fib(x-2)
}
}
The performance with the parallel maps are:
real 0m0.795s
user 0m2.316s
sys 0m0.012s
If I force the single-threaded version, it is:
real 0m2.222s
user 0m2.220s
sys 0m0.012s
The thing that is embarrassing is that on the same hardware, this is how
long it takes to run it through Perl:
parallel:
real 1m14.370s
user 4m47.610s
sys 0m0.392s
single thread:
real 4m46.886s
user 4m45.982s
sys 0m0.040s
I expected Scala to be quicker, but that's ridiculous.
That time includes booting the JVM too.
If I benchmark just the actual internal execution time, it comes out at
around 500 to 600 ms!
Ah, poor old Perl - it just isn't that good when it comes to pure maths.
The only area it won out in was memory usage - about 6M, vs 28M for Scala.
More information about the Melbourne-pm
mailing list