[Melbourne-pm] pmap execution time (was I <3 map & grep too)

Thu Nov 10 21:03:11 PST 2011

On 27/10/11 20:26, Anneli Cuss wrote:
>> And indeed, I picked up this style of programming from Scala, which does do
>> those operations in parallel - in both senses.
>> It can start running the later operations before the early ones have
>> finished - and it can use multiple CPUs to process each stage in parallel.
>
> I misunderstood the point you were making, but I might as well share
> what I've got, as it could serve as a springboard for others. It
> shouldn't be too hard to feed results into subsequent operations early
> (though you might need to ugly up the syntax or use something like
> iterators to make it happen).
>
> I'm sure there's already this on the CPAN (or ten of them), but maybe
> it'll be as fun for others to read as it was for me to write. A simple
> parallel map:
>
> use threads;
> use threads::shared;
>
> our $PARALLEL = 4;
>
> sub pmap (&@) {
>      my $fun = shift;
>      my @args :shared = @_;
>
>      # Whole items per thread; the last thread picks up any remainder.
>      my $each = int(@args / $PARALLEL);
>      my @threads;
>
>      for (my $i = 0; $i < $PARALLEL; ++$i) {
>          my $from = $each * $i;
>          my $to = ($i < $PARALLEL - 1) ? ($each * ($i + 1) - 1) : $#args;
>          print STDERR "from $from to $to\n";
>
>          # create() is called in list context so that join() returns a list.
>          my ($thr) = threads->create(sub {
>              map &$fun, @args[$from..$to]
>          });
>          push @threads, $thr;
>      }
>
>      print STDERR "joining\n";
>      my @results;
>      push @results, $_->join for (@threads);
>      @results;
> }
>
> Adjust $PARALLEL to the number of cores your CPU has.
>
> SERVING SUGGESTION:
>
> sub dumb_fib {
>      my $n = shift;
>      $n <= 2 ? 1 : dumb_fib($n-1) + dumb_fib($n-2)
> }
>
> my @a = (39) x 8;
> my @r = pmap { dumb_fib $_ } @a;
> print join ', ', @r;
>
> With $PARALLEL = 8 on a suitable server:
>
> $ time perl pmap.pl
> from 0 to 0
> from 1 to 1
> from 2 to 2
> from 3 to 3
> from 4 to 4
> from 5 to 5
> from 6 to 6
> from 7 to 7
> joining
> 63245986, 63245986, 63245986, 63245986, 63245986, 63245986, 63245986, 63245986
> real	1m20.436s
> user	10m36.575s
> sys	0m0.188s
> $
>
> If I replace 'pmap' with 'map':
>
> $ time perl map.pl
> 63245986, 63245986, 63245986, 63245986, 63245986, 63245986, 63245986, 63245986
> real	5m56.631s
> user	5m56.570s
> sys	0m0.006s
> $
>
> So, plain map spends about half the time in the CPU (not managing
> context switches), but takes more than 4 times longer on the wall!

So, I thought I'd give this a shot in Scala.
The code is:
object pmaptest extends App {
   // This is like my @foo = (39) x 8;
   // Surely there's a better way..
   val foo = (1 to 8).toList.par.map(_ => 39)

   foo map dumb_fib foreach println
   // or you can write:
   // foo.map(dumb_fib).foreach(println)

   def dumb_fib(x: Int) :Int = x match {
     case 1 => 1
     case 2 => 1
     case _ => dumb_fib(x-1) + dumb_fib(x-2)
   }
}
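
On the "surely there's a better way" comment: a minimal sketch, assuming nothing beyond the standard library. List.fill(n)(elem) builds a list of n copies of elem, which is about as close as Scala gets to Perl's (39) x 8. The object name FillDemo is made up for illustration. In the program above you would append .par to the result; it is left off here so the snippet also compiles on Scala 2.13 and later, where the parallel collections live in a separate module.

```scala
object FillDemo {
  // Roughly equivalent to Perl's: my @foo = (39) x 8;
  val foo: List[Int] = List.fill(8)(39)

  def main(args: Array[String]): Unit = {
    // Print the eight-element list.
    println(foo)
  }
}
```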

The timings with the parallel map are:
real	0m0.795s
user	0m2.316s
sys	0m0.012s

If I force the single-threaded version, it is:
real	0m2.222s
user	0m2.220s
sys	0m0.012s

The thing that is embarrassing is that on the same hardware, this is how 
long it takes to run it through Perl:
parallel:
real	1m14.370s
user	4m47.610s
sys	0m0.392s

single thread:
real	4m46.886s
user	4m45.982s
sys	0m0.040s

I expected Scala to be quicker, but roughly 90 to 130 times quicker is ridiculous.
That time includes booting the JVM too.
If I benchmark just the actual internal execution time, it comes out at 
around 500 to 600 ms!
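
One way to take that kind of measurement, as a minimal sketch: it assumes a single wall-clock reading from System.nanoTime around the map is good enough (one run, no JIT warm-up). The names TimedFib, dumbFib and timeMillis are invented for this example and are not necessarily how the figure above was obtained. The map here is sequential so the snippet compiles on any Scala version; add .par after fill(...) on versions that bundle the parallel collections.

```scala
object TimedFib {
  // Same naive Fibonacci as in the program above (defined for x >= 1).
  def dumbFib(x: Int): Int = x match {
    case 1 => 1
    case 2 => 1
    case _ => dumbFib(x - 1) + dumbFib(x - 2)
  }

  // Hypothetical helper: evaluates `body` once and returns
  // the result together with the elapsed time in milliseconds.
  def timeMillis[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    // Time only the map itself, excluding JVM startup.
    val (results, ms) = timeMillis(List.fill(8)(39).map(dumbFib))
    results.foreach(println)
    println("elapsed: " + ms + " ms")
  }
}
```

Run it with time as before and compare the printed figure against "real" to see how much of the total is JVM startup.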

Ah, poor old Perl - it just isn't that good when it comes to pure maths.
The only area it won out in was memory usage - about 6M, vs 28M for Scala.