Directory munging [was: Re: Phoenix.pm: Three snippets]

Tim Ayers tayers at bridge.com
Sun May 6 22:22:46 CDT 2001


>>>>> "E" == Eden Li <eden.li at asu.edu> writes:
E> Yes, quite a big one actually:

Well, that really depends on the situation. Let me say, I agree with
you. 'map' in void context is generally a bad idea because map's
purpose is to collect up the results of each iteration into a list and
if you aren't going to use the list, why bother? But your original
blanket statement "Uh oh... Never use map{} in void context." just
irked me.

Back to practical matters, in general you should not use 'map' in void
context, but I do have a bone to pick with your benchmark. I think it
is unfairly biased against map and does not measure what we are
interested in.

E> perl -MBenchmark -e "timethese (10000, {'map' => sub { map
E> {$n=$_}(0..10000) }, 'for' => sub { for (0..10000) {$n=$_}}})"

The map case builds the whole list (0..10000) on each of the 10,000
runs, while 'for' (see below) does not. That's a serious disadvantage
from the start. What we are really trying to measure is the
performance hit because map collects the return value of each call
into a list.
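
To make "collects the return value" concrete, here is a minimal sketch
(not part of either benchmark; the variable names are just for
illustration). map hands back one result per element, while 'for'
runs the block purely for its side effects:

```perl
use strict;
use warnings;

# map evaluates the block once per element and returns the collected results.
my @squares = map { $_ * $_ } 1 .. 5;

# A for loop runs the same expression only for its side effect on $sum;
# nothing is collected.
my $sum = 0;
$sum += $_ * $_ for 1 .. 5;

print "squares: @squares\n";
print "sum: $sum\n";
```

If @squares is never used, the work of gathering those results is
wasted, which is the cost the benchmark is supposed to isolate.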

I think a more legitimate version of your test would be

 perl -MBenchmark -e "@l=(0..10000); timethese (10000, {'map' => sub { map {$n=$_} @l }, 'for' => sub { for (@l) {$n=$_}}})"

Notice that I have 'for' loop over the same list. Since perl 5.005 (I
think) 'for' has a special case that recognizes (0..10000) as a
numeric iteration and it doesn't actually loop a pointer through a
list. It knows it should decompose to something more like

  for ($_=0; $_<=10000; $_++) {}

This will potentially run a lot differently than the list pointer way.
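
As a rough sketch of the three loop shapes being compared (the small
$limit is a made-up value for illustration, and nothing here observes
the internal optimization itself), all of them visit the same values:

```perl
use strict;
use warnings;

my $limit = 10;    # hypothetical small bound, for illustration only

# Range form: the case the message says perl can treat as a counter.
my @seen_range;
push @seen_range, $_ for 0 .. $limit;

# Hand-written counter. Note <= because the range includes its upper bound.
my @seen_counter;
for (my $i = 0; $i <= $limit; $i++) {
    push @seen_counter, $i;
}

# Array form: iterates over a list that was built beforehand.
my @l = (0 .. $limit);
my @seen_array;
push @seen_array, $_ for @l;

print "range:   @seen_range\n";
print "counter: @seen_counter\n";
print "array:   @seen_array\n";
```

The results are identical; only the way perl walks through the values
differs, which is why a fair benchmark should give 'map' and 'for' the
same prebuilt array.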

But I think a different benchmark is even more interesting.

  #!/usr/bin/perl -w

  use strict;
  use Benchmark;

  for (100, 1000, 10000) {
    my @l=(0..$_);
    my $n;
    timethese (10000,
    {"map$_" => sub { map {$n=$_} @l },
     "for$_" => sub { for (@l) {$n=$_}}}
    );
  }

$ perl loop.pl
Benchmark: timing 10000 iterations of for100, map100...
    for100:  1 wallclock secs ( 0.66 usr +  0.00 sys =  0.66 CPU) @ 15058.82/s (n=10000)
    map100:  1 wallclock secs ( 0.91 usr +  0.00 sys =  0.91 CPU) @ 11034.48/s (n=10000)

'for' wins by some, but not overwhelmingly.

Benchmark: timing 10000 iterations of for1000, map1000...
   for1000:  6 wallclock secs ( 6.34 usr +  0.01 sys =  6.34 CPU) @ 1576.35/s (n=10000)
   map1000:  9 wallclock secs ( 8.98 usr +  0.00 sys =  8.98 CPU) @ 1114.01/s (n=10000)

'for' wins by roughly the same ratio as in the 100-element case.

Benchmark: timing 10000 iterations of for10000, map10000...
  for10000: 65 wallclock secs (63.65 usr +  0.02 sys = 63.67 CPU) @ 157.06/s (n=10000)
  map10000: 113 wallclock secs (112.56 usr +  0.03 sys = 112.59 CPU) @ 88.81/s (n=10000)

Okay. Now we are getting closer to Eden's numbers. The gap widens as
the list grows, so it obviously has to do with collecting up a big
list for no reason.
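
For anyone who wants the ratios spelled out, this small sketch just
divides the rates reported above (the %rate layout is my own
bookkeeping, not Benchmark output):

```perl
use strict;
use warnings;

# Iterations per second, copied from the benchmark output above.
my %rate = (
    100   => { 'for' => 15058.82, 'map' => 11034.48 },
    1000  => { 'for' => 1576.35,  'map' => 1114.01  },
    10000 => { 'for' => 157.06,   'map' => 88.81    },
);

# How many times faster 'for' ran than 'map' at each list size.
my %ratio;
for my $size (sort { $a <=> $b } keys %rate) {
    $ratio{$size} = $rate{$size}{'for'} / $rate{$size}{'map'};
    printf "%5d elements: for/map rate ratio = %.2f\n", $size, $ratio{$size};
}
```

The ratio sits near 1.4 for the 100 and 1000 element lists and climbs
to about 1.8 for the 10000 element list.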

E> Also, as I mentioned before... it's just the wrong construct
E> for plain ol' looping.

Agreed. But I feel your benchmark did not prove anything. Blanket
statements without explanation, followed by misleading benchmarks,
set off my alarms. I hope my benchmarks prove your point legitimately. So
everyone, don't use map in void context. ;-) Hopefully I've explained
why a little bit.

I'll end with an almost relevant Larry quote.

  It really doesn't bother me if people want to use grep or map in a
  void context.  It didn't bother me before there was a for modifier,
  and now that there is one, it still doesn't bother me.  I'm just not
  very easy to bother.  
    -- Larry Wall in <199911012346.PAA25557 at kiev.wall.org>

Hope you have a very nice day, :-)
Tim Ayers (tayers at bridge.com), who has now probably made himself out
to be a pedantic SOB amongst his new acquaintances. :-/



