[mplspm]: In a sorted list [better]

Ken Williams ken at mathforum.org
Tue Mar 26 16:44:39 CST 2002


Huh, funny.

Well, it might have something to do with the hash keys.  Numeric hash 
keys (especially sequential ones) notoriously trip up the perl hashing 
algorithm, sometimes making everything go into a single bucket.  I'm not 
sure whether this has been patched in recent perls or not.  Maybe try 
with random words from /usr/dict/words ?

The other thing is that perl maintains a global table of hash keys so 
that it doesn't have to re-hash a key that it's seen elsewhere in a 
different context.

But come to thing of it, maybe neither of these things affects this 
benchmark.  The Conventional Wisdom has been that slicing is faster for 
this, but what does it know?

  -Ken



On Wednesday, March 27, 2002, at 04:46 AM, John J. Trammell wrote:

> On Tue, Mar 26, 2002 at 10:16:27AM +1100, Ken Williams wrote:
>>
>> Since you only care about the keys and not the values, it's generally
>> quicker to construct the hash like so:
>>
>> my $hash = sub
>> {
>>      my ($x,$a) = @_;
>>      my %h;
>>      @h{@$a} = ();
>>      return exists $h{$x};
>> };
>>
>
> The results of the actual experiment (code below) surprised me.  Is
> there something wrong in my test maybe?
>
> #!/usr/bin/perl -w
> use strict;
> use Benchmark;
>
> sub __map
> {
>         my ($a) = @_;
>         my %h = map { $_, 0 } @$a;
>         return \%h;
> }
>
> sub __slice
> {
>         my ($a) = @_;
>         my %h;
>         @h{ @$a } = ();
>         return \%h;
> }
>
> foreach my $size ( 10, 100, 1000, 10_000 )
> {
>         warn "$size-element array\n";
>         my @x = ( 1 .. $size );
>         timethese( 1_000_000, {
>                 map =>   q[__map(\@x)],
>                 slice => q[__slice(\@x)],
>         });
> }
>
> __END__
>
> 10-element array
> Benchmark: timing 1000000 iterations of map, slice...
>        map:  3 wallclock secs ( 3.77 usr +  0.00 sys =  3.77 CPU) @ 
> 265251.99/s (n=1000000)
>      slice:  3 wallclock secs ( 3.94 usr +  0.00 sys =  3.94 CPU) @ 
> 253807.11/s (n=1000000)
> 100-element array
> Benchmark: timing 1000000 iterations of map, slice...
>        map:  3 wallclock secs ( 3.81 usr +  0.00 sys =  3.81 CPU) @ 
> 262467.19/s (n=1000000)
>      slice:  3 wallclock secs ( 3.94 usr +  0.00 sys =  3.94 CPU) @ 
> 253807.11/s (n=1000000)
> 1000-element array
> Benchmark: timing 1000000 iterations of map, slice...
>        map:  3 wallclock secs ( 3.87 usr +  0.00 sys =  3.87 CPU) @ 
> 258397.93/s (n=1000000)
>      slice:  3 wallclock secs ( 3.97 usr +  0.00 sys =  3.97 CPU) @ 
> 251889.17/s (n=1000000)
> 10000-element array
> Benchmark: timing 1000000 iterations of map, slice...
>        map:  3 wallclock secs ( 3.89 usr +  0.00 sys =  3.89 CPU) @ 
> 257069.41/s (n=1000000)
>      slice:  3 wallclock secs ( 4.06 usr +  0.00 sys =  4.06 CPU) @ 
> 246305.42/s (n=1000000)
>
>
>> The real question is whether the original list would be better stored
>> in a hash in the first place, avoiding that conversion each time.  The
>> answer is usually yes.  In that case, it's *way* quicker.
>
> Agreed.
>
> --
> I would like to see an anime where Love conquers all, then goes mad 
> with the
> power and sets up a repressive totalitarian regime that ruthlessly 
> crushes
> and oppresses all non-love.                        -- Frank Raymond 
> Michaels
>
>
> --------------------------------------------------
> Minneapolis Perl Mongers mailing list
>
> To unsubscribe, send mail to majordomo at pm.org
> with "unsubscribe mpls" in the body of the message.
>

  -Ken



--------------------------------------------------
Minneapolis Perl Mongers mailing list

To unsubscribe, send mail to majordomo at pm.org
with "unsubscribe mpls" in the body of the message.



More information about the Mpls-pm mailing list