[mplspm]: In a sorted list [better]

John J. Trammell trammell at trammell.dyndns.org
Fri Mar 22 10:59:12 CST 2002


On Fri, Mar 22, 2002 at 09:40:54AM -0600, Jim Anderson wrote:
> And with 2 million strings, building a hash is a non-trivial amount of
> time.

You said it:

#!/usr/bin/perl -w
use strict;
use Benchmark;
$| = 1;

my $binary = sub
{
    use POSIX qw[ floor ];
    my ($x,$a) = @_;
    my ($p,$q) = (0,scalar(@$a)-1);
    LOOP:
    {   
        return 1 if $a->[$p] eq $x;
        return 1 if $a->[$q] eq $x;
        my $mid = floor( ($p+$q)/2 );
        return 0 if ($mid == $p) || ($mid == $q);
        for ($a->[$mid])
        {   
            $x eq $_ && do { return 1; };
            $x lt $_ && do { ($p,$q) = ($p,$mid); last; };
            $x gt $_ && do { ($p,$q) = ($mid,$q); last; };
        }
        redo LOOP;
    }
};

my $hash = sub
{   
    my ($x,$a) = @_;
    my %h = map { $_, 0 } @$a;
    return exists $h{$x};
};

{   
    print "benchmark for size 26\n";
    my @a = ('a' .. 'z');
    my $x = $a[rand(@a)];
    my $y = "__foo__";
   
    timethese(50000,
    {
        binary => sub { $binary->($x,\@a); $binary->($y,\@a); },
        hash =>   sub { $hash->($x,\@a); $hash->($y,\@a); },
    });
}  


{
    print "benchmark for size @{[ 26*26 ]}\n";
    my @a = ('aa' .. 'zz');
    my $x = $a[rand(@a)];
    my $y = "__foo__";

    timethese(50000,
    {
        binary => sub { $binary->($x,\@a); $binary->($y,\@a); },
        hash =>   sub { $hash->($x,\@a); $hash->($y,\@a); },
    });
}

__END__

[ ~ ] perl bar.pl
benchmark for size 26
Benchmark: timing 50000 iterations of binary, hash...
    binary:  4 wallclock secs ( 3.68 usr +  0.00 sys =  3.68 CPU) @ 13586.96/s (n=50000)
    hash:  7 wallclock secs ( 7.21 usr +  0.01 sys =  7.22 CPU) @ 6925.21/s (n=50000)
benchmark for size 676
Benchmark: timing 50000 iterations of binary, hash...
    binary:  7 wallclock secs ( 6.86 usr +  0.00 sys =  6.86 CPU) @ 7288.63/s (n=50000)
    hash: 187 wallclock secs (184.91 usr +  0.06 sys = 184.97 CPU) @ 270.31/s (n=50000)
[ ~ ] ls -l

Admittedly this benchmark isn't perfect, but I think it points
in the direction of the binary search solution scaling better.



--------------------------------------------------
Minneapolis Perl Mongers mailing list

To unsubscribe, send mail to majordomo at pm.org
with "unsubscribe mpls" in the body of the message.



More information about the Mpls-pm mailing list