[sf-perl] Sorting vs. hash
frosty
biztos at mac.com
Fri Mar 28 13:22:45 PDT 2008
Sure. See below. I think the sorting trick is fastest.
use strict;
use warnings;
use Benchmark;
# starting point: one big fat array (Neil says 250K).
# but let's assume there are some dupes.
my @input;
my @ltr = ( 'a' .. 'z' );
my $dupe_count = 0;
while ( @input < 250000 ) {
my $t = join( '', map { $ltr[ rand(@ltr) ] } ( 1 .. rand(40) + 10 ) );
push( @input, $t );
# Eliminating dupes is the point.
if ( @input / 20 == int( @input / 20 ) ) {
unshift( @input, $t );
$dupe_count++;
}
}
printf "%d strings (%d dupes)\n", scalar(@input), $dupe_count;
timethese(
100,
{
SortStyle => \&array_strip_without_hash,
SeenStyle => \&array_strip_with_hash,
SimpleHash => \&hash_strip,
SortedHash => \&hash_strip_sort
}
);
sub array_strip_without_hash {
# This strikes me as really dumb, but...
my @new_array;
my $last_item = '';
foreach my $item ( sort @input ) {
next if $item eq $last_item;
$last_item = $item;
push( @new_array, $item );
}
die "oops" unless @new_array == @input - $dupe_count;
}
sub array_strip_with_hash {
# My thinking is this is also dumb.
my @new_array;
my %seen;
foreach my $item (@input) {
next if $seen{$item};
push( @new_array, $item );
$seen{$item} = 1;
}
die "oops" unless @new_array == @input - $dupe_count;
}
sub hash_strip {
my %new_hash = map { $_ => 1 } @input;
my @new_array = keys %new_hash;
die "oops" unless @new_array == @input - $dupe_count;
}
sub hash_strip_sort {
# because it's more useful.
my %new_hash = map { $_ => 1 } @input;
my @new_array = sort keys %new_hash;
die "oops" unless @new_array == @input - $dupe_count;
}
On Friday, March 28, 2008, at 01:16PM, "Neil Heller" <nheller at silcon.com> wrote:
>It's easy to fill an array with 250,0000 or more strings?
>
>Neil Heller
>510-862-4387
>
>
>-----Original Message-----
>From: sanfrancisco-pm-bounces+nheller=silcon.com at pm.org
>[mailto:sanfrancisco-pm-bounces+nheller=silcon.com at pm.org] On Behalf Of Andy
>Lester
>Sent: Friday, March 28, 2008 12:52 PM
>To: San Francisco Perl Mongers User Group
>Subject: Re: [sf-perl] Sorting vs. hash
>
>
>On Mar 28, 2008, at 2:44 PM, Neil Heller wrote:
>> Is it faster to sort the array and then iterate once through the list
>> (writing selected ones to a new array) or write them all into a hash
>> (at
>> which time duplicates would be naturally eliminated)?
>
>
>It's easy enough to test it and not have to speculate idly.
>
>--
>Andy Lester => andy at petdance.com => www.petdance.com => AIM:petdance
>
>
>
>
>_______________________________________________
>SanFrancisco-pm mailing list
>SanFrancisco-pm at pm.org
>http://mail.pm.org/mailman/listinfo/sanfrancisco-pm
>
>_______________________________________________
>SanFrancisco-pm mailing list
>SanFrancisco-pm at pm.org
>http://mail.pm.org/mailman/listinfo/sanfrancisco-pm
>
>
More information about the SanFrancisco-pm
mailing list