[Pdx-pm] Hash question
Tkil
tkil at scrye.com
Mon Nov 24 22:02:26 CST 2003
Regarding various ways to append one hash onto another, I played with
it a bit this afternoon.
Parameters to tune for:
* difficulty of coding
* size of existing hash
* number of elements to add to existing hash
* whether auxiliary data structures are required
* memory efficiency
* time efficiency
Note that I cheated: in all my tests below, number of existing
elements is equal to the number being added. Test program at:
http://scrye.com/~tkil/perl/append-hash.plx
Observations:
The simplest method is hash assignment. It is regularly about half
the speed of the other methods (this might be an artifact of my
"cheat" above, though). It is easy to code and difficult to get
wrong. Until you know this is your bottleneck, consider sticking
with this.
For sets up to the low thousands (on this hardware, at least),
using array slices of @array as fodder for hash slice assignment
into %dest seems the fastest. This is the "even/odd" approach
described below. For small sets, this can be nearly 40% faster
than any other method examined.
For small sets (less than 100 elements or so), Bruce's
copy-hash-at-once is the best method that doesn't require any
"outside" knowledge.
Larger than that, the best self-contained method investigated below is
to index into the source array directly. Some speedups (5-10%) can
be obtained by unrolling the loops.
Summary rankings of simplest and fastest methods:
10 elements 100 elements 1000 elements 10000 elements
---------------- --------------- ---------------- ----------------
h_assign 15085/s h_assign 1485/s h_assign 95.6/s h_assign 4.97/s
duffs_16 22318/s a_index 2326/s h_atonce 148/s h_atonce 7.00/s
a_index 22454/s h_atonce 2425/s a_index 165/s a_index 9.88/s
h_atonce 23247/s unr_16 2571/s duffs_16 179/s even_odd 9.94/s
unr_16 23423/s duffs_16 2594/s unr_16 181/s duffs_16 10.3/s
even_odd 31992/s even_odd 3283/s even_odd 195/s unr_16 10.4/s
Randal, Tom -- did anything ever come out of the p5p discussions to
allow "push HASH, LIST" to do this? Or am I making that up?
Long-winded crap:
To make it a bit more realistic, and to try it out with different
sizes of hashes, I put it into a big loop and looked at 10, 100, 1000,
and 10000-element hashes and arrays:
| for my $n_hash_elts ( 10, 100, 1000, 10000 )
| {
| my @array = map { rand } 1 .. 2*$n_hash_elts;
| my %orig_dest = map { rand } @array;
On each of the sets of benchmarks, I included the previous winner:
| my $h_atonce = sub {
| my %dest = %orig_dest;
| my %tmp = @array;
| @dest{keys %tmp} = values %tmp;
| };
And my first alternate solution, using array indexing to avoid the
cost of copying or modifying @array:
| my $a_index = sub {
| my %dest = %orig_dest;
| for ( my $i = 0; $i < @array; $i += 2 ) {
| $dest{ $array[$i] } = $array[$i+1];
| }
| };
A failed experiment, where I just tried the brute force approach.
(Note that this isn't really a failure; if this is fast enough, by all
means, use it...)
| my $h_assign = sub {
| my %dest = %orig_dest;
| %dest = ( %dest, @array );
| };
Another failed experiment, where I tried to use 'splice' to minimize
the number of changes to the @tmp copy of @array:
| my $a_splice = sub {
| my %dest = %orig_dest;
| my @tmp = @array;
| while (@tmp) {
| my ( $k, $v ) = splice @tmp, 0, 2;
| $dest{$k} = $v;
| }
| };
Some simple unrollings of the array index case, at 8, 16, and 32
elements (I included 24 elements later):
| my $unr_8 = sub {
| my %dest = %orig_dest;
| my $i = 0;
| while ( $i < @array-8 ) {
| $dest{ $array[$i ] } = $array[$i+1];
| $dest{ $array[$i+2 ] } = $array[$i+3];
| $dest{ $array[$i+4 ] } = $array[$i+5];
| $dest{ $array[$i+6 ] } = $array[$i+7];
| $i += 8;
| }
| while ( $i < @array )
| {
| $dest{ $array[$i ] } = $array[$i+1];
| $i += 2;
| }
| };
On the "that makes me feel dirty" scale, how about one that uses
Duff's Device? <http://www.lysator.liu.se/c/duffs-device.html>
| my $duffs_16 = sub {
| my %dest = %orig_dest;
| my $t = @array % 16;
| my $i = $t - 16;
| goto "TARGET_$t";
| while ( $i < @array ) {
| TARGET_0: $dest{ $array[$i ] } = $array[$i+1 ];
| TARGET_14: $dest{ $array[$i+2 ] } = $array[$i+3 ];
| TARGET_12: $dest{ $array[$i+4 ] } = $array[$i+5 ];
| TARGET_10: $dest{ $array[$i+6 ] } = $array[$i+7 ];
| TARGET_8: $dest{ $array[$i+8 ] } = $array[$i+9 ];
| TARGET_6: $dest{ $array[$i+10] } = $array[$i+11];
| TARGET_4: $dest{ $array[$i+12] } = $array[$i+13];
| TARGET_2: $dest{ $array[$i+14] } = $array[$i+15];
| $i += 16;
| }
| };
And, in my final bow to the benchmarking gods:
| my @odd = grep { $_ & 1 } 0 .. $#array;
| my @even = map { $_-1 } @odd;
|
| my $even_odd = sub {
| my %dest = %orig_dest;
| @dest{@array[@even]} = @array[@odd];
| };
There is another implementation that comes to mind, if we can assert
these conditions:
1. Having extra entries in %dest is ok;
2. The universe of keys is fully distinct from the universe of values; and
3. No values are undef (or you are running under "no warnings").
Then you could do something like:
@dest{ '', @array } = ( @array, '' );
Heh. "Careful with that axe, Eugene!"
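To make the caveats concrete, here is a minimal sketch of that offset
trick on made-up data, showing the stray entries it creates alongside
the wanted ones:

```perl
# Hypothetical two-pair append; the fat commas quote 'a' and 'b'.
my @array = ( a => 1, b => 2 );
my %dest;

# Keys slice is ( '', 'a', 1, 'b', 2 ); value list is ( 'a', 1, 'b', 2, '' ).
@dest{ '', @array } = ( @array, '' );

# Wanted pairs:        $dest{a} == 1, $dest{b} == 2
# Stray "extra" pairs: $dest{''} eq 'a', $dest{1} eq 'b', $dest{2} eq ''
```

The shifted assignment is why condition 2 matters: if any value also
appeared as a key, a stray pair would clobber a real one.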
Anyway, here are results for various set sizes:
| $ ./append-hash.plx
|
| === 10 elements ===
|
| original methods:
| Rate a_shift h_each a_index h_atonce
| a_shift 9878/s -- -47% -56% -58%
| h_each 18654/s 89% -- -17% -21%
| a_index 22540/s 128% 21% -- -5%
| h_atonce 23685/s 140% 27% 5% --
|
| failed experiments:
| Rate h_assign a_splice a_index h_atonce
| h_assign 15085/s -- -0% -33% -36%
| a_splice 15123/s 0% -- -33% -36%
| a_index 22583/s 50% 49% -- -4%
| h_atonce 23559/s 56% 56% 4% --
|
| unrolled:
| Rate unr_32 a_index unr_8 h_atonce unr_16
| unr_32 21599/s -- -4% -7% -8% -9%
| a_index 22454/s 4% -- -3% -4% -6%
| unr_8 23239/s 8% 3% -- -1% -2%
| h_atonce 23420/s 8% 4% 1% -- -2%
| unr_16 23833/s 10% 6% 3% 2% --
|
| the contenders:
| Rate duffs_16 a_index h_atonce unr_16 even_odd
| duffs_16 22318/s -- -1% -4% -5% -30%
| a_index 22454/s 1% -- -3% -4% -30%
| h_atonce 23247/s 4% 4% -- -1% -27%
| unr_16 23423/s 5% 4% 1% -- -27%
| even_odd 31992/s 43% 42% 38% 37% --
|
| === 100 elements ===
|
| original methods:
| Rate a_shift h_each a_index h_atonce
| a_shift 1018/s -- -48% -56% -59%
| h_each 1942/s 91% -- -17% -22%
| a_index 2329/s 129% 20% -- -6%
| h_atonce 2477/s 143% 28% 6% --
|
| failed experiments:
| Rate h_assign a_splice a_index h_atonce
| h_assign 1485/s -- -4% -36% -38%
| a_splice 1544/s 4% -- -34% -36%
| a_index 2325/s 57% 51% -- -3%
| h_atonce 2401/s 62% 56% 3% --
|
| unrolled:
| Rate a_index h_atonce unr_8 unr_32 unr_16
| a_index 2322/s -- -4% -7% -9% -10%
| h_atonce 2431/s 5% -- -3% -5% -6%
| unr_8 2504/s 8% 3% -- -2% -3%
| unr_32 2546/s 10% 5% 2% -- -1%
| unr_16 2574/s 11% 6% 3% 1% --
|
| the contenders:
| Rate a_index h_atonce unr_16 duffs_16 even_odd
| a_index 2326/s -- -4% -10% -10% -29%
| h_atonce 2425/s 4% -- -6% -7% -26%
| unr_16 2571/s 11% 6% -- -1% -22%
| duffs_16 2594/s 12% 7% 1% -- -21%
| even_odd 3283/s 41% 35% 28% 27% --
|
| === 1000 elements ===
|
| original methods:
| Rate a_shift h_each h_atonce a_index
| a_shift 88.5/s -- -39% -45% -54%
| h_each 145/s 63% -- -10% -24%
| h_atonce 160/s 81% 11% -- -16%
| a_index 191/s 116% 32% 19% --
|
| failed experiments:
| Rate h_assign a_splice h_atonce a_index
| h_assign 95.6/s -- -19% -37% -45%
| a_splice 117/s 23% -- -23% -32%
| h_atonce 153/s 60% 30% -- -12%
| a_index 173/s 81% 48% 13% --
|
| unrolled:
| Rate h_atonce a_index unr_8 unr_32 unr_16
| h_atonce 152/s -- -10% -15% -15% -16%
| a_index 169/s 12% -- -5% -5% -6%
| unr_8 177/s 17% 5% -- -0% -2%
| unr_32 178/s 17% 5% 0% -- -2%
| unr_16 181/s 19% 7% 2% 2% --
|
| the contenders:
| Rate h_atonce a_index duffs_16 unr_16 even_odd
| h_atonce 148/s -- -10% -17% -18% -24%
| a_index 165/s 11% -- -8% -9% -15%
| duffs_16 179/s 21% 9% -- -1% -8%
| unr_16 181/s 22% 10% 1% -- -7%
| even_odd 195/s 32% 18% 9% 8% --
|
| === 10000 elements ===
|
| original methods:
| Rate a_shift h_atonce h_each a_index
| a_shift 6.00/s -- -15% -21% -40%
| h_atonce 7.06/s 18% -- -7% -30%
| h_each 7.60/s 27% 8% -- -24%
| a_index 10.0/s 67% 42% 32% --
|
| failed experiments:
| Rate h_assign h_atonce a_splice a_index
| h_assign 4.97/s -- -29% -31% -50%
| h_atonce 7.03/s 41% -- -2% -29%
| a_splice 7.19/s 45% 2% -- -28%
| a_index 9.94/s 100% 41% 38% --
|
| unrolled:
| Rate h_atonce a_index unr_8 unr_32 unr_16
| h_atonce 7.03/s -- -29% -31% -32% -32%
| a_index 9.90/s 41% -- -3% -4% -5%
| unr_8 10.2/s 45% 3% -- -1% -2%
| unr_32 10.3/s 46% 4% 1% -- -1%
| unr_16 10.4/s 48% 5% 2% 1% --
|
| the contenders:
| Rate h_atonce a_index even_odd duffs_16 unr_16
| h_atonce 7.00/s -- -29% -30% -32% -32%
| a_index 9.88/s 41% -- -1% -4% -5%
| even_odd 9.94/s 42% 1% -- -3% -4%
| duffs_16 10.3/s 47% 4% 4% -- -1%
| unr_16 10.4/s 48% 5% 4% 1% --
For small data sets, the "even odd" approach pretty clearly dominates
the field. The fact that there is some preprocessing involved doesn't
disqualify it in my mind; in a database environment, you are often
fetching exactly the same number of fields each time, so building the
even/odd arrays once is not a problem. And note that these are
indexes into the result arrays, not results themselves, so they're
quite reusable.
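As a sketch of that reuse (the rows here are made up), one set of index
arrays built from the common record shape serves every record:

```perl
# Two same-shaped key/value lists, as you might get per fetched row.
my @row1 = ( a => 1, b => 2 );
my @row2 = ( c => 3, d => 4 );

# Build the index arrays once, from the shared length.
my @odd  = grep { $_ & 1 } 0 .. $#row1;   # value positions: 1, 3
my @even = map  { $_ - 1 } @odd;          # key positions:   0, 2

# Reuse them for every row.
my %dest;
@dest{ @row1[@even] } = @row1[@odd];
@dest{ @row2[@even] } = @row2[@odd];
```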
The "hash at once" construct does well until the sets get particularly
large. It has an advantage over the "array index" in smaller sets,
and is probably easier to code correctly offhand. "array index"
is nearly as fast as "hash at once" with smaller sets, catching up as
early as 100 elements. As an added bonus, it has the smallest memory
and icache footprint of any of these methods.
Unrolling the array index method does give additional speed, but it is
probably not worth the code bulk. (Hm... the thought of using eval
STRING to generate an unrolled subroutine at run time is tempting.)
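That eval STRING idea might look something like the following sketch
(names are mine, and it was not part of the benchmark above). It builds
the unrolled body as text, interpolates the stride, and compiles it once:

```perl
# Generate a copier sub unrolled to $unroll key/value pairs per pass.
sub make_unrolled {
    my ($unroll) = @_;
    my $body = join "\n",
        map {
            my ( $k, $v ) = ( 2 * $_, 2 * $_ + 1 );
            "        \$dest->{ \$array->[\$i+$k] } = \$array->[\$i+$v];"
        } 0 .. $unroll - 1;
    my $step = 2 * $unroll;            # elements consumed per pass
    my $code = <<"EOC";
sub {
    my ( \$dest, \$array ) = \@_;
    my \$i = 0;
    while ( \$i <= \@\$array - $step ) {
$body
        \$i += $step;
    }
    while ( \$i < \@\$array ) {        # mop up the remainder
        \$dest->{ \$array->[\$i] } = \$array->[\$i+1];
        \$i += 2;
    }
};
EOC
    my $sub = eval $code;
    die $@ if $@;
    return $sub;
}
```

Usage would be along the lines of `my $copy8 = make_unrolled(8);` then
`$copy8->( \%dest, \@array );` -- paying the eval cost once up front.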
The fact that 16 is regularly faster than 8 and 32 is interesting; I
wonder if I'm hitting an icache limitation at the 32. A quick run
with 24 showed it performing about as well as the others:
| unrolled:
| Rate h_atonce a_index unr_32 unr_8 unr_24 unr_16
| h_atonce 151/s -- -9% -14% -14% -15% -16%
| a_index 167/s 10% -- -5% -5% -7% -7%
| unr_32 175/s 16% 5% -- -0% -2% -3%
| unr_8 175/s 16% 5% 0% -- -2% -2%
| unr_24 178/s 18% 7% 2% 2% -- -1%
| unr_16 180/s 19% 8% 3% 2% 1% --
Duff's Device is just silly, and provides less and less return as the
set size gets larger.