[Pdx-pm] Hash question

Tkil tkil at scrye.com
Mon Nov 24 22:02:26 CST 2003


Regarding various ways to append one hash onto another, I played with
it a bit this afternoon.

Parameters to tune for:

 * difficulty of coding

 * size of existing hash
 
 * number of elements to add to existing hash

 * whether auxiliary data structures are required

 * memory efficiency

 * time efficiency

Note that I cheated: in all my tests below, number of existing
elements is equal to the number being added.  Test program at:

   http://scrye.com/~tkil/perl/append-hash.plx

Observations:

   The simplest method is hash assignment.  It is regularly about half
   the speed of the other methods (though this might be an artifact of
   my "cheat" above).  It is easy to code and difficult to get wrong.
   Until you know this is your bottleneck, consider sticking with it.

   For sets up to the low thousands (on this hardware, at least),
   using array slices of @array as fodder for hash slice assignment
   into %dest seems the fastest.  This is the "even/odd" approach
   described below.  For small sets, this can be nearly 40% faster
   than any other method examined.

   For small sets (less than 100 elements or so), Bruce's
   copy-hash-at-once is the best method that doesn't require any
   "outside" knowledge.

   Larger than that, the best self-contained method investigated below is
   to index into the source array directly.  Some speedups (5-10%) can
   be obtained by unrolling the loops.

Summary rankings of simplest and fastest methods:

   10 elements        100 elements      1000 elements     10000 elements
   ----------------   ---------------   ----------------  ----------------
   h_assign 15085/s   h_assign 1485/s   h_assign  95.6/s  h_assign  4.97/s
   duffs_16 22318/s   a_index  2326/s   h_atonce 148/s    h_atonce  7.00/s
   a_index  22454/s   h_atonce 2425/s   a_index  165/s    a_index   9.88/s
   h_atonce 23247/s   unr_16   2571/s   duffs_16 179/s    even_odd  9.94/s
   unr_16   23423/s   duffs_16 2594/s   unr_16   181/s    duffs_16 10.3/s
   even_odd 31992/s   even_odd 3283/s   even_odd 195/s    unr_16   10.4/s

Randal, Tom -- did anything ever come out of the p5p discussions to
allow "push HASH, LIST" to do this?  Or am I making that up?

Long-winded crap:

To make it a bit more realistic, and to try it out with different
sizes of hashes, I put it into a big loop and looked at 10, 100, 1000,
and 10000-element hashes and arrays:

| for my $n_hash_elts ( 10, 100, 1000, 10000 )
| {
|     my @array     = map { rand } 1 .. 2*$n_hash_elts;
|     my %orig_dest = map { rand } @array;

On each of the sets of benchmarks, I included the previous winner:

|     my $h_atonce = sub {
|         my %dest = %orig_dest;
|         my %tmp = @array;
|         @dest{keys %tmp} = values %tmp;
|     };

And my first alternate solution, using array indexing to avoid the
cost of copying or modifying @array:

|     my $a_index = sub {
|         my %dest = %orig_dest;
|         for ( my $i = 0; $i < @array; $i += 2 ) {
|             $dest{ $array[$i] } = $array[$i+1];
|         }
|     };

A failed experiment, where I just tried the brute force approach.
(Note that this isn't really a failure; if this is fast enough, by all
means, use it...)

|     my $h_assign = sub {
|         my %dest = %orig_dest;
|         %dest = ( %dest, @array );
|     };

Another failed experiment, where I tried to use 'splice' to minimize
the number of changes to the @tmp copy of @array:

|     my $a_splice = sub {
|         my %dest = %orig_dest;
|         my @tmp = @array;
|         while (@tmp) {
|             my ( $k, $v ) = splice @tmp, 0, 2;
|             $dest{$k} = $v;
|         }
|     };

Some simple unrollings of the array index case, at 8, 16, and 32
elements (I included 24 elements later):

|     my $unr_8 = sub {
|         my %dest = %orig_dest;
|         my $i = 0;
|         while ( $i < @array-8 ) {
|             $dest{ $array[$i   ] } = $array[$i+1];
|             $dest{ $array[$i+2 ] } = $array[$i+3];
|             $dest{ $array[$i+4 ] } = $array[$i+5];
|             $dest{ $array[$i+6 ] } = $array[$i+7];
|             $i += 8;
|         }
|         while ( $i < @array )
|         {
|             $dest{ $array[$i   ] } = $array[$i+1];
|             $i += 2;
|         }
|     };

On the "that makes me feel dirty" scale, how about one that uses
Duff's Device?  <http://www.lysator.liu.se/c/duffs-device.html>

|     my $duffs_16 = sub {
|         my %dest = %orig_dest;
|         my $t = @array % 16;
|         my $i = $t - 16;
|         goto "TARGET_$t";
|         while ( $i < @array ) {
|           TARGET_0:  $dest{ $array[$i   ] } = $array[$i+1 ];
|           TARGET_14: $dest{ $array[$i+2 ] } = $array[$i+3 ];
|           TARGET_12: $dest{ $array[$i+4 ] } = $array[$i+5 ];
|           TARGET_10: $dest{ $array[$i+6 ] } = $array[$i+7 ];
|           TARGET_8:  $dest{ $array[$i+8 ] } = $array[$i+9 ];
|           TARGET_6:  $dest{ $array[$i+10] } = $array[$i+11];
|           TARGET_4:  $dest{ $array[$i+12] } = $array[$i+13];
|           TARGET_2:  $dest{ $array[$i+14] } = $array[$i+15];
|             $i += 16;
|         }
|     };

And, in my final bow to the benchmarking gods:

|     my @odd  = grep { $_ & 1 } 0 .. $#array;
|     my @even = map { $_-1 } @odd;
| 
|     my $even_odd = sub {
|         my %dest = %orig_dest;
|         @dest{@array[@even]} = @array[@odd];
|     };

There is another implementation that comes to mind, if we can assert
these conditions:

1. Having extra entries in %dest is OK;

2. The universe of keys is fully distinct from the universe of
   values; and

3. No values are undef (or you are running under "no warnings").

Then you could do something like:

   @dest{ '', @array } = ( @array, '' );

Heh.  "Careful with that axe, Eugene!"
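To see what that trick actually does, here is a tiny demo with made-up
data (not from the benchmark).  The key slice is @array with a dummy ''
prepended, and the value list is @array with a dummy '' appended, so
each element ends up keyed by the element before it:

```perl
use strict;
use warnings;

my @array = ( a => 1, b => 2 );
my %dest  = ( c => 3 );

# Shifted-slice assignment: pairs up ('', a, 1, b, 2) as keys with
# (a, 1, b, 2, '') as values.
@dest{ '', @array } = ( @array, '' );

# %dest now holds the intended pairs a => 1 and b => 2 (plus c => 3),
# but also the junk entries '' => 'a', 1 => 'b', and 2 => ''.
```

The junk entries are exactly why conditions 1-3 above are required:
every *value* in @array also becomes a key.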

Anyway, here are results for various set sizes:

| $ ./append-hash.plx
|
| === 10 elements ===
| 
| original methods:
|             Rate  a_shift   h_each  a_index h_atonce
| a_shift   9878/s       --     -47%     -56%     -58%
| h_each   18654/s      89%       --     -17%     -21%
| a_index  22540/s     128%      21%       --      -5%
| h_atonce 23685/s     140%      27%       5%       --
| 
| failed experiments:
|             Rate h_assign a_splice  a_index h_atonce
| h_assign 15085/s       --      -0%     -33%     -36%
| a_splice 15123/s       0%       --     -33%     -36%
| a_index  22583/s      50%      49%       --      -4%
| h_atonce 23559/s      56%      56%       4%       --
| 
| unrolled:
|             Rate   unr_32  a_index    unr_8 h_atonce   unr_16
| unr_32   21599/s       --      -4%      -7%      -8%      -9%
| a_index  22454/s       4%       --      -3%      -4%      -6%
| unr_8    23239/s       8%       3%       --      -1%      -2%
| h_atonce 23420/s       8%       4%       1%       --      -2%
| unr_16   23833/s      10%       6%       3%       2%       --
| 
| the contenders:
|             Rate duffs_16  a_index h_atonce   unr_16 even_odd
| duffs_16 22318/s       --      -1%      -4%      -5%     -30%
| a_index  22454/s       1%       --      -3%      -4%     -30%
| h_atonce 23247/s       4%       4%       --      -1%     -27%
| unr_16   23423/s       5%       4%       1%       --     -27%
| even_odd 31992/s      43%      42%      38%      37%       --
|
| === 100 elements ===
| 
| original methods:
|            Rate  a_shift   h_each  a_index h_atonce
| a_shift  1018/s       --     -48%     -56%     -59%
| h_each   1942/s      91%       --     -17%     -22%
| a_index  2329/s     129%      20%       --      -6%
| h_atonce 2477/s     143%      28%       6%       --
| 
| failed experiments:
|            Rate h_assign a_splice  a_index h_atonce
| h_assign 1485/s       --      -4%     -36%     -38%
| a_splice 1544/s       4%       --     -34%     -36%
| a_index  2325/s      57%      51%       --      -3%
| h_atonce 2401/s      62%      56%       3%       --
| 
| unrolled:
|            Rate  a_index h_atonce    unr_8   unr_32   unr_16
| a_index  2322/s       --      -4%      -7%      -9%     -10%
| h_atonce 2431/s       5%       --      -3%      -5%      -6%
| unr_8    2504/s       8%       3%       --      -2%      -3%
| unr_32   2546/s      10%       5%       2%       --      -1%
| unr_16   2574/s      11%       6%       3%       1%       --
| 
| the contenders:
|            Rate  a_index h_atonce   unr_16 duffs_16 even_odd
| a_index  2326/s       --      -4%     -10%     -10%     -29%
| h_atonce 2425/s       4%       --      -6%      -7%     -26%
| unr_16   2571/s      11%       6%       --      -1%     -22%
| duffs_16 2594/s      12%       7%       1%       --     -21%
| even_odd 3283/s      41%      35%      28%      27%       --
|
| === 1000 elements ===
| 
| original methods:
|            Rate  a_shift   h_each h_atonce  a_index
| a_shift  88.5/s       --     -39%     -45%     -54%
| h_each    145/s      63%       --     -10%     -24%
| h_atonce  160/s      81%      11%       --     -16%
| a_index   191/s     116%      32%      19%       --
| 
| failed experiments:
|            Rate h_assign a_splice h_atonce  a_index
| h_assign 95.6/s       --     -19%     -37%     -45%
| a_splice  117/s      23%       --     -23%     -32%
| h_atonce  153/s      60%      30%       --     -12%
| a_index   173/s      81%      48%      13%       --
| 
| unrolled:
|           Rate h_atonce  a_index    unr_8   unr_32   unr_16
| h_atonce 152/s       --     -10%     -15%     -15%     -16%
| a_index  169/s      12%       --      -5%      -5%      -6%
| unr_8    177/s      17%       5%       --      -0%      -2%
| unr_32   178/s      17%       5%       0%       --      -2%
| unr_16   181/s      19%       7%       2%       2%       --
| 
| the contenders:
|           Rate h_atonce  a_index duffs_16   unr_16 even_odd
| h_atonce 148/s       --     -10%     -17%     -18%     -24%
| a_index  165/s      11%       --      -8%      -9%     -15%
| duffs_16 179/s      21%       9%       --      -1%      -8%
| unr_16   181/s      22%      10%       1%       --      -7%
| even_odd 195/s      32%      18%       9%       8%       --
|
| === 10000 elements ===
| 
| original methods:
|            Rate  a_shift h_atonce   h_each  a_index
| a_shift  6.00/s       --     -15%     -21%     -40%
| h_atonce 7.06/s      18%       --      -7%     -30%
| h_each   7.60/s      27%       8%       --     -24%
| a_index  10.0/s      67%      42%      32%       --
| 
| failed experiments:
|            Rate h_assign h_atonce a_splice  a_index
| h_assign 4.97/s       --     -29%     -31%     -50%
| h_atonce 7.03/s      41%       --      -2%     -29%
| a_splice 7.19/s      45%       2%       --     -28%
| a_index  9.94/s     100%      41%      38%       --
| 
| unrolled:
|            Rate h_atonce  a_index    unr_8   unr_32   unr_16
| h_atonce 7.03/s       --     -29%     -31%     -32%     -32%
| a_index  9.90/s      41%       --      -3%      -4%      -5%
| unr_8    10.2/s      45%       3%       --      -1%      -2%
| unr_32   10.3/s      46%       4%       1%       --      -1%
| unr_16   10.4/s      48%       5%       2%       1%       --
| 
| the contenders:
|            Rate h_atonce  a_index even_odd duffs_16   unr_16
| h_atonce 7.00/s       --     -29%     -30%     -32%     -32%
| a_index  9.88/s      41%       --      -1%      -4%      -5%
| even_odd 9.94/s      42%       1%       --      -3%      -4%
| duffs_16 10.3/s      47%       4%       4%       --      -1%
| unr_16   10.4/s      48%       5%       4%       1%       --

For small data sets, the "even odd" approach pretty clearly dominates
the field.  The fact that there is some preprocessing involved doesn't
disqualify it in my mind; in a database environment, you are often
fetching exactly the same number of fields each time, so building the
even/odd arrays once is not a problem.  And note that these are
indexes into the result arrays, not results themselves, so they're
quite reusable.
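A sketch of that reuse, with a made-up row source standing in for the
database fetch (the row data and $n_fields are illustrative only):

```perl
use strict;
use warnings;

my $n_fields = 4;    # key/value pairs per row; fixed across fetches

# Built once, outside the fetch loop: positions of keys and values
# within a flattened ( key, value, key, value, ... ) row.
my @odd  = grep { $_ & 1 } 0 .. 2 * $n_fields - 1;   # value positions
my @even = map  { $_ - 1 } @odd;                     # key positions

my %dest;
for my $row ( [ a => 1, b => 2, c => 3, d => 4 ],
              [ e => 5, f => 6, g => 7, h => 8 ] ) {
    # The index arrays are reused verbatim for every row.
    @dest{ @{$row}[@even] } = @{$row}[@odd];
}
```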

The "hash at once" construct does well until the sets get particularly
large, and it is probably the easiest to code correctly offhand.
"array index" is nearly as fast as "hash at once" on smaller sets,
catching up as early as 100 elements.  As an added bonus, it has the
smallest memory and icache footprint of any of these methods.

Unrolling the array index method does give additional speed, but it is
probably not worth the code bulk.  (Hm... the thought of using eval
STRING to generate an unrolled subroutine at run time is tempting.)
The fact that 16 is regularly faster than 8 and 32 is interesting; I
wonder if I'm hitting an icache limitation at 32.  A quick run with 24
showed it performing about as well as the others:

| unrolled:
|           Rate h_atonce  a_index   unr_32    unr_8   unr_24   unr_16
| h_atonce 151/s       --      -9%     -14%     -14%     -15%     -16%
| a_index  167/s      10%       --      -5%      -5%      -7%      -7%
| unr_32   175/s      16%       5%       --      -0%      -2%      -3%
| unr_8    175/s      16%       5%       0%       --      -2%      -2%
| unr_24   178/s      18%       7%       2%       2%       --      -1%
| unr_16   180/s      19%       8%       3%       2%       1%       --
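As an aside, the eval STRING idea mentioned above might be sketched
like this.  This is illustrative only (make_unrolled is a made-up
name, and it was not run against the benchmark); it builds the source
text of an unrolled appender and compiles it with string eval:

```perl
use strict;
use warnings;

# Generate an unrolled append-to-hash subroutine for $n pairs per
# iteration.  The \$-escaped sigils keep the variables literal in the
# generated source; @{[ ... ]} interpolates computed constants.
sub make_unrolled {
    my ($n) = @_;
    my $body = join "\n",
        map { "        \$dest->{ \$array->[\$i+" . ( 2 * $_ ) . "] }"
            . " = \$array->[\$i+" . ( 2 * $_ + 1 ) . "];" } 0 .. $n - 1;
    my $code = <<"END";
sub {
    my (\$dest, \$array) = \@_;
    my \$i = 0;
    while ( \$i <= \@\$array - @{[ 2 * $n ]} ) {
$body
        \$i += @{[ 2 * $n ]};
    }
    while ( \$i < \@\$array ) {
        \$dest->{ \$array->[\$i] } = \$array->[\$i+1];
        \$i += 2;
    }
}
END
    my $sub = eval $code;
    die $@ if $@;
    return $sub;
}

my $unr_8 = make_unrolled(4);     # 4 pairs = 8 elements per iteration
my %dest  = ( x => 0 );
my @array = ( 1 .. 20 );          # 10 key/value pairs
$unr_8->( \%dest, \@array );
```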

Duff's Device is just silly, and provides less and less return as the
set size gets larger.



