[tpm] OID sorting

Abram Hindle abram.hindle at softwareprocess.us
Fri May 1 13:11:39 PDT 2009


Here are my results:

>                    Rate whilecmp henrys_oid_sort packsort swartzpacksort nounpacksort
> 
> whilecmp        0.648/s       --            -23%     -81%           -82%         -82%
> henrys_oid_sort 0.843/s      30%              --     -75%           -77%         -77%
> packsort         3.33/s     414%            295%       --            -7%          -9%
> swartzpacksort   3.59/s     454%            326%       8%             --          -2%
> nounpacksort     3.66/s     466%            335%      10%             2%           --

So the while loop comparison, with no transformation was the slowest.
Followed by Henry's OID Tree Sort.
Then my packsort, the swartzpacksort and unpacksort did the best.

The parameters of the test were that 10000 entries were generated with
with a maximum depth of 10 for each entry (e.g. 1.2.3. ... .10). Then we
use benchmark to run for each sort routine this test 100 times. Now per
each test NEW random data was created. So technically a sort could get a
bad roll and get really nasty data to sort, or really easy to sort, but
this happened 100 times.

The while loop comparison sort just uses sort and uses a routine to
compare arrays of numbers. It is just a while loop which chomps on an
array til it finds something not equal. It could probably be sped up.

henrys_oid_sort is the tree sort that henry posted, relatively unmodified.

packsort, swartzpacksort, nounpacksort are sorts where I use pack to
convert the numbers into bigendian representations then use cmp to
compare the values. I use "pack('C*')" in this case because I said the
max value of the integers would be 255. If it was large we'd use another
unsigned big endian representation. These 3 sorts differ in how they are
implemented, if their intermediate steps are assigned to values or if
tuples are created or unpack is used. Currently using tuples seems
faster than unpack.

So henry might pipe up and say his sort could be faster if there was
more depth per entity. Sure. So I reran it again, with 1000 entries, a
maxdepth of 1000, and 10 tests per sort:

>                    Rate whilecmp henrys_oid_sort packsort swartzpacksort nounpacksort
> whilecmp        0.671/s       --            -28%     -79%           -81%         -82%
> henrys_oid_sort 0.938/s      40%              --     -71%           -73%         -74%
> packsort         3.20/s     377%            241%       --            -8%         -13%
> swartzpacksort   3.46/s     416%            269%       8%             --          -6%
> nounpacksort     3.67/s     447%            291%      15%             6%           --

Similar results, except that Henry's sort shows about 10% better
performance than while the other sorts have slightly worse performance.

What does this tell us? Yet again relying on C implementations (pack,
regexes, cmp etc.) provides better performance than the more perlish
solutions. Also pack is faster than mucking around with data structures.

Summary: Pack wins, henry's tree sort defeats array cmp.

abram

Uri Guttman wrote:
>>>>>> "HB" == Henry Baragar <Henry.Baragar at instantiated.ca> writes:
> 
>   HB> #! /usr/bin/perl -w
> 
> use warnings is better these days.
>   HB> use strict;
> 
>   HB> sub oid_tree_add {
>   HB>     my ($tree, $oid, $value) = @_;
>   HB>     my @oid = split /\b/, $oid;
> 
> why are you splitting on \b? you could just split on /\./ and get the
> oid numbers. then you can simplify (and speed up) the code below.
> 
>   HB>     _oid_node_add($tree,$value, at oid);
>   HB> }
> 
>   HB> sub _oid_node_add {
>   HB>     my ($tree, $value, $key, @suboid) = @_;
>   HB>     die "Expecting integer, got '$key'" unless $key =~/^\d+/;
>   HB>     my $node = $tree->{$key};
>   HB>     if (my $dot = shift @suboid) {
>   HB> 	die "Expecting '.', got '$dot'" unless $dot eq ".";
>   HB> 	$node->{subtree} = _oid_node_add($node->{subtree}, $value, @suboid);
> 
> i won't get into detail here but if you split on . as i said above, you
> can just delete the check for dot. i can't see any reason to keep that
> logic. 
> 
>   HB>     }
>   HB>     else {
>   HB> 	$node->{value} = $value;
>   HB>     }
>   HB>     $tree->{$key} = $node;
> 
> i don't see $node being modified anywhere. you modify its keys/values
> but $node is still the same hashref. so that line probably could go.
>   HB>     $tree;
> 
> use explicit return statements vs implicit last value returns. this is a
> bug waiting to happen if you ever add/modify code in that section. you
> may forget what was being returned and return some other expression.
> 
>   HB> }
> 
>   HB> sub oid_tree_sorted {
>   HB>     my ($tree) = @_;
>   HB>     my @sorted;
>   HB>     for my $key (sort {$a <=> $b} keys %$tree) {
>   HB> 	my $node = $tree->{$key};
>   HB> 	push @sorted, $node->{value} if $node->{value};
>   HB> 	push @sorted, oid_tree_sorted($node->{subtree})
>   HB> 	    if $node->{subtree};
> 
> if a node has a value it won't have a subtree. so you don't need the
> second test. in general you could have eliminated the need for the
> value/subtree keys and just used the value or hash ref. then you can
> tell if it is a value or tree node with ref(). it is a simpler and
> cleaner data tree design. again, i could be wrong in your details but
> this is a common design idea in perl.
> 
> i leave the actual rewrite as an exercise as i am not up for it at the
> moment. :)
> 
> thanx,
> 
> uri
> 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 260 bytes
Desc: OpenPGP digital signature
URL: <http://mail.pm.org/pipermail/toronto-pm/attachments/20090501/6ac63c05/attachment.bin>


More information about the toronto-pm mailing list