[tpm] OID sorting

Abram Hindle abram.hindle at softwareprocess.us
Fri May 1 13:15:33 PDT 2009


Here is the code. Sorry.

abram

Abram Hindle wrote:
> Here are my results:
> 
>>                    Rate whilecmp henrys_oid_sort packsort swartzpacksort nounpacksort
>>
>> whilecmp        0.648/s       --            -23%     -81%           -82%         -82%
>> henrys_oid_sort 0.843/s      30%              --     -75%           -77%         -77%
>> packsort         3.33/s     414%            295%       --            -7%          -9%
>> swartzpacksort   3.59/s     454%            326%       8%             --          -2%
>> nounpacksort     3.66/s     466%            335%      10%             2%           --
> 
> So the while loop comparison, with no transformation was the slowest.
> Followed by Henry's OID Tree Sort.
> Then my packsort, the swartzpacksort and unpacksort did the best.
> 
> The parameters of the test were that 10000 entries were generated with
> with a maximum depth of 10 for each entry (e.g. 1.2.3. ... .10). Then we
> use benchmark to run for each sort routine this test 100 times. Now per
> each test NEW random data was created. So technically a sort could get a
> bad roll and get really nasty data to sort, or really easy to sort, but
> this happened 100 times.
> 
> The while loop comparison sort just uses sort and uses a routine to
> compare arrays of numbers. It is just a while loop which chomps on an
> array til it finds something not equal. It could probably be sped up.
> 
> henrys_oid_sort is the tree sort that henry posted, relatively unmodified.
> 
> packsort, swartzpacksort, nounpacksort are sorts where I use pack to
> convert the numbers into bigendian representations then use cmp to
> compare the values. I use "pack('C*')" in this case because I said the
> max value of the integers would be 255. If it was large we'd use another
> unsigned big endian representation. These 3 sorts differ in how they are
> implemented, if their intermediate steps are assigned to values or if
> tuples are created or unpack is used. Currently using tuples seems
> faster than unpack.
> 
> So henry might pipe up and say his sort could be faster if there was
> more depth per entity. Sure. So I reran it again, with 1000 entries, a
> maxdepth of 1000, and 10 tests per sort:
> 
>>                    Rate whilecmp henrys_oid_sort packsort swartzpacksort nounpacksort
>> whilecmp        0.671/s       --            -28%     -79%           -81%         -82%
>> henrys_oid_sort 0.938/s      40%              --     -71%           -73%         -74%
>> packsort         3.20/s     377%            241%       --            -8%         -13%
>> swartzpacksort   3.46/s     416%            269%       8%             --          -6%
>> nounpacksort     3.67/s     447%            291%      15%             6%           --
> 
> Similar results, except that Henry's sort shows about 10% better
> performance than while the other sorts have slightly worse performance.
> 
> What does this tell us? Yet again relying on C implementations (pack,
> regexes, cmp etc.) provides better performance than the more perlish
> solutions. Also pack is faster than mucking around with data structures.
> 
> Summary: Pack wins, henry's tree sort defeats array cmp.
> 
> abram
> 
> Uri Guttman wrote:
>>>>>>> "HB" == Henry Baragar <Henry.Baragar at instantiated.ca> writes:
>>   HB> #! /usr/bin/perl -w
>>
>> use warnings is better these days.
>>   HB> use strict;
>>
>>   HB> sub oid_tree_add {
>>   HB>     my ($tree, $oid, $value) = @_;
>>   HB>     my @oid = split /\b/, $oid;
>>
>> why are you splitting on \b? you could just split on /\./ and get the
>> oid numbers. then you can simplify (and speed up) the code below.
>>
>>   HB>     _oid_node_add($tree,$value, at oid);
>>   HB> }
>>
>>   HB> sub _oid_node_add {
>>   HB>     my ($tree, $value, $key, @suboid) = @_;
>>   HB>     die "Expecting integer, got '$key'" unless $key =~/^\d+/;
>>   HB>     my $node = $tree->{$key};
>>   HB>     if (my $dot = shift @suboid) {
>>   HB> 	die "Expecting '.', got '$dot'" unless $dot eq ".";
>>   HB> 	$node->{subtree} = _oid_node_add($node->{subtree}, $value, @suboid);
>>
>> i won't get into detail here but if you split on . as i said above, you
>> can just delete the check for dot. i can't see any reason to keep that
>> logic. 
>>
>>   HB>     }
>>   HB>     else {
>>   HB> 	$node->{value} = $value;
>>   HB>     }
>>   HB>     $tree->{$key} = $node;
>>
>> i don't see $node being modified anywhere. you modify its keys/values
>> but $node is still the same hashref. so that line probably could go.
>>   HB>     $tree;
>>
>> use explicit return statements vs implicit last value returns. this is a
>> bug waiting to happen if you ever add/modify code in that section. you
>> may forget what was being returned and return some other expression.
>>
>>   HB> }
>>
>>   HB> sub oid_tree_sorted {
>>   HB>     my ($tree) = @_;
>>   HB>     my @sorted;
>>   HB>     for my $key (sort {$a <=> $b} keys %$tree) {
>>   HB> 	my $node = $tree->{$key};
>>   HB> 	push @sorted, $node->{value} if $node->{value};
>>   HB> 	push @sorted, oid_tree_sorted($node->{subtree})
>>   HB> 	    if $node->{subtree};
>>
>> if a node has a value it won't have a subtree. so you don't need the
>> second test. in general you could have eliminated the need for the
>> value/subtree keys and just used the value or hash ref. then you can
>> tell if it is a value or tree node with ref(). it is a simpler and
>> cleaner data tree design. again, i could be wrong in your details but
>> this is a common design idea in perl.
>>
>> i leave the actual rewrite as an exercise as i am not up for it at the
>> moment. :)
>>
>> thanx,
>>
>> uri
>>
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> toronto-pm mailing list
> toronto-pm at pm.org
> http://mail.pm.org/mailman/listinfo/toronto-pm

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fulko.pl
Type: application/x-perl
Size: 4658 bytes
Desc: not available
URL: <http://mail.pm.org/pipermail/toronto-pm/attachments/20090501/36499d64/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 260 bytes
Desc: OpenPGP digital signature
URL: <http://mail.pm.org/pipermail/toronto-pm/attachments/20090501/36499d64/attachment-0003.bin>


More information about the toronto-pm mailing list