[tpm] OID sorting
Abram Hindle
abram.hindle at softwareprocess.us
Fri May 1 13:15:33 PDT 2009
Here is the code. Sorry.
abram
Abram Hindle wrote:
> Here are my results:
>
>> Rate whilecmp henrys_oid_sort packsort swartzpacksort nounpacksort
>>
>> whilecmp 0.648/s -- -23% -81% -82% -82%
>> henrys_oid_sort 0.843/s 30% -- -75% -77% -77%
>> packsort 3.33/s 414% 295% -- -7% -9%
>> swartzpacksort 3.59/s 454% 326% 8% -- -2%
>> nounpacksort 3.66/s 466% 335% 10% 2% --
>
> So the while loop comparison, with no transformation was the slowest.
> Followed by Henry's OID Tree Sort.
> Then my packsort, the swartzpacksort and unpacksort did the best.
>
> The parameters of the test were that 10000 entries were generated with
> with a maximum depth of 10 for each entry (e.g. 1.2.3. ... .10). Then we
> use benchmark to run for each sort routine this test 100 times. Now per
> each test NEW random data was created. So technically a sort could get a
> bad roll and get really nasty data to sort, or really easy to sort, but
> this happened 100 times.
>
> The while loop comparison sort just uses sort and uses a routine to
> compare arrays of numbers. It is just a while loop which chomps on an
> array til it finds something not equal. It could probably be sped up.
>
> henrys_oid_sort is the tree sort that henry posted, relatively unmodified.
>
> packsort, swartzpacksort, nounpacksort are sorts where I use pack to
> convert the numbers into bigendian representations then use cmp to
> compare the values. I use "pack('C*')" in this case because I said the
> max value of the integers would be 255. If it was large we'd use another
> unsigned big endian representation. These 3 sorts differ in how they are
> implemented, if their intermediate steps are assigned to values or if
> tuples are created or unpack is used. Currently using tuples seems
> faster than unpack.
>
> So henry might pipe up and say his sort could be faster if there was
> more depth per entity. Sure. So I reran it again, with 1000 entries, a
> maxdepth of 1000, and 10 tests per sort:
>
>> Rate whilecmp henrys_oid_sort packsort swartzpacksort nounpacksort
>> whilecmp 0.671/s -- -28% -79% -81% -82%
>> henrys_oid_sort 0.938/s 40% -- -71% -73% -74%
>> packsort 3.20/s 377% 241% -- -8% -13%
>> swartzpacksort 3.46/s 416% 269% 8% -- -6%
>> nounpacksort 3.67/s 447% 291% 15% 6% --
>
> Similar results, except that Henry's sort shows about 10% better
> performance than while the other sorts have slightly worse performance.
>
> What does this tell us? Yet again relying on C implementations (pack,
> regexes, cmp etc.) provides better performance than the more perlish
> solutions. Also pack is faster than mucking around with data structures.
>
> Summary: Pack wins, henry's tree sort defeats array cmp.
>
> abram
>
> Uri Guttman wrote:
>>>>>>> "HB" == Henry Baragar <Henry.Baragar at instantiated.ca> writes:
>> HB> #! /usr/bin/perl -w
>>
>> use warnings is better these days.
>> HB> use strict;
>>
>> HB> sub oid_tree_add {
>> HB> my ($tree, $oid, $value) = @_;
>> HB> my @oid = split /\b/, $oid;
>>
>> why are you splitting on \b? you could just split on /\./ and get the
>> oid numbers. then you can simplify (and speed up) the code below.
>>
>> HB> _oid_node_add($tree,$value, at oid);
>> HB> }
>>
>> HB> sub _oid_node_add {
>> HB> my ($tree, $value, $key, @suboid) = @_;
>> HB> die "Expecting integer, got '$key'" unless $key =~/^\d+/;
>> HB> my $node = $tree->{$key};
>> HB> if (my $dot = shift @suboid) {
>> HB> die "Expecting '.', got '$dot'" unless $dot eq ".";
>> HB> $node->{subtree} = _oid_node_add($node->{subtree}, $value, @suboid);
>>
>> i won't get into detail here but if you split on . as i said above, you
>> can just delete the check for dot. i can't see any reason to keep that
>> logic.
>>
>> HB> }
>> HB> else {
>> HB> $node->{value} = $value;
>> HB> }
>> HB> $tree->{$key} = $node;
>>
>> i don't see $node being modified anywhere. you modify its keys/values
>> but $node is still the same hashref. so that line probably could go.
>> HB> $tree;
>>
>> use explicit return statements vs implicit last value returns. this is a
>> bug waiting to happen if you ever add/modify code in that section. you
>> may forget what was being returned and return some other expression.
>>
>> HB> }
>>
>> HB> sub oid_tree_sorted {
>> HB> my ($tree) = @_;
>> HB> my @sorted;
>> HB> for my $key (sort {$a <=> $b} keys %$tree) {
>> HB> my $node = $tree->{$key};
>> HB> push @sorted, $node->{value} if $node->{value};
>> HB> push @sorted, oid_tree_sorted($node->{subtree})
>> HB> if $node->{subtree};
>>
>> if a node has a value it won't have a subtree. so you don't need the
>> second test. in general you could have eliminated the need for the
>> value/subtree keys and just used the value or hash ref. then you can
>> tell if it is a value or tree node with ref(). it is a simpler and
>> cleaner data tree design. again, i could be wrong in your details but
>> this is a common design idea in perl.
>>
>> i leave the actual rewrite as an exercise as i am not up for it at the
>> moment. :)
>>
>> thanx,
>>
>> uri
>>
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> toronto-pm mailing list
> toronto-pm at pm.org
> http://mail.pm.org/mailman/listinfo/toronto-pm
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fulko.pl
Type: application/x-perl
Size: 4658 bytes
Desc: not available
URL: <http://mail.pm.org/pipermail/toronto-pm/attachments/20090501/36499d64/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 260 bytes
Desc: OpenPGP digital signature
URL: <http://mail.pm.org/pipermail/toronto-pm/attachments/20090501/36499d64/attachment-0003.bin>
More information about the toronto-pm
mailing list