[tpm] OID sorting

Wed May 6 11:47:11 PDT 2009

>>>>> "JM" == John Macdonald <john at perlwolf.com> writes:

  JM> On Wed, May 06, 2009 at 12:06:51PM -0400, Fulko Hew wrote:
  >> For Fastest:
  >> 
  >> sub abram_nounpacksort {
  >> my @data = map { [ split(/\./, $_) ] } @_;
  >> my @sorted = map { $_->[1] }
  >> sort { $a->[0] cmp $b->[0] }
  >> map { [pack("N*",@$_),$_] } @data;
  >> @sorted = map { join(".",@$_) } @sorted;
  >> return @sorted;
  >> }

  JM> Since the numbers are the only significant content of the oid,
  JM> I wonder whether it would be cheaper to recreate it at the end.
  JM> Sorting with no sort function is faster than having to do
  JM> 2 array subscripts for each compare. (Maybe that was what
  JM> abram_packsort did?)

that is the whole win over the GRT or packed sort vs the ST. you can
read more in my (very old) paper on this at:

	http://sysarch.com/Perl/sort_paper.html

and the Sort::Maker module has plenty of documentation and a benchmark
script so you can compare sorting styles.

  JM> sub jmm {
  JM>     return
  JM>         map { join( '.', unpack('N*', $_) ) }
  JM>         sort
  JM>         map { pack('N*', split(/\./, $_) ) }
  JM>         @_;
  JM> }

  JM> Instead of a 2-element array, I sometimes just use a single
  JM> string, consisting of a leading portion which is directly
  JM> sortable, if necessary a marker, and finally the original
  JM> string.  The final map just removes the leading string (and
  JM> the marker).

that is probably the fastest style given the input. as someone mentioned
if the oid integers are all less than 255 (unlikely) you can pack into
unsigned bytes and if they are all less than 16 bits you can pack into
short ints (maybe 15 bits as i think pack only does signed shorts). you
need the pack N format if you are not sure as it is 32 bits. another
trick to save the join/map on the return value is to put the original
OID string as is on the end of the compare strings. the issue is how to
get back that original string from the compare string. you can put a
null byte before the string and use rindex and substr or even a
regex. this should be faster than a join but it needs benchmarking. if
the original string is a fixed length than a substr can get it back
easily from the end. this is why knowing your data and any restrictions
is a valuable thing when you want to optimize any algorithm. but those
restrictions have to be 100% correct all the time otherwise this can
blow up.

thanx,

uri

-- 
Uri Guttman  ------  uri at stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------