SPUG: Packed Default Sort

Colin Meyer cmeyer at helvella.org
Wed Sep 20 13:33:33 CDT 2000


Thanks for the talk on map, grep & sort, Richard.  The amount of data 
convolution that can be accomplished with a few of Perl's list functions is 
amazing and fun.

I'd like to call your attention to an interesting form of the schwartz
transform called the packed default sort.  This was developed by Larry Rosler
and presented by him and Uri Guttman at TPC 3.0.  You can read the paper at:
http://www.sysarch.com/perl/sort_paper.html

The meat is that the first map packs the sort key onto the sorting string
itself and then use sort's default asciibetical sorting method instead of a
custom anonymous sub.  After sorting the second map removes the sorting key
from the strings, returning the original strings in a sorted order.

For example, this is a common schwartz for sorting filenames by size:

map $_->[1], sort { $a->[0] <=> $b->[0] } map [-s,$_], @files;

As a packed default sort it might look like:

map substr($_,4), sort map pack("N",-s).$_, @files;

What's the benefit?  Well, calling perl subs is expensive.  When sort uses its
built in asciibetical comparison, it doesn't need to make that expensive 
call to a perl sub.  use Benchmark shows:

Benchmark: running packedd, schwartz, each for at least 10 CPU seconds...
   packedd: 11 wallclock secs ( 7.93 usr +  2.75 sys = 10.68 CPU) @ 37.27/s (n=398)
  schwartz: 11 wallclock secs ( 9.01 usr +  1.49 sys = 10.50 CPU) @ 19.52/s (n=205)

The packed-d also saves you several keystrokes and makes use of the
ever-inscrutable and job-securing pack function.

What's the downside?  It can be difficult to come up with a scheme of 
translating your sort key[s] to something that will sort correctly according
to ascii.  See the paper for some more intricate examples.

Have fun,
-C.

---------
Here's the code to play with:

sub schwartz {
  map $_->[1], sort { $a->[0] <=> $b->[0] } map [-s,$_], @files;
}

sub packedd {
  map substr($_,4), sort map pack("N",-s).$_, @files;
}

@files = </usr/bin/*>;

use Benchmark;

timethese(-10,{
  schwartz => \&schwartz,
  packedd  => \&packedd
});


 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
      Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
  Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
 For daily traffic, use spug-list for LIST ;  for weekly, spug-list-digest
  Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/





More information about the spug-list mailing list