SPUG: SUMMARY: sorting hierarchical section numbers

Michael R. Wolf MichaelRWolf at att.net
Thu Aug 17 10:58:17 PDT 2006


I've appended a test file incorporating some of the suggestions I got.  

Thanks to all responders.

See results embedded in code above the __DATA__ tag.

As a test of the test, I tried passing the code in as a code reference and
as a string because PBP mentioned it was better as a string.  That was
contrary to my previous practice, so I thought I'd try it.

Summary:  
1. For a short, already ordered list, pack and split were equal.  I haven't
gotten to test this on long lists, reversed lists, and shuffled lists.

2. The subroutine call overhead is substantial for this code/data
combination, creating a 20% rate decrease. (See previous disclaimer on
reversed, shuffled, and large lists.)

CODE STARTS HERE:...

#! /usr/bin/perl

use warnings;
use strict;

# use Smart::Comments;
use Benchmark qw(:all);

my @sections = <DATA>;
chomp @sections;

foreach (@sections) {
    s/\s//g;
}

# ================
# Sort a "section number" string (i.e. dot-separated list of numbers).

sub by_section_number_splitting {
    # Turn the section string into a list of numbers...
    my @a = split /[.]/, $a;
    my @b = split /[.]/, $b;

    # While there's still something in both lists...
    while (@a && @b) {
	# Grab the first thing in each list...
	my ($lhs, $rhs) = (shift @a, shift @b);

	# and compare them numerically.
	my $rc = $lhs <=> $rhs;

	# If the pair is unequal, we have our result...
	return $rc if $rc;
    }

    # If a list (or both) became empty, the shortest one compares
smallest...
    return @a <=> @b;
}

sub by_section_number_packing {
    # Create a string, packing each number into an unsigned character...
    my $lhs = pack 'C*', split /[.]/, $a;
    my $rhs = pack 'C*', split /[.]/, $b;

    ### lhs: $a, $lhs, length($lhs)
    ### rhs: $b, $rhs, length($rhs)

    # then compare them as strings...
    return $lhs cmp $rhs;
}

my @sorted_by_splitting = sort by_section_number_splitting @sections;
my @sorted_by_packing   = sort by_section_number_packing   @sections;

### original    : @sections
### split sort  : @sorted_by_splitting
### pack sort   : @sorted_by_packing

my $results
  = timethese(0,
	      {
	       # The two sort algorithms as anonymous sub references...
	       'split'     => sub { sort by_section_number_splitting
@sections },
	       'pack'      => sub { sort by_section_number_packing
@sections },

	       # The two sort algorithms as strings to be eval'd...
	       'split_str' => q{    sort by_section_number_splitting
@sections },
	       'pack_str'  => q{    sort by_section_number_packing
@sections },
	      }
	     );

cmpthese($results);

exit;

# ================================================================

=begin comment

Benchmark: running pack, pack_str, split, split_str for at least 3 CPU
seconds...
      pack:  4 wallclock secs ( 3.51 usr +  0.00 sys =  3.51 CPU) @
3889417.00/s (n=13636296)
  pack_str:  4 wallclock secs ( 3.16 usr +  0.00 sys =  3.16 CPU) @
4982422.25/s (n=15764384)
     split:  2 wallclock secs ( 3.28 usr + -0.02 sys =  3.26 CPU) @
3942718.67/s (n=12837492)
 split_str:  3 wallclock secs ( 3.17 usr +  0.00 sys =  3.17 CPU) @
4966724.95/s (n=15764385)
	       Rate      pack     split split_str  pack_str
pack      3889417/s        --       -1%      -22%      -22%
split     3942719/s        1%        --      -21%      -21%
split_str 4966725/s       28%       26%        --       -0%
pack_str  4982422/s       28%       26%        0%        --

=end comment

=cut

__DATA__
1.1
1.11
1.2
1.21
1.3
1.3.1
1.3.1.1
1.3.1.2
1.3.1.3
1.3.1.20
1.3.11.2
1.3.2
1.30.1.2
2
2.1
3.
3.1
11
11.1
11.2
11.3
11.3.1
11.3.1.1
11.3.1.2
11.3.2

-- 
Michael R. Wolf
    All mammals learn by playing!
        MichaelRWolf at att.net

> -----Original Message-----
> From: spug-list-bounces+michaelrwolf=att.net at pm.org [mailto:spug-list-
> bounces+michaelrwolf=att.net at pm.org] On Behalf Of Michael R. Wolf
> Sent: Thursday, August 10, 2006 10:14 PM
> To: 'SPUG Members'
> Subject: SPUG: sorting hierarchical section numbers
> 
> I want to sort strings that represent outline numbers.
> 
> 1
> 1.1
> 1.2
> 1.3
> 1.3.1
> 1.3.2
> 1.3.2.1
> 1.2
> 
> I can't find a module.  Could you point me in the correct direction so
> that
> I can read the fine manual?  I can't seem to find the right search term;
> these are too general 'hierarchy', 'outline', 'number', 'sort'.
> 
> --
> Michael R. Wolf
>     All mammals learn by playing!
>         MichaelRWolf at att.net
> 
> _____________________________________________________________
> Seattle Perl Users Group Mailing List
>      POST TO: spug-list at pm.org
> SUBSCRIPTION: http://mail.pm.org/mailman/listinfo/spug-list
>     MEETINGS: 3rd Tuesdays
>     WEB PAGE: http://seattleperl.org/



More information about the spug-list mailing list