[Neworleans-pm] Fwd: Solutions and Discussion for Perl Quiz of the Week #18

Thu Jul 1 10:44:42 CDT 2004

=====
http://www.brettsbsd.net/~estrabd

__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

----- Original message -----
From: "Jon Bjornstad" <jon at icogitate.com>
To: perl-qotw at plover.com
Date: Mon, 28 Jun 2004 23:28:11 -0700
Subject: Solutions and Discussion for Perl Quiz of the Week #18

Sample solutions and discussion
Perl Quiz of The Week #18 (20040623)

[Note: This was quiz #18, not #16.  I sent out the quiz with the wrong
Subject line.  -Mark Jason Dominus]

        For statewide elections, the state of California uses a "random
        alphabetic" sort to order the candidates' names on the ballot.
        This link describes that special sort:

                http://www.ss.ca.gov/elections/elections_ra.htm

        (mirrored at
        http://perl.plover.com/qotw/misc/r018/elections_ra.htm )

        The idea is this: a random order is selected for the twenty-six
        letters of the alphabet, and then the names on the ballot are
        sorted according to this random order, instead of according to
        the usual order.

        This is the order in which the names appear on the ballot in
        Assembly District 1.  In Assembly District 2, the order is the
        same, except that the first name on the list has been moved to
        the end of the list.  In Assembly District 3, the next name on
        the list is moved to the end of the list, and so on.

        You will write a program to generate ballots.  The program will
        be invoked like this:

                elect.pl [-r] [permutation] [district-number] [namefile]

        (Items in [square brackets] are optional.)

        The 'permuation' argument will default to 'randlet.txt'.  If the
        file exists, it should contain a single line of text with a
        random permutation of the 26 letters of the alphabet. If this
        file does not exist, the program should generate one at random.
        The -r option will force the file to be regenerated even if it
        exists already.

        The 'district-number' is the number of the Assembly District for
        which the ballot will be generated.  If omitted, the program
        should generate the ballot for Assembly District 1.

        The 'namefile' is a file that contains the names of the
        candidates.  The names come one per line with first name then
        last name:

                Jill Harmon
                Gus Tribble
                Walter Reston
                Norma Kretschmer
                Fiorella Squamuglia
                Vern Smith
                Bill Smith
                Ed Squid
                John Macdonald
                Angus MacDonald

        If the namefile is omitted, the program should read the names
        from the standard input.

        The program should print out the candidates' names, one per
        line, in the order specified by California state law, sorted
        according to the random permutation of the alphabet, with names
        rearranged as appropriate for the specified Assembly District.

        For example, if the file 'permutation' contains 

                QWERTYUIOPASDFGHJKLZXCVBNM

        and the 'namefile' contains the names listed above, then the
        output for Assembly District 1 should be:

                Walter Reston
                Gus Tribble
                Ed Squid
                Fiorella Squamuglia
                Vern Smith
                Bill Smith
                Jill Harmon
                Norma Kretschmer
                Angus MacDonald
                John Macdonald

        For Assembly District 4, it should be:

                Fiorella Squamuglia
                Vern Smith
                Bill Smith
                Jill Harmon
                Norma Kretschmer
                Angus MacDonald
                John Macdonald
                Walter Reston
                Gus Tribble
                Ed Squid

----------------------------------------------------------------

When I first encountered the 'random alphabetic ordering'
method of the California election system I immediately
thought it would make for an interesting Perl challenge
and it has.

These people submitted ideas/solutions (in no
particular order :):

	Matthew Walton
	Christer Ekholm
	John J. Trammell
	Yitzchak Scott-Thoennes
	Randy W. Sims
	Adrian Scottow
	Rod Adams
	Kester Allen
	Jonathan Scott Duff
	Mark Jason Dominus
	Torsten Hofmann
	the hatter
	Patrick LeBoutillier
	Rick Measham
	Mark Morgan

There was first a discussion to refine the specification.
Command line arguments and how to separate first and last
names and what about non-alphabetic characters in names.
[Some of this was my fault.  Jon's original request was for
a program whose input was in a different format than what I
sent out; I rewrote it considerably.  I was aware that the
argument format I inserted was ambiguous, but didn't think
it would be a big problem.  -MJD] I tried to short circuit
some of this discussion as the point of the quiz was to have
some fun with sorting techniques.  Because of this I won't
discuss the various ways of using command line options to
get the required data into the program.  I'll focus on the
sort itself.

Many people saw right away that since comparing two
names would be a complex affair that this was a prime
opportunity for using a Schwartzian Transform.

This is a technique invented by and named after (not by)
Randall Schwartz, author of the classic
"Learning Perl".  If any of you do not know of this
technique you can read about it in several places (Perl Cookbook,
Effective Perl, perldoc -f sort, etc).

The key idea in a complex sort is to pre-generate a sort key
for each item BEFORE you do the sort rather than each time
the sort compares two elements.   This is much more efficient and
simply darn clever!    The use of anonymous arrays
and maps unwind the pregenerated sort keys from the desired
result:

	# schwartzian transform for efficiency and fun
	@names =
		map  { $_->[0] }
		sort { $a->[1] cmp $b->[1] }
		map  { [ $_, sortkey $_ ] }
		@names;

How to generate a sort key in this case?

The tr/// function comes in quite handy here.
If the permuted alphabet in in $alphabet then this will do it:

        sub sortkey {
                my $name = shift;
                $name =~ tr/$alphabet/A-Z/;
                return $name;
        }

After this transformation the name we can simply do
a simple compare and the result will be to have respected
the new alphabetic ordering.

Unfortunately tr/// does not do variable interpolation (why?).
So we need to wrap the tr/// in an eval.

There is also the need to swap first and last names
and do the sort case-insensitively.

So here is what Rick Measham coded:

        sub encode {
             # Case insensitive
             my $switch = uc shift;

             my $permutation = shift;

             # Last name first
             $switch =~ s/^([\w-]+)\s+(.+)$/$2 $1/;

             # Switch in the new alphabet -- tr doesn't take
             # variables so we have to eval the
             # transliteration. We checked the permutation
             # before so we can trust it. It's just a list
             # of letters.
             eval "\$switch =~
             tr/$permutation/ABCDEFGHIJKLMNOPQRSTUVWXYZ/";

             return $switch;
        }

        # Schwartzian Transform -- see note below
        my @sorted = map { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map { [ encode($_, $permutation), $_ ] } @names;

That pretty much summarizes what I think is the optimal approach.
One improvement suggested by Mark Jason Dominus was
to only do ONE eval rather than one for each name in the list.

Here is his offering: [I sent this to Jon in private email
ome time ago when we were first discussing the question; it
wasn't intended as a complete solution to the problem. -MJD]

        open IN, "rlets" or die "can't open rlets\n";
        my $rlets = <IN>;
        print $rlets;
        chomp $rlets;
        my $candidates = shift;
        open IN, "<", $candidates or die "cannot open $candidates:
        $!\n";

        my $code = q{
                 print sort { my $aa = lc $a; my $bb = lc $b;
                         $aa =~ tr/RLETS/a-z/;
                         $bb =~ tr/RLETS/a-z/;
                         $aa cmp $bb } <IN>;
        };

        $code =~ s/RLETS/$rlets/g;
        eval $code;

Note that the q{ } does a single quoting - no interpolation.
I'm surprised that Perl knows about balanced {}'s and does
not end the quoted string after "$aa cmp $bb".  Or maybe
I'm not surprised! :)

This code does not use a Schwartzian Transform but is
just as quick - evals are apparently quite expensive!
[I think that's because Perl must run its parser. -MJD]

On other points of the problem:

Generating a random permutation of A to Z was accomplished
in a variety of ways.  Here is one by Mark Morgan:

     my @letters = ( 'A' .. 'Z' );

     my $alphabet = '';
     $alphabet .= splice( @letters, rand(@letters), 1 )
       while @letters;

Kester Allen and John Trammell used a module to generate the 
permutation:

        use List::Util qw/shuffle/;

Rick Measham offered this regex to ensure that a permutation
(not generated by the program) was an actual permutation of
A..Z:

        die("Permutation file '$permutation_file' does not contain a
        permutation\n")
		unless $permutation =~ /^(?:([A-Z])(?!.*?\1)){26}$/;

Wow!  I'd have to study on that one! 

        [Let's do that.  The first part locates a capital letter, and
        captures it in $1:

                ([A-Z])

        The next part uses the 'negative lookahead' construction to
        require that the same letter *not* appear a second time; the
        letter may *not* be followed by some sequence of characters (.*)
        and $1 again (\1):

                (?! .* \1)

        The whole thing is wrapped in /^(...){26}$/ which requires that
        the entire string consist of just 26 such letters, each of which
        does not appear a second time.

        I do not know why Pr. Measham used .*? instead of .* .  -MJD]

Matthew Walton submitted solutions in both Perl and a
language called Haskell 98.  I'd never heard of it!  Thanks,
Matthew, for opening my eyes to other realms.  I do tend to
get somewhat narrow minded in my Perl obsessiveness.  I've
heard good things about Ruby but have not perused it much
yet...

        [Haskell is my favorite programming language!  One thing I
        learned from Haskell is that inferring block structure from
        white space is not an intrinsically dumb idea; what is dumb is
        the way Python has implemented it.  It's always nice to find out
        that a dumb idea is smarter than you thought it was.

        Haskell has two flagship features.  First, it has an extremely
        powerful type system, and the Haskell compier figures out all
        the types for you so that it is very rare to have to declare any
        variables; if the compiler figures out the wrong type, it is
        almost always because you made a serious programming error.  See
        http://perl.plover.com/yak/typing/ for more details about this.
        That article is about SML, which has a less sophisticated type
        system than Haskell, but perhaps it gives the flavor.

        Second, Haskell uses "lazy evaluation", which means that it
        never evaluates anythig it doesn't have to.  Consider Perl code:

           my ($y) = grep /foo/, glob("*");

        This searches the entire directory, constructs a list of files,
        scans the entire list, constructs another list of all the files
        whose names contain "foo", puts the first one into $y, and
        throws the rest away, wasting most of the work it just did.

        In the Haskell analog of this code, no work will be done until
        the value of $y is actually required, and then the directory
        will be searched and scanned only far enough to find the right
        value for $y.  This happens automatically.  One result is that
        you can write

                carrots = "carrot" : carrots;

        and get an infinite list of carrots; Haskell is happy to
        manipulate infinite lists.  Another result is that you can write

                max = hd sort numbers;

        to get the smallest number in the list ("hd" gets the value at
        the head of a list) and this runs in time proportional to the
        length of the list, because the sort only sorts enough of the
        list to find the first element of the result.  -MJD]

Christopher Eckholm submitted his solution on a web page
which made it easy to look at! [http://www.chrekh.se/qotw/elect.pl]

Everyone's code was rather neatly done and well documented.   It would
be a pleasure to work with any of you as a team member!

Jon

[Thanks to everyone who participated, and particularly to those who did
so in private and never told anyone about it.  I will send the new quiz
tomorrow.  -MJD]