[Neworleans-pm] Fwd: Solutions and Discussion for Perl Quiz of the
Week #18
Brett D. Estrade
estrabd at yahoo.com
Thu Jul 1 10:44:42 CDT 2004
=====
http://www.brettsbsd.net/~estrabd
__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com
----- Original message -----
From: "Jon Bjornstad" <jon at icogitate.com>
To: perl-qotw at plover.com
Date: Mon, 28 Jun 2004 23:28:11 -0700
Subject: Solutions and Discussion for Perl Quiz of the Week #18
Sample solutions and discussion
Perl Quiz of The Week #18 (20040623)
[Note: This was quiz #18, not #16. I sent out the quiz with the wrong
Subject line. -Mark Jason Dominus]
For statewide elections, the state of California uses a "random
alphabetic" sort to order the candidates' names on the ballot.
This link describes that special sort:
http://www.ss.ca.gov/elections/elections_ra.htm
(mirrored at
http://perl.plover.com/qotw/misc/r018/elections_ra.htm )
The idea is this: a random order is selected for the twenty-six
letters of the alphabet, and then the names on the ballot are
sorted according to this random order, instead of according to
the usual order.
This is the order in which the names appear on the ballot in
Assembly District 1. In Assembly District 2, the order is the
same, except that the first name on the list has been moved to
the end of the list. In Assembly District 3, the next name on
the list is moved to the end of the list, and so on.
You will write a program to generate ballots. The program will
be invoked like this:
elect.pl [-r] [permutation] [district-number] [namefile]
(Items in [square brackets] are optional.)
The 'permuation' argument will default to 'randlet.txt'. If the
file exists, it should contain a single line of text with a
random permutation of the 26 letters of the alphabet. If this
file does not exist, the program should generate one at random.
The -r option will force the file to be regenerated even if it
exists already.
The 'district-number' is the number of the Assembly District for
which the ballot will be generated. If omitted, the program
should generate the ballot for Assembly District 1.
The 'namefile' is a file that contains the names of the
candidates. The names come one per line with first name then
last name:
Jill Harmon
Gus Tribble
Walter Reston
Norma Kretschmer
Fiorella Squamuglia
Vern Smith
Bill Smith
Ed Squid
John Macdonald
Angus MacDonald
If the namefile is omitted, the program should read the names
from the standard input.
The program should print out the candidates' names, one per
line, in the order specified by California state law, sorted
according to the random permutation of the alphabet, with names
rearranged as appropriate for the specified Assembly District.
For example, if the file 'permutation' contains
QWERTYUIOPASDFGHJKLZXCVBNM
and the 'namefile' contains the names listed above, then the
output for Assembly District 1 should be:
Walter Reston
Gus Tribble
Ed Squid
Fiorella Squamuglia
Vern Smith
Bill Smith
Jill Harmon
Norma Kretschmer
Angus MacDonald
John Macdonald
For Assembly District 4, it should be:
Fiorella Squamuglia
Vern Smith
Bill Smith
Jill Harmon
Norma Kretschmer
Angus MacDonald
John Macdonald
Walter Reston
Gus Tribble
Ed Squid
----------------------------------------------------------------
When I first encountered the 'random alphabetic ordering'
method of the California election system I immediately
thought it would make for an interesting Perl challenge
and it has.
These people submitted ideas/solutions (in no
particular order :):
Matthew Walton
Christer Ekholm
John J. Trammell
Yitzchak Scott-Thoennes
Randy W. Sims
Adrian Scottow
Rod Adams
Kester Allen
Jonathan Scott Duff
Mark Jason Dominus
Torsten Hofmann
the hatter
Patrick LeBoutillier
Rick Measham
Mark Morgan
There was first a discussion to refine the specification.
Command line arguments and how to separate first and last
names and what about non-alphabetic characters in names.
[Some of this was my fault. Jon's original request was for
a program whose input was in a different format than what I
sent out; I rewrote it considerably. I was aware that the
argument format I inserted was ambiguous, but didn't think
it would be a big problem. -MJD] I tried to short circuit
some of this discussion as the point of the quiz was to have
some fun with sorting techniques. Because of this I won't
discuss the various ways of using command line options to
get the required data into the program. I'll focus on the
sort itself.
Many people saw right away that since comparing two
names would be a complex affair that this was a prime
opportunity for using a Schwartzian Transform.
This is a technique invented by and named after (not by)
Randall Schwartz, author of the classic
"Learning Perl". If any of you do not know of this
technique you can read about it in several places (Perl Cookbook,
Effective Perl, perldoc -f sort, etc).
The key idea in a complex sort is to pre-generate a sort key
for each item BEFORE you do the sort rather than each time
the sort compares two elements. This is much more efficient and
simply darn clever! The use of anonymous arrays
and maps unwind the pregenerated sort keys from the desired
result:
# schwartzian transform for efficiency and fun
@names =
map { $_->[0] }
sort { $a->[1] cmp $b->[1] }
map { [ $_, sortkey $_ ] }
@names;
How to generate a sort key in this case?
The tr/// function comes in quite handy here.
If the permuted alphabet in in $alphabet then this will do it:
sub sortkey {
my $name = shift;
$name =~ tr/$alphabet/A-Z/;
return $name;
}
After this transformation the name we can simply do
a simple compare and the result will be to have respected
the new alphabetic ordering.
Unfortunately tr/// does not do variable interpolation (why?).
So we need to wrap the tr/// in an eval.
There is also the need to swap first and last names
and do the sort case-insensitively.
So here is what Rick Measham coded:
sub encode {
# Case insensitive
my $switch = uc shift;
my $permutation = shift;
# Last name first
$switch =~ s/^([\w-]+)\s+(.+)$/$2 $1/;
# Switch in the new alphabet -- tr doesn't take
# variables so we have to eval the
# transliteration. We checked the permutation
# before so we can trust it. It's just a list
# of letters.
eval "\$switch =~
tr/$permutation/ABCDEFGHIJKLMNOPQRSTUVWXYZ/";
return $switch;
}
# Schwartzian Transform -- see note below
my @sorted = map { $_->[1] }
sort { $a->[0] cmp $b->[0] }
map { [ encode($_, $permutation), $_ ] } @names;
That pretty much summarizes what I think is the optimal approach.
One improvement suggested by Mark Jason Dominus was
to only do ONE eval rather than one for each name in the list.
Here is his offering: [I sent this to Jon in private email
ome time ago when we were first discussing the question; it
wasn't intended as a complete solution to the problem. -MJD]
open IN, "rlets" or die "can't open rlets\n";
my $rlets = <IN>;
print $rlets;
chomp $rlets;
my $candidates = shift;
open IN, "<", $candidates or die "cannot open $candidates:
$!\n";
my $code = q{
print sort { my $aa = lc $a; my $bb = lc $b;
$aa =~ tr/RLETS/a-z/;
$bb =~ tr/RLETS/a-z/;
$aa cmp $bb } <IN>;
};
$code =~ s/RLETS/$rlets/g;
eval $code;
Note that the q{ } does a single quoting - no interpolation.
I'm surprised that Perl knows about balanced {}'s and does
not end the quoted string after "$aa cmp $bb". Or maybe
I'm not surprised! :)
This code does not use a Schwartzian Transform but is
just as quick - evals are apparently quite expensive!
[I think that's because Perl must run its parser. -MJD]
On other points of the problem:
Generating a random permutation of A to Z was accomplished
in a variety of ways. Here is one by Mark Morgan:
my @letters = ( 'A' .. 'Z' );
my $alphabet = '';
$alphabet .= splice( @letters, rand(@letters), 1 )
while @letters;
Kester Allen and John Trammell used a module to generate the
permutation:
use List::Util qw/shuffle/;
Rick Measham offered this regex to ensure that a permutation
(not generated by the program) was an actual permutation of
A..Z:
die("Permutation file '$permutation_file' does not contain a
permutation\n")
unless $permutation =~ /^(?:([A-Z])(?!.*?\1)){26}$/;
Wow! I'd have to study on that one!
[Let's do that. The first part locates a capital letter, and
captures it in $1:
([A-Z])
The next part uses the 'negative lookahead' construction to
require that the same letter *not* appear a second time; the
letter may *not* be followed by some sequence of characters (.*)
and $1 again (\1):
(?! .* \1)
The whole thing is wrapped in /^(...){26}$/ which requires that
the entire string consist of just 26 such letters, each of which
does not appear a second time.
I do not know why Pr. Measham used .*? instead of .* . -MJD]
Matthew Walton submitted solutions in both Perl and a
language called Haskell 98. I'd never heard of it! Thanks,
Matthew, for opening my eyes to other realms. I do tend to
get somewhat narrow minded in my Perl obsessiveness. I've
heard good things about Ruby but have not perused it much
yet...
[Haskell is my favorite programming language! One thing I
learned from Haskell is that inferring block structure from
white space is not an intrinsically dumb idea; what is dumb is
the way Python has implemented it. It's always nice to find out
that a dumb idea is smarter than you thought it was.
Haskell has two flagship features. First, it has an extremely
powerful type system, and the Haskell compier figures out all
the types for you so that it is very rare to have to declare any
variables; if the compiler figures out the wrong type, it is
almost always because you made a serious programming error. See
http://perl.plover.com/yak/typing/ for more details about this.
That article is about SML, which has a less sophisticated type
system than Haskell, but perhaps it gives the flavor.
Second, Haskell uses "lazy evaluation", which means that it
never evaluates anythig it doesn't have to. Consider Perl code:
my ($y) = grep /foo/, glob("*");
This searches the entire directory, constructs a list of files,
scans the entire list, constructs another list of all the files
whose names contain "foo", puts the first one into $y, and
throws the rest away, wasting most of the work it just did.
In the Haskell analog of this code, no work will be done until
the value of $y is actually required, and then the directory
will be searched and scanned only far enough to find the right
value for $y. This happens automatically. One result is that
you can write
carrots = "carrot" : carrots;
and get an infinite list of carrots; Haskell is happy to
manipulate infinite lists. Another result is that you can write
max = hd sort numbers;
to get the smallest number in the list ("hd" gets the value at
the head of a list) and this runs in time proportional to the
length of the list, because the sort only sorts enough of the
list to find the first element of the result. -MJD]
Christopher Eckholm submitted his solution on a web page
which made it easy to look at! [http://www.chrekh.se/qotw/elect.pl]
Everyone's code was rather neatly done and well documented. It would
be a pleasure to work with any of you as a team member!
Jon
[Thanks to everyone who participated, and particularly to those who did
so in private and never told anyone about it. I will send the new quiz
tomorrow. -MJD]
More information about the NewOrleans-pm
mailing list