[Kc] book review of "Beuitiful Code: ..."

Fri Oct 12 11:12:56 PDT 2007

slashdot has rejected my work.  Oh well, here it is.

****************************

Beautiful Code is a collection of thirty-three essays by notable
computer programmers, all dared by editors Greg Wilson and
Andy Oram to hold forth about a piece of source code they
have found notably beautiful, apparently with intention of producing
what may become a standard computer science textbook. Such
an experiment has not been carried out before.

Were Don Knuth dead, this book would probably be dedicated to his memory.
Beautiful Code collects "cultural wisdom" of computer programmers.
If one were to consider The Art of Computer Programming to be something of
a Pentateuch, Beautiful Code could the accompanying Midrash. Having
obtained an e-copy from O'Reilly and processed it with pdf2txt and
then run
<code> grep -ci knuth Chapter*.txt</code>, I found
that not everyone mentioned Knuth. Those who did have a K number
greater than zero by their name in the synopses below. (Asterisks
before the author's name represent other chapters mentioning the
author.)

<p>Speaking of experiments, O'reilly is trying to develop an online
community based on the book, with <a
href="http://beautifulcode.wiki.oreilly.com/wiki/index.php/Main_Page">a
wiki.</a><p>

I had several programming projects prioritized ahead of writing this book
review, but after leafing through the six hundred pages and reading a few of the
pieces in it at random I realized that my professional work would benefit from
making reading it cover to cover my primary free-time action item.<p>

To be systematic and thorough, here are quick summaries of the essays, which
are more prolix than the editor's version of the same exercise in the
preface. Proceeding in this fashion is really horrible book review
practice, but there really
isn't any cleaner way to approach this particular text.

*** Brian Kernighan (K0) discusses Rob Pike's minimal regular
expression engine optimized for pedagogy.

Karl Fogel (K0) tells the origin story of Subversion's "delta editor"
internal interface, including
an abandoned branch. He introduces a human-factors aspect to software
beauty: by
having a strongly defined interface, development time which might be
squandered on
debating interface points can be put to more productive use.

** Jon Bentley (K4), the author of Programming Pearls, holds forth
about quicksort, instrumenting
quicksort, and generating mathematical forumlae which give the same
results. His essay nicely demonstrates the tension between procedural
(pseudocode) and descriptive (mathematics) language.

Tim Bray (K0), writing in a tone suitable for a more general audience,
uses Ruby's regexps to
analyze his Apache server logs from his blog, which he encourages the
reader to check
out, to build a case for his suspicion that Google uses binary search.

* Elliotte Rusty Harold (K1) draws on experience writing XML verifiers
to reinforce the truism that
early passes at a problem should focus on correctness, saving speed for later.

Michael Feathers (K0) explains his appreciation for Ward Cunningham's
FIT Java testing framework's
noncompliance against Java coders' rules of thumb, revealing a tension
between closed (final
classes, designed-in hooks and extension points) and open (working
stub base classes) approaches to
designing for extensibility. FIT-derived systems take HTML system
documentation with tests
embedded in it in tables as input and fill in one table cell with a
pass/fail result. By carefully
selecting what to hold constant and deferring extensibility to authors
of derived classes, FIT
allows Java developers, jaded and wounded by corporate infighting, to
appreciate the fundamental
elegance of a solid, basic, O-O approach.

* Alberto Savoia (K1) plays fast and loose with math jargon. He
discusses JUnit as an introduction to an exploration of how to test a
binary search function for correctness. He performs a math
proof -- claiming that a conclusion follows from implications of
known-true axioms -- without
calling it that, and a few pages later uses the word "induction" in
the non-mathematical
sense. He gives us a quick course in software testing best practices,
including smoke testing,
boundary testing, and what he calls "test theories" which are
statements made about what the
tested system is supposed to do, and some techniques for deriving
additional test theories
off the "happy path." Apparently 2006 was the year that data sets
reached the size where the (a+b)/2 method of finding the midpoint
between a and b started overflowing using signed 32-bit integers,
which wreaked all kinds of weird havoc.

Charles Petzold (K0) uses and endorses dynamic code generation for
efficient image processing, reminiscing about the feats of the team
who wrote Windows 1.0 in 8086 assembly language and applying similar
techniques in his C# code thanks to the System.Reflection.Emit
namespace. The results aren't easy on the eyes, but they run in a
quarter of the time of equivalent code that makes decisions inside
loops. It is not clear if the MSIL compiler is going to be so brazen
as to perform loop unrolling on bytecodes emanating from ILGenerator
objects; if not, I expect that the optimal acceleration of the dynamic
method would occur with the loops unrolled into a modified Duff's
device large enough to use almost all of the CPU cache. That would
remove many loop bounds comparisons, which aren't that many
instructions compared with the operation being performed on each
pixel, but which may necessitate CPU pipeline stalls. To save another
two percent of the time, and possibly hurt general system performance
during the operation. Small processes take less effort to switch
between. So never mind. Unless you want to benchmark it for
something to do.

Douglas Crockford (K1), who explains why LISP has never been accepted
by the mainstream, and why
the LISP community doesn't care, proposes to use a technique from a
1973 paper (anticipating
object orientation, no less) by Vaughn Pratt ("Top Down Operator
Precedence") that modified
a technique published by Robert W. Floyd (for whom Knuth wrote a
glowing eulogy) to write a
parser that can process a subset of ECMA-262 (a.k.a. Javascript)
entirely within that subset,
which includes functions, objects, and JSON literals. It's all about
the left binding power. And
the "nud" and the "led," terms which when given to a search engine
yield a slightly easier-going
version of chapter nine. Null denotation and left denotation. Truthy
and falsy. Don't repeat
yourself, write a macro. A parser yields a parse tree, and hands that
off to something else.
A proposal for accommodating the needs of both language designers who
want to reserve a lot
of symbols and programmers who don't want to have to synonymize around
them. Would extending
Pratt's technique to support languages where not everything is a
functional expression really
be as easy as Crockford makes it look? Later in the book, R. Kent
Dybvig (K0) explains how Scheme's syntax-case macro expander lets the
programmer forget the whole deal about using quasi-reserved names in
macros, because it abstracts them out for you, while offering an
extra-special syntax for when you really truly do want to interfere
with the internal operation of a macro. I am very pleased that
O'Reilly sent me this book to review while I have been working on a
macro system for Perl, and having read the Dybvig and Crockford
articles, Macrame.pm will be better that it would have been had I not
read these articles, when it finally does get completed.

Henry S. Warren, Jr. (K1) rants about a fundamental problem, the best
way to count
set bits. Stunningly, divide-and-conquer approaches using bit
shifting beat optimized brute force approaches, although calling them
opaque is an
understatement. Population count processing is applicable to a wide
variety of basic data processing infrastructure problems. Warren's
elucidation of the situation is a perfect example of the best practice
development process: correct and fast, in that order.

A USABILITY GURU: Ashish Gulhati (K0), who prefers to work remotely
from the Himalayas,
talks about how helping some social idealists inspired Cryptonite, his
end-user-friendly encrypted webmail system, then launches into the
history of the project,
including architecture implementation details and discussion of
security concerns and
benchmarking, and how using well defined interfaces made replacing one
component with
a more efficient one with comparable functionality a relative breeze.
Along with Lincoln
Steins's chapter, this chapter is a nice introduction to object
oriented Perl and how the
reusability aspect of a good interface design effectively
optimizes programmer time in any language.

BIOINFORMATICIANS: Lincoln Stein (K0), without encumbering his
narrative with O-O terminology, presents
his Bio::Graphics system for displaying genetics research results as a
example of usabilty
done right, this time usability by one's fellow programmers. The
advantage of using standard
interfaces becomes apparent when it became possible to support many
graphics formats
by drop-in replacement of the GD library, which outputs in only one
format, with a different
library with the same interface but configurability to output in other
formats too. The
tension between perfection and deliverability is discussed, as Stein
reveals his big wishlist item
for his library but opines that the level of rewriting required is
expected to prevent its
reprioritization, especially as there is an effective workaround.
Immediately following that, Jim Kent (K0) tells us how readability and
understandability lead to extensibility in the Gene Sorter, including
a short tutorial on writing polymorphic objects in C.

MATH GEEKS: Adam Kolawa (K0) uses and endorses the CERN math library,
which evolved into LAPACK. The well-commented core routines, in
Fortran with bindings for use by other languages, are in regular use
after three decades. Jack Dongarra and Piotr Luszczek (K0) discuss
recent modifications made to LAPACK to take full advantage of modern
hardware.

KERNEL HACKERS:Greg Kroah-Hartman (K0) appreciates the discipline
involved in maintaining the Linux kernel, where the acrobats operate
in full view of the world without the benefit of nets such as dynamic
type checking; Diomidis Spinellis (K0) uses the BSD virtual file
system as the springboard for presenting general truths about
indirection and maintainability; and Bryan Cantrill (K0) likens
software bugs to sewage, in particular an embarrassingly unforseen
race condition in a released version of Solaris.

PYTHON MONGERS:* Andrew Kuchling (K0) discusses Python's hash tables,
Travis E. Oliphant (K0) discusses the optimized n-dimensional iterator
that lets the NumPy python math library do matrix operations without
blowing its cache, and Rogerio Atem de Carvalho and Rafael Monnerat
(K0) hype the flexibility of ERP5, an enterprise resource planning
system that runs on python/zope.

ROBERT HEINLEIN FANS: Ronald Mak (K0) uses J2EE beans to communicate
both with Mars rovers and their legions of fans who want to download
terabytes.

PUTTING THE GOO IN GOOGLE: Jeffrey Dean and Sanjay Ghemawat (K0)
discuss MapReduce. If you don't already know what MapReduce is, you
probably know what to do to find out.

THE NETWORK IS THE COMPUTER: Simon Peyton Jones (K0) introduces STM
(Software Transactional Memory) which is an allegedly fool-proof
replacement for locking discipline in multithreaded Haskell, and then
discusses his solution to Trono's "Santa Claus problem" without
explaining why the problem is supposed to be difficult.

<p>

William R. Otte and Douglas C. Schmidt (K0) manage to abstract
concurrency models out of their framework for networked services.

Andrew Patzer (K0) reveals some practical best practices as he tells
the story of doing a straightfoward piece of systems integration using
RESTful web services.

RADICAL USABILITY Andreas Zeller (K0), apparently a devotee of the
Einsteinian craft-a-tool-to-straighten-the-bent-paperclip-from-the-good-paperclip-you-have-found
methodology, uses an automated divide-and-conquer approach to
discovering which of several thousand patches in the delta between two
versions of gdb was responsible for a mysterious loss of function in
the ddd graphical front end. Had ddd a debugging mode that logged the
full interaction between itself and gdb, the world might not be the
better for his automated technique.

* Yukihiro Matsumoto (K0), the author of Ruby, might be annoyed that
his essay on the nature of elegance, including the DRY (Don't Repeat
Yourself) principle and other similarly useful tips, was not selected
as a forward instead of buried towards the end of the collection.

COMPUTER ACCESS FOR ALL:
* T. V. Raman (K0) shares his amazement at how straightforwardly the
extensible design of emacs Lisp, with its
advice system that allows insertion of function before, after, or
fully replacing any of the existing emacs functions,
allowed his system to adapt emacs for eyes-free use to bush out over
the extended development of emacspeak
as a hobby project.  Takeaways include the advice that when adapting a
communication for another medium, it
is important to communicate the message, not the artifacts of the
previous medium.  He is justifiably proud of
his eyes-free calendar widget, among others. Bizarrely, there is no
mention of Morse Code in Arun Mehta(K0)'s retrospective on a project
to develop an improved system for the use of Stephen Hawking, who
could only use one button (but he could control how long he pressed
it, and could watch a screen as he did so.)

BEAUTIFUL CAN MEAN EASY ON THE EYES: Laura Wingerd and Christopher
Seiwald (K0), from perforce, discuss readable layout, including
selection of line break points, and reveal the strong correlation
between bugs and nested ifs. (Takeaway: prefer case statements, even
if it means repeating yourself)

<p>
CASTLES MADE OF SAND

Brian Hayes (K0) demonstrates the truth to the saying "while in other
disciplines the researchers
stand on the shoulders of giants, in computer science we stand on each
other's shoes" as he
flails around in search of, and eventually finds, the easy way to find
co-linearity of three points in order to
analyze satellite photos of the lower Mississippi river, which tends
to change course during high water.

<p>
IN CONCLUSION

I think if I had been asked to write a chapter for this book, I would talk about
normalized Huffman tables, as discussed in Managing Gigabytes, which Hans Reiser
recommended in response to my relating to him a conversation about
ReiserFS had by the Kansas City LUG. Beautiful code has a lot of
different things
happening in it, with snippets of source code in various
languages,\and nice diagrams where appropriate. It was not a fast
read, although some of the pieces do flow nicely,
due to the necessity of retooling one's entire brain from one essay to the next.

Were I still a single nerd, and my girlfriend were to present me with
Beautiful Code, perhaps as a birthday present, I would not hesitate
to select her as my personal and permanent nerdess, all other things
being equal.

David Nicol, inventor of the online tip jar and always looking for
collaborators with spare tuits, can be contacted through the
comment form at tipjar.com.

-- 
non-smoker since 11:31 pm october 10, 2007
"To do good in secret, and shun the world's
applause, is the surest testimony of a virtuous
heart and self-approving conscience."  --  John Polidori (
http://tinyurl.com/ytm8ba )