[Pdx-pm] Fwd: Genomics at OHSU

David Pool dpool at hevanet.com
Mon Nov 28 11:40:14 PST 2005

(continuing the top post...)

Perhaps you should suggest he read "How Perl Saved the Human Genome
Project".


With main points being:

"I think several factors are responsible: 

1. Perl is remarkably good for slicing, dicing, twisting, wringing,
smoothing, summarizing and otherwise mangling text. Although the
biological sciences do involve a good deal of numeric analysis now, most
of the primary data is still text: clone names, annotations, comments,
bibliographic references. Even DNA sequences are textlike.
Interconverting incompatible data formats is a matter of text mangling
combined with some creative guesswork. Perl's powerful regular
expression matching and string manipulation operators simplify this job
in a way that isn't equalled by any other modern language. 

2. Perl is forgiving. Biological data is often incomplete, fields can be
missing, or a field that is expected to be present once occurs several
times (because, for example, an experiment was run in duplicate), or the
data was entered by hand and doesn't quite fit the expected format. Perl
doesn't particularly mind if a value is empty or contains odd
characters. Regular expressions can be written to pick up and correct a
variety of common errors in data entry. Of course this flexibility can
also be a curse. I talk more about the problems with Perl below.

3. Perl is component-oriented. Perl encourages people to write their
software in small modules, either using Perl library modules or with the
classic Unix tool-oriented approach. External programs can easily be
incorporated into a Perl script using a pipe, system call or socket. The
dynamic loader introduced with Perl5 allows people to extend the Perl
language with C routines or to make entire compiled libraries available
for the Perl interpreter. An effort is currently under way to gather all
the world's collected wisdom about biological data into a set of modules
called "bioPerl" (discussed at length in an article to be published
later in the Perl Journal). 

4. Perl is easy to write and fast to develop in. The interpreter doesn't
require you to declare all your function prototypes and data types in
advance, new variables spring into existence as needed, calls to
undefined functions only cause an error when the function is needed. The
debugger works well with Emacs and allows a comfortable interactive
style of development. 

5. Perl is a good prototyping language. Because Perl is quick and dirty,
it often makes sense to prototype new algorithms in Perl before moving
them to a fast compiled language. Sometimes it turns out that Perl is
fast enough so that the algorithm doesn't have to be ported; more
frequently one can write a small core of the algorithm in C, compile it
as a dynamically loaded module or external executable, and leave the
rest of the application in Perl (for an example of a complex genome
mapping application implemented in this way, see

6. Perl is a good language for Web CGI scripting, and is growing in
importance as more labs turn to the Web for publishing their data."


On Mon, 2005-11-28 at 11:02 -0800, Thomas J Keller wrote:
> Hi All,
> This is probably cheating. Though, defending Perl against this Python
> snob should get you riled up enough to burn up a few of those T-day
> calories. So help me out here. Any other good arguments for this
> in-house discussion? The fellow I got into this with is an Assistant
> Professor in the Medical Informatics program. I'd like him to get
> over his dislike for Perl.
> What occurs to me to say is that if you can't understand someone
> else's perl code, they either didn't intend you to, or they were too
> short-sighted to write it properly. And what I like about Perl is that
> I can do anything with it that needs to be done. (I usually have to
> ask some help from the highly vaunted perl mongers of course.)
> Tom
> Begin forwarded message:
> > From: "Aaron Cohen" <cohenaa at ohsu.edu>
> > Date: November 28, 2005 10:44:55 AM PST
> > To: "Tom Keller" <kellert at ohsu.edu>
> > Subject: Re: Genomics at OHSU
> > 
> > 
> > Tom:
> > Challenges are good when they lead to interesting discussions.
> > Here's my two cents.
> > 
> > 
> > I can't argue against the fact that Perl is good for one-line string
> > manipulations. However, old-fashioned Unix tools such as sed do the
> > same thing.
> > Python isn't geared towards this kind of brevity. The reason that I
> > like Python is that it makes me more productive, and I have been
> > able to use it as a single language to do lots of things, both small
> > and large, quickly.
> > 
> > 
> > I think that any Python lover (not just me) would admit that Python
> > doesn't provide the "least characters" way of doing things. But they
> > would also consider that a low priority compared to programmer
> > productivity, debugging time, ability to understand others' code,
> > etc. The problem with
> > using brevity as a measure of goodness is that it becomes irrelevant
> > for programs of any decent size. I like Python because I have
> > written all kinds of complex programs (genetic optimizers, machine
> > learning classifiers, statistical analyzers, video editors, Sudoku
> > solvers, etc.) and months or years later I can still understand the
> > programs that I wrote. 
> > My gripe against Perl is that I can't make heads or tails out of
> > someone else's code without a language manual by my side. Do you
> > have any experience in taking and adapting someone else's largish
> > Perl code? Perhaps your experience is different.
> > 
> > 
> > That said, one should always pick an appropriate tool for the job,
> > and if Perl works for what you're doing, great. I think that you
> > implied that Perl is your first language, so you may want to look
> > into another as a comparison. I like Python, but I have heard some
> > nice things about Ruby. And of course, Java is always a good
> > language to know something about.
> > 
> > 
> > -Aaron
> > 
> > 
> > > > > Thomas J Keller <kellert at ohsu.edu> 11/25/2005 9:30 AM >>>
> > Hi Aaron,
> > As you can see, once I'm given a challenge, I don't easily let it
> > go:
> > 
> > 
> > After sending my last message, I realized there was an even shorter
> > perl method from the command line:
> > $ perl -pe 'y/Z/5/'
> > 
> > 
> > 
> > 
> > Tom
> _______________________________________________
> Pdx-pm-list mailing list
> Pdx-pm-list at pm.org
> http://mail.pm.org/mailman/listinfo/pdx-pm-list
