[sf-perl] what do you do?

Mark Kvale kvale at phy.ucsf.edu
Wed Feb 18 12:13:08 PST 2009

I'm a statistical geneticist at UCSF. My work breakdown, by time spent, is:

30% - developing mathematical models and algorithms for exploratory analysis of
genetic data. All of this ends up in programs, but it involves looking up
journal articles, thinking, discussions, and analytical work with pen and paper.

40% - implementing the above algorithms, debugging and interpreting results.

15% - data munging and web app development. These are small applications that
demo the models or algorithms in a scientific paper. Not pretty, but they get
the job done.

15% - Meetings, email, talks, helping others with computer problems, etc.


By language, the split is:

70% Perl - I hold a contrarian view, in that I think Perl is a great language
for exploratory data analysis. It's awesome for data munging and is often
fast enough for even involved mathematical programming.

20% C++ - for memory hungry or especially lengthy algorithms.

10% R - a statistical programming language
    SQL - mostly to store or retrieve data for further analysis
    SGE + MPI - SGE is the Sun Grid Engine. It has a little language
                for submitting and controlling distributed computations
                on our Linux cluster. MPI is a parallel programming
                library.

As my datasets grow from a gigabyte to a terabyte in size, I find myself
programming more in C++, mostly because Perl is too memory-hungry to keep much
data in RAM. The PDL module can help me avoid C++ at times. I've read that
Perl 6 may have compact multidimensional data types. This would be a real boon
for my type of programming.

