[sf-perl] what do you do?
Mark Kvale
kvale at phy.ucsf.edu
Wed Feb 18 12:13:08 PST 2009
I'm a statistical geneticist at UCSF. By time spent, my work breaks down as:
30% - developing mathematical models and algorithms for exploratory analysis of
genetic data. All of this ends up in programs, but it involves looking up
journal articles, thinking, discussions and pen-and-paper analysis.
40% - implementing the above algorithms, debugging and interpreting results.
15% - data munging and web app development. These are small applications that
demo the models or algorithms in a scientific paper. Not pretty, but they get
the job done.
15% - Meetings, email, talks, helping others with computer problems, etc.
Languages:
70% perl - I hold a contrarian view, in that I think perl is a great language
for exploratory data analysis. It's awesome for data munging and is often
fast enough for even involved mathematical programming.
20% C++ - for memory hungry or especially lengthy algorithms.
10% R - a statistical programming language
SQL - mostly to store or retrieve data for further analysis.
SGE + MPI - SGE is the Sun Grid Engine. It has a little language for
submitting and controlling distributed computations on our Linux cluster.
MPI is a parallel programming library.
As my datasets grow from a gigabyte to a terabyte in size, I find myself
programming more in C++, mostly because perl is too memory-hungry to keep much
data in RAM. The PDL module can help me avoid C++ at times. I've read that Perl 6
may have compact multidimensional data types. This would be a real boon for my
type of programming.
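
To make the memory point concrete, here is a minimal sketch of the kind of
compact storage that pushes me toward C++. It is only an illustration: the
GenotypeMatrix class, its method names and the 0/1/2 genotype coding are
made up for this example and are not from any of my programs. Each entry
takes exactly one byte in a single contiguous buffer, whereas every element
of a plain perl array is a full scalar with its own bookkeeping, typically
tens of bytes each.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical example type: a dense samples-by-markers matrix of small
// genotype codes (assumed here to be 0, 1 or 2), one byte per entry.
class GenotypeMatrix {
public:
    GenotypeMatrix(std::size_t samples, std::size_t markers)
        : samples_(samples), markers_(markers), data_(samples * markers, 0) {}

    // Mutable access to one entry, stored in row-major order.
    std::uint8_t& at(std::size_t sample, std::size_t marker) {
        assert(sample < samples_ && marker < markers_);
        return data_[sample * markers_ + marker];
    }

    // Read-only access to one entry.
    std::uint8_t at(std::size_t sample, std::size_t marker) const {
        assert(sample < samples_ && marker < markers_);
        return data_[sample * markers_ + marker];
    }

    // Bytes used by the genotype payload itself (excludes the small,
    // fixed overhead of the vector object).
    std::size_t payload_bytes() const {
        return data_.size() * sizeof(std::uint8_t);
    }

    // Number of samples carrying the given code at one marker.
    std::size_t count_code(std::size_t marker, std::uint8_t code) const {
        std::size_t n = 0;
        for (std::size_t s = 0; s < samples_; ++s) {
            if (at(s, marker) == code) {
                ++n;
            }
        }
        return n;
    }

private:
    std::size_t samples_;
    std::size_t markers_;
    std::vector<std::uint8_t> data_;
};
```

With this layout, 1000 samples by 500 markers costs 500,000 bytes of
payload, and the cost scales linearly from there. PDL gets a similar
effect inside perl by keeping the numbers in a packed, typed buffer.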
Mark