[Edinburgh-pm] Code cleanup
miles at assyrian.org.uk
Mon Mar 5 13:07:27 PST 2012
On 05/03/12 17:24, Aaron Crane wrote:
> lines 20, 22, 52: perhaps use <> instead of manually opening the file
My concern here is error reporting: you can get the name of the
currently-open file from $ARGV, but $. increases across files. What's
the best way round this? Also, I don't know what's the correct behaviour
for the program if it's handed several datasets - currently it ignores
everything before the final "probes" line, which is almost certainly wrong.
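On the $. question, one standard Perl idiom (documented under `eof` in perlfunc) is to close the ARGV filehandle at the end of each input file. That resets $. so it counts lines within the current file, while $ARGV still names that file. A minimal sketch follows. The helper name `lines_per_file` is hypothetical, and it only records per-file line counts to show that $. restarts for each file:

```perl
use strict;
use warnings;

# Hypothetical helper: returns { filename => number of lines }, reading
# every file through <> rather than opening each one by hand.
sub lines_per_file {
    local @ARGV = @_;    # <> reads the files named in @ARGV
    my %count;
    while (my $line = <>) {
        # $ARGV is the current file's name; $. is the line number within
        # that file, because ARGV is closed (resetting $.) after each file.
        # A real script could report problems here, e.g.
        #   warn "bad record at $ARGV line $.\n";
        $count{$ARGV} = $.;
    }
    continue {
        # Bare eof (no parentheses) is true at the end of *each* file;
        # eof() would only be true at the end of the last one.
        close ARGV if eof;
    }
    return \%count;
}
```

Without the `continue` block, the second file's first line would be reported as line 4 (or wherever the first file left off), which is exactly the error-reporting problem above.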
> lines 26, 76: inconsistent that these aggregates get an empty
> initializer, when most don't (lines 25, 56, 57)
What's best practice here - explicit initializers, or not?
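For what it's worth, a freshly declared `my` aggregate is already empty, so an explicit `= ()` changes nothing at runtime. The choice is purely one of consistency and readability. A small illustration, with arbitrary variable names:

```perl
use strict;
use warnings;

# These pairs are equivalent: a new lexical hash or array starts empty,
# so "= ()" is redundant but harmless.
my %implicit_hash;
my %explicit_hash = ();
my @implicit_list;
my @explicit_list = ();

# Note: "my %h = {};" is NOT an empty initializer. It stores a hash
# reference as a key (with an undef value) and warns about it.
```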
> line 54: we copy $indexLine into $geneNumber, then never touch
> $indexLine again; perhaps replace $indexLine with $geneNumber
Actually, we never touch $geneNumber either! So I've deleted it and just
left $indexNumber. My plan now is to unify $nameArray and $dataArray
into one data structure, and eliminate $indexNumber too.
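One possible shape for the unified structure is a list of records, one hash per gene, so a name can never drift out of step with its values. This is a sketch under assumptions: the field names `name` and `values`, and the helper `build_records`, are hypothetical. It converts existing parallel arrays once. Ideally the script would build records like this directly while parsing, so the parallel arrays never exist:

```perl
use strict;
use warnings;

# Hypothetical conversion from parallel arrays (names[i] <-> data[i])
# into a single array of { name => ..., values => [...] } records.
sub build_records {
    my ($names, $data) = @_;
    die "names and data differ in length\n" unless @$names == @$data;
    # +{ ... } forces an anonymous hash rather than a block inside map.
    return [ map { +{ name => $names->[$_], values => $data->[$_] } }
             0 .. $#{$names} ];
}
```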
> line 81: `for my $i (0 .. $filterNumber-1)`
Similarly, a unified name/values structure will allow me to eliminate
$filterNumber and $i.
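With records shaped like that (still a hypothetical layout), the loop can walk the records themselves, so neither a length variable nor an index is needed. The helper `format_records` and its tab-separated output are illustrative only:

```perl
use strict;
use warnings;

# Hypothetical: one tab-separated output line per record.
sub format_records {
    my ($records) = @_;
    my @lines;
    # Iterate over the records directly: no $filterNumber, no $i.
    for my $record (@$records) {
        push @lines, join "\t", $record->{name}, @{ $record->{values} };
    }
    return @lines;
}
```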
> or, if you're willing to require Perl 5.12, replace lines 81-89 with:
Hmm, yeah, tough one. I don't know how much control this person has over
their computing environment. On the other hand, I'd like to indoctrinate
them into the cult of Modern Perl ASAP :-)
[std dev calculation]
I've altered it to use sum and map, as suggested, but kept the
calculation the same. Yes, they're provably equivalent, but it seems
less confusing to change the expression and the algorithm separately.
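To illustrate the "expression vs algorithm" distinction, here are two algebraically equivalent population standard deviations (both dividing by N), each written with List::Util's `sum` and `map`. Which one corresponds to the script's original calculation is an assumption; both subs are hypothetical names:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# "Raw moments" form: sqrt(mean of squares - square of mean).
sub std_dev_moments {
    my @x    = @_;
    my $n    = @x;
    my $mean = sum(@x) / $n;
    return sqrt( sum(map { $_ ** 2 } @x) / $n - $mean ** 2 );
}

# "Deviations" form: sqrt(mean squared deviation from the mean).
sub std_dev_deviations {
    my @x    = @_;
    my $n    = @x;
    my $mean = sum(@x) / $n;
    return sqrt( sum(map { ($_ - $mean) ** 2 } @x) / $n );
}
```

They agree in exact arithmetic, but the deviations form is numerically safer. The moments form subtracts two large, nearly equal numbers, and rounding can even push the result slightly below zero, at which point `sqrt` dies.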
> Do the code under discussion and Statistics::Basic maybe differ in
> whether they divide by N versus N-1? (I forget the relevant
> terminology, I'm afraid.)
Sounds plausible! IIRC, one is the standard deviation of the sample, and
the other is an estimate of the standard deviation of the distribution
from which the sample's drawn. But A-level stats was a long time ago,
and I can't remember which is which.
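For the record: dividing by N gives the standard deviation of the sample itself. Dividing by N-1 (Bessel's correction) gives the usual estimate for the distribution the sample was drawn from. Strictly, it makes the *variance* estimate unbiased, not the standard deviation. Here is a sketch with a switch between the two. The sub name and the `$bessel` flag are hypothetical. Running it on the same data as Statistics::Basic would show which convention the module uses:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# $bessel false: divide by N   (spread of this sample itself)
# $bessel true:  divide by N-1 (estimate for the underlying distribution)
sub std_dev {
    my ($x, $bessel) = @_;
    my $n = @$x;
    die "need at least two values for the N-1 form\n" if $bessel && $n < 2;
    my $mean = sum(@$x) / $n;
    my $ss   = sum(map { ($_ - $mean) ** 2 } @$x);   # sum of squared deviations
    return sqrt( $ss / ($bessel ? $n - 1 : $n) );
}
```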
Another question: what's the best way of getting the user to install all
the dependencies? Add a Makefile.PL and tell them to run "cpan ."?
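A Makefile.PL along these lines would declare the dependencies via ExtUtils::MakeMaker's `PREREQ_PM`. This is a minimal sketch: the distribution name, version and script path are hypothetical placeholders, and the module list only names the ones mentioned in this thread:

```perl
# Makefile.PL (hypothetical minimal version)
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME      => 'Probe::Analysis',      # hypothetical distribution name
    VERSION   => '0.01',                 # placeholder
    EXE_FILES => ['bin/analyse.pl'],     # hypothetical script location
    PREREQ_PM => {
        'Statistics::Basic' => 0,        # 0 means "any version"
        'List::Util'        => 0,
    },
    # MIN_PERL_VERSION => '5.012',       # only if you decide to require 5.12
);
```

With that in the directory, `cpan .` should install the prerequisites and then build the distribution. Alternatively, `cpanm --installdeps .` (from App::cpanminus) installs just the dependencies.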