[Edinburgh-pm] Code cleanup

Miles Gould miles at assyrian.org.uk
Mon Mar 5 13:07:27 PST 2012


On 05/03/12 17:24, Aaron Crane wrote:
> lines 20, 22, 52: perhaps use <>  instead of manually opening the file

My concern here is error reporting: you can get the name of the 
currently-open file from $ARGV, but $. increases across files. What's 
the best way round this? Also, I don't know what the correct behaviour 
is for the program if it's handed several datasets - currently it ignores 
everything before the final "probes" line, which is almost certainly wrong.
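
One documented idiom for this is to close ARGV at the end of each file, 
which resets $. so that it counts lines per file. A minimal sketch 
follows; the two temporary files and the @seen array are made up purely 
so that it runs on its own, and are not part of the program under 
discussion.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two throwaway input files, created only so the sketch is runnable.
my @files;
for my $content ("a\nb\n", "c\nd\ne\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh or die "Cannot close $name: $!";
    push @files, $name;
}

my @seen;    # one "file:line: text" entry per input line
{
    local @ARGV = @files;
    while (my $line = <>) {
        chomp $line;
        # $ARGV names the file being read; $. is the line number within
        # that file, thanks to the explicit close in the continue block.
        push @seen, sprintf '%s:%d: %s', $ARGV, $., $line;
    } continue {
        # Closing ARGV at the end of each file resets $. to 0.
        close ARGV if eof;
    }
}
print "$_\n" for @seen;
```

Without the continue block, the first line of the second file would be 
reported as line 3 rather than line 1.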

> lines 26, 76: inconsistent that these aggregates get an empty
> initializer, when most don't (lines 25, 56, 57)

What's best practice here - explicit initializers, or not?
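
For what it's worth, the two spellings behave identically in Perl: a 
freshly declared "my" array or hash starts out empty either way, so the 
choice is purely one of style and consistency. A small sketch, with 
variable names invented for illustration:

```perl
use strict;
use warnings;

my @implicit;              # a fresh "my" array is already empty
my @explicit = ();         # same result, spelled out
my %implicit_hash;         # likewise for hashes
my %explicit_hash = ();

# All four counts are zero.
printf "%d %d %d %d\n",
    scalar @implicit, scalar @explicit,
    scalar keys %implicit_hash, scalar keys %explicit_hash;
```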

> line 54: we copy $indexLine into $geneNumber, then never touch
> $indexLine again; perhaps replace $indexLine with $geneNumber
> throughout?

Actually, we never touch $geneNumber either! So I've deleted it and just 
left $indexNumber. My plan now is to unify $nameArray and $dataArray 
into one data structure, and eliminate $indexNumber too.

> line 81: `for my $i (0 .. $filterNumber-1)`

Similarly, a unified name/values structure will allow me to eliminate 
$filterNumber and $i.
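
As a rough sketch of what such a unified structure could look like: an 
array of hashes, each pairing a name with its values, can be walked 
directly with no index or count variable. The field names and sample 
data here are assumptions for illustration, not taken from the real 
script.

```perl
use strict;
use warnings;

# Hypothetical records: each one pairs a name with its list of values,
# replacing two parallel arrays that had to be indexed in step.
my @probes = (
    { name => 'probe_a', values => [ 1.0, 2.0, 3.0 ] },
    { name => 'probe_b', values => [ 4.0, 6.0 ] },
);

my @report;
for my $probe (@probes) {
    # The name and its values travel together, so no $i is needed.
    my $count = @{ $probe->{values} };
    push @report, "$probe->{name}: $count values";
}
print "$_\n" for @report;
```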

> or, if you're willing to require Perl 5.12, replace lines 81-89 with:

Hmm, yeah, tough one. I don't know how much control this person has over 
their computing environment. On the other hand, I'd like to indoctrinate 
them into the cult of Modern Perl ASAP :-)

[std dev calculation]
I've altered it to use sum and map, as suggested, but kept the 
calculation the same. Yes, they're provably equivalent, but it seems 
less confusing to change the expression and the algorithm separately.

> Do the code under discussion and Statistics::Basic maybe differ in
> whether they divide by N versus N-1?  (I forget the relevant
> terminology, I'm afraid.)

Sounds plausible! IIRC, one is the standard deviation of the sample, and 
the other is an estimate of the standard deviation of the distribution 
from which the sample's drawn. But A-level stats was a long time ago, 
and I can't remember which is which.
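
For reference: dividing by N gives the population standard deviation 
(the data treated as the whole population), while dividing by N-1 
(Bessel's correction) gives the usual estimate of the population's 
standard deviation from a sample. The sketch below computes both with 
sum and map; std_dev and its $correction argument are names invented 
here, and this is not a claim about what either the script or 
Statistics::Basic actually does.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# $correction is 0 to divide by N, or 1 to divide by N-1.
sub std_dev {
    my ($values, $correction) = @_;
    my $n = @$values;
    die "need more than $correction value(s)\n" if $n <= $correction;
    my $mean   = sum(@$values) / $n;
    my $sum_sq = sum(map { ($_ - $mean) ** 2 } @$values);
    return sqrt($sum_sq / ($n - $correction));
}

my @data = (2, 4, 4, 4, 5, 5, 7, 9);
printf "divide by N:   %.4f\n", std_dev(\@data, 0);
printf "divide by N-1: %.4f\n", std_dev(\@data, 1);
```

The two results converge as N grows, so a small test dataset is the 
easiest way to tell which convention a given implementation uses.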

Another question: what's the best way of getting the user to install all 
the dependencies? Add a Makefile.PL and tell them to run "cpan ."?
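
A minimal Makefile.PL along those lines might look like the sketch 
below. The distribution name and version are placeholders, and the 
prerequisite list is a guess based on the modules mentioned in this 
thread, so adjust it to whatever the script really uses.

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME      => 'ProbeFilter',   # placeholder distribution name
    VERSION   => '0.01',          # placeholder version
    PREREQ_PM => {
        # Module => minimum version (0 means any version will do)
        'List::Util'        => 0,
        'Statistics::Basic' => 0, # only if the script ends up using it
    },
);
```

With this file in place, running "cpan ." in that directory should 
fetch the listed prerequisites before installing. If cpanminus is 
available, "cpanm --installdeps ." installs only the dependencies.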

Miles

