[Edinburgh-pm] Code cleanup

Mon Mar 5 09:24:55 PST 2012

Miles Gould <miles at assyrian.org.uk> wrote:
> So I thought I'd do
> a step-by-step cleanup of their code, so that they, and hopefully any
> interested bystanders, might learn.

Nice idea.

> Does anyone have any suggestions?

lines 20, 22, 52: perhaps use <> instead of manually opening the file
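A minimal sketch of the `<>` version. It assumes the script gets its input file names on the command line (or the data on STDIN) and that each line is whitespace-separated. `read_records` is a hypothetical name, not something from the original code:

```perl
use strict;
use warnings;

# Hypothetical helper: read whitespace-separated records from the files
# named in @ARGV (or from STDIN if @ARGV is empty), using <> instead of
# an explicit open().
sub read_records {
    my @records;
    while (my $line = <>) {
        chomp $line;
        next unless length $line;    # skip blank lines (an assumption)
        push @records, [ split " ", $line ];
    }
    return @records;
}
```

If you run it as `perl script.pl data.txt`, `<>` opens `data.txt` itself. With no arguments it reads STDIN, so the script also works in a pipeline.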

lines 26, 76: it's inconsistent that these aggregates get an empty
initializer when most don't (lines 25, 56, 57)

line 37: this elsif is redundant now that you've got the
looks_like_number() test

lines 34, 41: hoist the chomps out of the conditional

lines 42-44: better: `my ($name, @values) = split " ", $line`
(it makes the format of each line obvious, and avoids the manual `shift`)
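Here is the hoisted chomp and the list-assignment split together. This sketch assumes each line looks like `NAME VALUE VALUE ...`. `parse_line` and the sample lines are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: split one raw line into a name and an array ref
# of the remaining fields.
sub parse_line {
    my ($line) = @_;
    chomp $line;                              # chomp once, before any branching
    my ($name, @values) = split " ", $line;   # first field is the name, the rest are values
    return ($name, \@values);
}

# Usage, storing each row's values by index (cf. the line-47 suggestion).
my @dataArray;
my @lines = ("geneA 1.5 2.0 3.25\n", "geneB 4 5 6\n");   # made-up input
for my $indexLine (0 .. $#lines) {
    my ($name, $values) = parse_line($lines[$indexLine]);
    $dataArray[$indexLine] = $values;
}
```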

line 46: I prefer `if any { !predicate }` to
`unless all { predicate }`, but I suppose that's personal taste
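The two spellings are equivalent by De Morgan's law. In the sketch below, a made-up predicate (`$_ > 0`) stands in for the real one. Note that `any` and `all` are only in List::Util from version 1.33 onwards. Older code typically imported them from List::MoreUtils instead:

```perl
use strict;
use warnings;
use List::Util 1.33 qw(any all);   # any/all need List::Util 1.33 or later

# Two spellings of "is some value not positive?"; ($_ > 0) is a
# hypothetical stand-in for the real predicate.
sub not_all_positive { return !(all { $_ > 0 } @_) }
sub any_not_positive { return any { !($_ > 0) } @_ }
```

Both give the same truth value for every input, including the empty list, so the choice really is one of taste.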

line 47: `$dataArray[$indexLine] = \@values`

line 54: we copy $indexLine into $geneNumber, then never touch
$indexLine again; perhaps replace $indexLine with $geneNumber
throughout?

lines 71-78: delete them, and modify lines 91-92 and 95-98 to declare
the variables where they're first needed

line 81: `for my $i (0 .. $filterNumber-1)`

lines 82-89: much simpler as this:

  my $data = $filterData[$i];
  my @controlArray = @$data[ 0 .. 19];   # first 20
  my @sampleArray  = @$data[20 .. 40];   # next 21

or, if you're willing to require Perl 5.12, replace lines 81-89 with:

  while (my ($i, $data) = each @filterData) {
      my @controlArray = @$data[ 0 .. 19];   # first 20
      my @sampleArray  = @$data[20 .. 40];   # next 21
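Both versions rely on the same pair of slices. Here is a runnable check with made-up data. It assumes each row of `@filterData` is an array ref of exactly 41 values (20 controls, then 21 samples). `split_row` is a hypothetical helper, and `$#filterData` stands in for `$filterNumber-1`:

```perl
use strict;
use warnings;

# Hypothetical helper: split one 41-element row into controls and samples.
sub split_row {
    my ($data) = @_;
    my @controlArray = @$data[ 0 .. 19];   # indices 0..19: first 20
    my @sampleArray  = @$data[20 .. 40];   # indices 20..40: next 21
    return (\@controlArray, \@sampleArray);
}

# Made-up rows, 41 values each.
my @filterData = ([1 .. 41], [map { $_ * 10 } 1 .. 41]);

for my $i (0 .. $#filterData) {
    my ($controls, $samples) = split_row($filterData[$i]);
    printf "row %d: %d controls, %d samples\n",
        $i, scalar @$controls, scalar @$samples;
}
```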

lines 106, 109, 112: if you're willing to require Perl 5.12, rename
$scoreCounter to $i, and replace with:

  my @sorted = sort keys %scoreHash;
  while (my ($i, $key) = each @sorted) {
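Here is a self-contained sketch of the `each @array` idiom, using a made-up `%scoreHash`. `rank_keys` is hypothetical. The `use 5.012` line makes the version requirement explicit, because `each` on arrays isn't supported before 5.12:

```perl
use strict;
use warnings;
use 5.012;   # each() on an array needs Perl 5.12 or later

# Hypothetical helper: returns "position:key" strings in sorted key order.
sub rank_keys {
    my ($hash) = @_;
    my @sorted = sort keys %$hash;
    my @rows;
    while (my ($i, $key) = each @sorted) {
        push @rows, "$i:$key";   # $i replaces a hand-maintained counter
    }
    return @rows;
}

my %scoreHash = (geneB => 0.7, geneA => 0.9, geneC => 0.1);   # made-up scores
print "$_\n" for rank_keys(\%scoreHash);
```

One caveat: `each` keeps an iterator on the array, so a `last` out of the loop leaves it mid-way. Here `@sorted` is rebuilt on every call, so that can't bite.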

lines 127, 129-131: `use List::Util qw<sum>` at the top, and then
`my $sum = sum(@$array)`
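A minimal sketch of the mean computed that way, assuming `$array` is an array ref of numbers. `mean_of` is a hypothetical name:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: mean of the numbers in an array ref.
sub mean_of {
    my ($array) = @_;
    my $n   = @$array;        # declared once, reusable for the stddev too
    my $sum = sum(@$array);   # replaces the hand-written accumulation loop
    return $sum / $n;         # assumes a non-empty array
}
```

Bear in mind that `sum` of an empty list returns undef, so an empty array needs handling before this point.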

lines 133-139: similarly, I think this clarifies the computation:

  my $squares_sum = sum(map { $_ * $_ } @$array);
  my $stddev = sqrt(($squares_sum - $sum * $sum / $n) / ($n - 1));

(with a declaration of $n, placed early enough to use in the calculation
of $mean, too); but even better would be to use a CPAN module, as in
your stats_basic branch
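This sketch checks that the sum-of-squares shortcut agrees with the textbook two-pass definition. Both divide by n - 1, and the helper names and test data are made up. One caveat: when the mean is large relative to the spread, the shortcut subtracts two large, nearly equal numbers. It can therefore lose precision where the two-pass form does not:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Sample standard deviation via the sum-of-squares shortcut.
sub stddev_shortcut {
    my ($array) = @_;
    my $n           = @$array;
    my $sum         = sum(@$array);
    my $squares_sum = sum(map { $_ * $_ } @$array);
    return sqrt(($squares_sum - $sum * $sum / $n) / ($n - 1));
}

# The same quantity from the definition: mean first, then squared deviations.
sub stddev_two_pass {
    my ($array) = @_;
    my $n    = @$array;
    my $mean = sum(@$array) / $n;
    return sqrt(sum(map { ($_ - $mean) ** 2 } @$array) / ($n - 1));
}
```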

> I haven't merged the stats_basic branch
> yet because it changes the output data extensively, and I'd like to run a
> few calculations by hand to determine which set of answers is right.

Might the code under discussion and Statistics::Basic differ in
whether they divide by N or by N-1?  (I forget the relevant
terminology, I'm afraid.)
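The usual names are population standard deviation (dividing by N) and sample standard deviation (dividing by N-1). On the same data they differ by a factor of sqrt(N/(N-1)). One way to check is to run a small set through both conventions and see which one a module's output matches. Here is a sketch using a hypothetical `stddev_with_divisor` helper. It makes no assumption about which convention Statistics::Basic actually uses:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Standard deviation with an explicit divisor choice:
#   population standard deviation divides by N,
#   sample standard deviation divides by N - 1.
sub stddev_with_divisor {
    my ($array, $use_sample) = @_;
    my $n    = @$array;
    my $mean = sum(@$array) / $n;
    my $ss   = sum(map { ($_ - $mean) ** 2 } @$array);   # sum of squared deviations
    return sqrt($ss / ($use_sample ? $n - 1 : $n));
}

my @check = (2, 4, 4, 4, 5, 5, 7, 9);   # a small set that is easy to do by hand
printf "population (N):  %.6f\n", stddev_with_divisor(\@check, 0);
printf "sample (N - 1):  %.6f\n", stddev_with_divisor(\@check, 1);
```

For that data set the population figure is exactly 2, and the sample figure is sqrt(32/7), about 2.138.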

-- 
Aaron Crane ** http://aaroncrane.co.uk/