[Pdx-pm] Software development and The Rules

Austin Schutz tex at off.org
Thu Sep 26 23:10:31 CDT 2002

On Thu, Sep 26, 2002 at 05:07:32PM -0700, Joshua Keroes wrote:
> I have benchmarks, but I can't coax B::TerseSize to give me the size of
> memory in use. (Help, anyone?) Nevertheless, I can make some guesses
> about how much memory is in use:
> Memory usage: 
>         What did we decide was the array memory usage before? 30 MB or
>         something? This solution uses 122K + Perl overhead, maybe 130K
>         all told. (!) That's a few orders of magnitude better.

	At 14 bits/number * 1,000,000 phone numbers you are still at a couple
of megs of data (about 1.75 MB), unless you are doing some sort of internal
compression or have figured out a way to squeeze each number into a single
bit. That's still a couple of orders of magnitude smaller than my test (below).
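	Squeezing each number into a single bit is exactly what a bit vector
buys you. Here's a sketch of the idea (my own illustration, not necessarily
what Joshua did): a vector over a million-number space is 1,000,000 bits =
125,000 bytes, which matches his ~122K figure.

```perl
# One-bit-per-number sketch (my guess at the approach, not
# Joshua's actual code). The numbers below are made up.
my @numbers = ( 123_456, 555_121, 999_999, 555_121 );  # dupes collapse

my $bits = "\0" x 125_000;           # 1_000_000 bits, ~122 KB
vec( $bits, $_, 1 ) = 1 for @numbers;

# Membership test is a single bit lookup:
print "present\n" if vec( $bits, 555_121, 1 );

# A linear scan yields the numbers already sorted and deduplicated:
my @sorted = grep { vec( $bits, $_, 1 ) } 0 .. 999_999;
```

As a bonus, the "sort" falls out of the scan for free, in linear time.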

> 	I don't think I'd want to consider testing this on all of the
> 	telephone numbers in the US, let alone the world. If we allow
> 	for all possible numbers, it's 1000x the size of the 7 digit set,
> 	30 GB. Yikes! Do you have 30 GB of RAM? Even if we wanted to
> 	use swap, our computer would need quite a few GB of RAM. Yikes!

	You're storing the numbers somewhere, right?

foreach my $area_code (sort { $a <=> $b } @area_codes) {
  open( AC, "numbers/$area_code" ) || die "can't open $area_code file, $!";
  @numbers = <AC>;
  close( AC );
  chomp @numbers;
  @sorted = sort { $a <=> $b } @numbers;
}

>         I've tested seven digit numbers only, with two datasets, first
>         filling our million numbers 1/3 full and then 2/3 full. I'm
>         surprised to see better performance on the denser set. I would
>         have assumed that with fewer elements we'd have fewer operations
>         and therefore more speed. Wrong.
>         As we can see from the benchmarks, there was no speed gain.
>         Either I screwed up or Perl is great (which we already know).
>         I may have to eat my words. If someone figures out the
>         solution, we can discuss why I thought this would have been
>         different.

	Beats me, but perl is supposed to use a pretty brainy sort method
(historically qsort; mergesort as of perl 5.8). I'm sure if you did bitwise
stuff in C using a good sort method you could get a faster result.
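	Easy enough to measure, anyway, with the core Benchmark module.
A rough sketch (list size and repeat count are arbitrary picks of mine,
smaller than the million-number test):

```perl
use Benchmark qw(timethis);

# Time perl's built-in numeric sort on random "phone numbers".
# 100_000 elements here, just to keep the run short.
my @randoms = map { int rand 10_000_000 } 1 .. 100_000;
my @sorted  = sort { $a <=> $b } @randoms;

timethis( 5, sub { my @s = sort { $a <=> $b } @randoms } );
```

Scale the list up to a million and you should see something in the
ballpark of the timings quoted in this thread, machine permitting.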

>         'hash' is a hash-based solution. I wanted to use an array,
>         but the hash guarantees unique numbers. 'mine' is the
>         TOP-ZEEKRET solution.
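	The hash trick, for the archive: use the numbers themselves as keys
and duplicates collapse for free. A generic sketch (not Joshua's benchmarked
code; the numbers are made up):

```perl
# Hash-as-set sketch: duplicate numbers collapse into one key.
my %seen;
$seen{$_} = 1 for ( 5551212, 5550000, 5551212 );

my @unique = sort { $a <=> $b } keys %seen;
print "@unique\n";    # 5550000 5551212
```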

	The 'dumb' method for sorting a million 10 digit random numbers
took 12 seconds on my 600 MHz PIII PC (running Linux, natch). That's the
sort time only. YMMV, of course. It swallowed a full 136 megs of RAM
(according to top) while doing the calculations. It looked as if 50% of
that (or more) was used by perl during the sort. Almost as bad as some
(unnamed) web browsers.
	One interesting effect was that if I just did sort @randoms w/out
assigning the result to a variable the script executed almost instantaneously.
It looks like perl knew the result wouldn't get used and so skipped over it.
Pretty smart!
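	That matches perl's documented behavior: sort in void context draws a
"Useless use of sort in void context" warning and the work gets skipped. A
small sketch that traps the warning (assumes warnings are enabled, and uses
a string eval so the void-context sort compiles under the handler):

```perl
use warnings;

my @randoms = ( 3, 1, 2 );
my $warned  = '';
local $SIG{__WARN__} = sub { $warned .= $_[0] };

# sort in void context: perl warns and doesn't bother sorting.
eval q{ sort @randoms; 1 } or die $@;

print $warned;
```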

