[Dub-pm] big data structures relative to memory size

Tim Bunce Tim.Bunce at pobox.com
Fri Apr 16 12:16:20 CDT 2004


On Fri, Apr 16, 2004 at 04:54:02PM +0100, Sean O'Riordain wrote:
> Thanks Fergal.
> 
> here is how i currently load from mysql...
> 
>     my $rec_count = 0;
>     while (my $aref = $sth->fetchrow_hashref) {
>         push @as, $aref->{timestamp_ue};
>         push @ae, $aref->{finish_ue};
>         push @adw, $aref->{dw};
>         push @ahh, $aref->{hh};
>         push @abz, $aref->{bzone};
>         $rec_count++;
>     }
>     print " $rec_count cdr records loaded\n";
> 
> this takes maybe 5 minutes - so i'm not overly worried about that...

Pre-extending the arrays (using $#as = N) would eliminate the memory
fragmentation you're getting from the reallocs as the arrays grow.
(Would also be faster, though you can't be very interested in speed
if you're using fetchrow_hashref :-)
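
Roughly like this (a sketch only: it assumes the row count can be
fetched cheaply up front, the table name is a guess, and it assigns
by index rather than push so the pre-extended slots get reused):

    my ($n) = $dbh->selectrow_array("SELECT COUNT(*) FROM cdr");  # table name is a guess
    $#as  = $n - 1;     # pre-extend each array to its final size
    $#ae  = $n - 1;
    $#adw = $n - 1;
    $#ahh = $n - 1;
    $#abz = $n - 1;

    my $i = 0;
    while (my $row = $sth->fetchrow_hashref) {
        $as[$i]  = $row->{timestamp_ue};
        $ae[$i]  = $row->{finish_ue};
        $adw[$i] = $row->{dw};
        $ahh[$i] = $row->{hh};
        $abz[$i] = $row->{bzone};
        $i++;
    }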

> i could speed up the string stuff by using a lookup table since there 
> are only about 350 different values...

That's certainly worth doing (before getting into Inline::C).
Something like this should suffice:

    push @abz, \($cache{ $aref->{bzone} } ||= $aref->{bzone});
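
The array then holds refs to one shared copy of each distinct string,
so you just dereference when reading it back, e.g. (illustrative only):

    my %per_zone;
    $per_zone{ ${$_} }++ for @abz;    # e.g. count records per bzone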

But if the data volume is likely to grow then you'll need to try
the pack/unpack approach, a tied DBM, or Inline::C. PDL might also
be worth a look.
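
For what it's worth, a rough sketch of the pack/unpack idea (the pack
template and the bzone-to-integer-id mapping are only guesses at what
would suit your data):

    my (%bz_id, @bz_name, @packed);
    while (my $row = $sth->fetchrow_hashref) {
        # map each distinct bzone string to a small integer id
        my $id = exists $bz_id{ $row->{bzone} }
            ? $bz_id{ $row->{bzone} }
            : do { push @bz_name, $row->{bzone};
                   $bz_id{ $row->{bzone} } = $#bz_name };
        # one compact binary string per record instead of five scalars
        push @packed, pack("NNnnn",
            $row->{timestamp_ue}, $row->{finish_ue},
            $row->{dw}, $row->{hh}, $id);
    }
    # unpack a record on demand, e.g. record 42:
    my ($start, $finish, $dw, $hh, $bz) = unpack("NNnnn", $packed[42]);
    my $bzone = $bz_name[$bz];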

Tim.


