[Dub-pm] big data structures relative to memory size
Tim Bunce
Tim.Bunce at pobox.com
Fri Apr 16 12:16:20 CDT 2004
On Fri, Apr 16, 2004 at 04:54:02PM +0100, Sean O'Riordain wrote:
> Thanks Fergal.
>
> here is how I currently load from MySQL...
>
> my $rec_count = 0;
> while (my $aref = $sth->fetchrow_hashref) {
>     push @as,  $aref->{timestamp_ue};
>     push @ae,  $aref->{finish_ue};
>     push @adw, $aref->{dw};
>     push @ahh, $aref->{hh};
>     push @abz, $aref->{bzone};
>     $rec_count++;
> }
> print " $rec_count cdr records loaded\n";
>
> this takes maybe 5 minutes - so i'm not overly worried about that...
Pre-extending the arrays (using $#as = N) would eliminate the memory
fragmentation you're getting from the reallocs as the arrays grow.
(Would also be faster, though you can't be very interested in speed
if you're using fetchrow_hashref :-)
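A minimal sketch of the pre-extension idea, assuming the row count is known up front (e.g. from a hypothetical SELECT COUNT(*)); assigning to $#array allocates all the slots in one go, so the later fills cause no growth reallocs:

```perl
use strict;
use warnings;

my $n = 100_000;               # assumed row count for this sketch

my @as;
$#as = $n - 1;                 # pre-extend: one allocation of $n slots (all undef)
print scalar(@as), "\n";       # prints 100000 before anything is stored

$as[$_] = $_ * 2 for 0 .. $n - 1;   # filling in place, no further reallocs
print $as[99_999], "\n";            # prints 199998
```

With pre-extended arrays the fetch loop would assign by index ($as[$i] = ...) rather than push, and do the same for each of the five arrays.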
> i could speed up the string stuff by using a lookup table since there
> are only about 350 different values...
That's certainly worth doing (before getting into Inline::C).
Something like this should suffice:
push @abz, \($cache{ $aref->{bzone} } ||= $aref->{bzone});
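To spell out what that one-liner does (a standalone sketch with made-up zone names): each distinct bzone string is stored once in %cache, and @abz holds references to those shared scalars, so string memory grows with the ~350 distinct values rather than with the row count:

```perl
use strict;
use warnings;

my (%cache, @abz);
for my $bzone (qw(dublin cork dublin galway cork)) {
    # ||= stores the string in the cache on first sight; \(...) takes a
    # reference to the cached scalar, so duplicates share one copy.
    push @abz, \($cache{$bzone} ||= $bzone);
}

print scalar(keys %cache), "\n";                      # 3 distinct strings cached
print ${ $abz[0] }, "\n";                             # dereference: dublin
print $abz[0] == $abz[2] ? "shared\n" : "copied\n";   # same scalar: shared
```

The rows then cost one reference each instead of one string each; reading a value back just needs a dereference, ${ $abz[$i] }.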
But if the data volume is likely to grow then you'll need to try
the pack/unpack approach, a tied DBM, or Inline::C. PDL might also
be worth a look.
Tim.