[Dub-pm] big data structures relative to memory size

Fergal Daly fergal at esatclear.ie
Fri Apr 16 09:27:33 CDT 2004


Not knowing exactly what you have makes it a bit tricky, but if you've got 5
million things looking like

	[$int1, $int2, $int3, $int4, $int5, $string]  x 5 million

then you will save quite a bit by having

@int1s = (int x 5 million)
@int2s = (int x 5 million)
...
@int5s = (int x 5 million)
@strings = (string x 5 million)

Then just pass around the index, using a package like

package MyObj;

sub new
{
	my $pkg = shift;
	my $index = shift;
	return bless \$index, $pkg;
}

sub getInt1
{
	my $self = shift;
	return $int1s[$$self];
}

etc...
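
To make that concrete, here is a minimal sketch of the parallel-array layout
together with the index objects. Only two of the five integer columns are
shown, and add_record, getInt2 and getString are names made up for this
example rather than anything from the original program.

```perl
use strict;
use warnings;

package MyObj;

# Parallel arrays: slot N of each array belongs to record N.
# The remaining integer columns would follow the same pattern.
our (@int1s, @int2s, @strings);

# Hypothetical loader: appends one record and returns its index.
sub add_record {
    my ($int1, $int2, $string) = @_;
    push @int1s,   $int1;
    push @int2s,   $int2;
    push @strings, $string;
    return $#int1s;
}

# The object is nothing more than a blessed reference to the index.
sub new {
    my ($pkg, $index) = @_;
    return bless \$index, $pkg;
}

sub getInt1   { my $self = shift; return $int1s[$$self] }
sub getInt2   { my $self = shift; return $int2s[$$self] }
sub getString { my $self = shift; return $strings[$$self] }

package main;

MyObj::add_record(7, 42, "first");
MyObj::add_record(8, 43, "second");

my $obj = MyObj->new(1);
print $obj->getInt1, " ", $obj->getString, "\n";   # prints "8 second"
```

The saving comes from having six big arrays of plain scalars instead of one
small anonymous array per record.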

Or you could get more memory efficient and, rather than using arrays for the
ints, have one packed string for each set of ints, with methods like

sub getInt1
{
	my $self = shift;

	# assume a 4 byte integer
	my $enc = substr($int1s, $$self*4, 4);

	return unpack("L", $enc);
}

You could also do this for the strings. It'll be slower because you'll be
invoking methods; you could use plain subroutines instead if you're sure
you'll never want inheritance etc.
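
A rough sketch of the packed version, using plain subroutines rather than
methods, might look like the following. It assumes every integer fits in an
unsigned 32-bit value (use "l" instead of "L" for signed values) and that the
strings are plain bytes of at most 30 characters with no significant trailing
whitespace. The names add_record, get_int1 and get_string are invented for
the example, and only one integer column is shown.

```perl
use strict;
use warnings;

my $INT_WIDTH = 4;     # bytes taken by one "L" (unsigned 32-bit) value
my $STR_WIDTH = 30;    # maximum string length from the original question

my $int1s   = '';      # every int1, packed back to back
my $strings = '';      # every string, padded out to $STR_WIDTH bytes

# Hypothetical loader: appends one record and returns its index.
sub add_record {
    my ($int1, $string) = @_;
    $int1s   .= pack('L', $int1);
    $strings .= pack("A$STR_WIDTH", $string);   # space-pads or truncates
    return length($int1s) / $INT_WIDTH - 1;
}

sub get_int1 {
    my $index = shift;
    return unpack('L', substr($int1s, $index * $INT_WIDTH, $INT_WIDTH));
}

sub get_string {
    my $index = shift;
    # Unpacking with "A" strips the trailing pad spaces again.
    return unpack("A$STR_WIDTH",
                  substr($strings, $index * $STR_WIDTH, $STR_WIDTH));
}

add_record(10, 'alpha');
my $i = add_record(4_000_000_000, 'beta');
print get_int1($i), ' ', get_string($i), "\n";   # prints "4000000000 beta"
```

At these widths, 5 million records come to 20 MB per integer column (100 MB
for all five) plus 150 MB for the strings, before any per-process overhead.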

If you use Inline::C, how you load the data depends entirely on how you store
it. You'll just have to write C routines for loading the data and call them
from Perl.

F

On Fri, Apr 16, 2004 at 03:08:23PM +0100, Sean O'Riordain wrote:
> Hi folks,
> 
> I've an analysis program with a couple of million records that i really 
> need to keep in memory as i need to scan back and forth etc... With 5 
> million odd records (written as a couple of independent 'arrays' or 
> should i say 'lists') the program requires quite a bit more than the 
> 1.5Gb of ram and becomes very slow due to swapping - gentoo-linux... 
> Each record has 5 integers and a string of max.len 30 chars... but perl 
> takes up extra ram for each SV...  I would like to be able to handle 
> larger datasets much faster than currently...
> 
> Has anybody used INLINE::C for handling large data structures - if so 
> how do you load the info?
> 
> Anybody used PDL?
> 
> Any thoughts which way I should jump?
> 
> cheers,
> Sean
> _______________________________________________
> Dublin-pm mailing list - Dublin-pm at mail.pm.org
> http://dublin.pm.org/ - IRC irc.linux.ie #dublin-pm
> 
> 


