[Dub-pm] big data structures relative to memory size
Sean O'Riordain
seanpor at acm.org
Fri Apr 16 11:15:26 CDT 2004
Hi Fergal!
i'll have to think about that one!
if i malloc memory in C and then go back to perl, how do i reference
that memory when i pass info into C again? i presume i pass the info
back and forth by reference - or could it still be available via a
static pointer... i'm missing a step here...
thanks again!
Sean
Fergal Daly wrote:
> If you malloc data in the C it will stay malloced until you free it, whether
> you go back to Perl or not.
>
> Getting data in and out of C within Perl is pretty much the same as getting
> it into C on its own, except that you can use bits of Perl too. You could
> try something a bit fancy: figure out exactly how much memory you'll be
> using and dump all the data into a file, exactly like a big chunk of
> memory, making a note of the offset where each array begins, how long it
> is, etc. Then you can just mmap the file so that it appears as a chunk of
> memory and bingo, your data structures are loaded. The mmap call will give
> you the address that the file has been mapped to, so you can get the
> address of the arrays inside it by using the offsets.
>
> F
>
> On Fri, Apr 16, 2004 at 04:54:02PM +0100, Sean O'Riordain wrote:
>
>>Thanks Fergal.
>>
>>here is how i currently load from mysql...
>>
>> my $rec_count = 0;
>> while (my $aref = $sth->fetchrow_hashref) {
>> push @as, $aref->{timestamp_ue};
>> push @ae, $aref->{finish_ue};
>> push @adw, $aref->{dw};
>> push @ahh, $aref->{hh};
>> push @abz, $aref->{bzone};
>> $rec_count++;
>> }
>> print " $rec_count cdr records loaded\n";
>>
>>this takes maybe 5 minutes - so i'm not overly worried about that...
>>
>>if there isn't a simple way of passing the info to Inline::C, then i was
>>thinking of just re-writing all the info to disk in an easily parseable
>>format, i.e. fixed-width columns. Then i was just going to do all the
>>integer work in C and write the results to an output file...
>>(currently it takes more than 8 hours at 100% cpu on a 1700MHz Athlon...)
>>
>>i could speed up the string stuff by using a lookup table since there
>>are only about 350 different values...
>>
>>in Inline::C is it possible to persistently keep a C data structure
>>between calls? i.e. malloc space for my large int arrays, and then from
>>perl append each new line of info?
>>
>>cheers,
>>Sean
>>
>>
>>Fergal Daly wrote:
>>
>>>Not knowing exactly what you have makes it a bit tricky. If you've got 5
>>>million things looking like
>>>
>>> [$int1, $int2, $int3, $int4, $int5, $string] x 1.5 million
>>>
>>>then you will save quite a bit by having
>>>
>>>@int1s = (int x 1.5 million)
>>>@int2s = (int x 1.5 million)
>>>..
>>>@int5s = (int x 1.5 million)
>>>@strings = (string x 1.5 million)
>>>
>>>then just pass around the index. A package like
>>>
>>>package MyObj;
>>>
>>>sub new
>>>{
>>> my $pkg = shift;
>>> my $index = shift;
>>> return bless \$index, $pkg;
>>>}
>>>
>>>sub getInt1
>>>{
>>> my $self = shift;
>>> return $int1s[$$self];
>>>}
>>>
>>>etc...
>>>
>>>or you could get more memory-efficient: rather than using arrays for the
>>>ints, have a string for each set of ints and have methods like
>>>
>>>sub getInt1
>>>{
>>> my $self = shift;
>>>
>>> # assume a 4 byte integer
>>> my $enc = substr($int1s, $$self*4, 4);
>>>
>>> return unpack("L", $enc);
>>>}
>>>
>>>you could also do this for the strings. It'll be slower because you'll be
>>>invoking methods; you could use plain subroutines if you're sure you'll
>>>never want inheritance etc.
>>>
>>>If you use Inline::C, how you load the data depends entirely on how you
>>>store it; you'll just have to write C routines for loading the data and
>>>call them from Perl.
>>>
>>>F
>>>On Fri, Apr 16, 2004 at 03:08:23PM +0100, Sean O'Riordain wrote:
>>>
>>>
>>>>Hi folks,
>>>>
>>>>I've an analysis program with a couple of million records that i really
>>>>need to keep in memory as i need to scan back and forth etc... With 5
>>>>million odd records (written as a couple of independent 'arrays', or
>>>>should i say 'lists') the program requires quite a bit more than the
>>>>1.5GB of ram and becomes very slow due to swapping - gentoo-linux...
>>>>Each record has 5 integers and a string of max. len 30 chars, but perl
>>>>takes up extra ram for each SV... I would like to be able to handle
>>>>larger datasets much faster than i currently can...
>>>>
>>>>Has anybody used Inline::C for handling large data structures - if so
>>>>how do you load the info?
>>>>
>>>>Anybody used PDL?
>>>>
>>>>Any thoughts which way I should jump?
>>>>
>>>>cheers,
>>>>Sean
>>>>_______________________________________________
>>>>Dublin-pm mailing list - Dublin-pm at mail.pm.org
>>>>http://dublin.pm.org/ - IRC irc.linux.ie #dublin-pm