[pm-h] [PBML] complex data structure help
Paul Archer
tigger at io.com
Tue Mar 28 07:32:40 PST 2006
Thanks for the suggestions. I'm torn between using Perl structures (easier
in the short term) and a database (harder in the short term, but better for
long-term storage). Since we're planning on being able to store anywhere
from months' to years' worth of data, a database is probably my best bet.
Now I just gotta put on my (tiny, ill-fitting) DBA hat, and scratch out a
schema. 8-)
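[For what it's worth, here's one possible first cut at a schema: one row per
transfer, with indexes on the columns the reports would filter or group on.
All table and column names here are invented for illustration, not a
worked-out design.]

```sql
-- Sketch only: names, types, and the hour-bucketing choice are assumptions.
CREATE TABLE transfer (
    id         INTEGER PRIMARY KEY,
    time       INTEGER NOT NULL,   -- epoch seconds; bucket into hours at query time
    filesystem TEXT    NOT NULL,
    client     TEXT    NOT NULL,
    username   TEXT    NOT NULL,
    direction  TEXT    NOT NULL CHECK (direction IN ('read', 'write')),
    bytes      INTEGER NOT NULL
);

-- Index the columns the reports will filter and group on.
CREATE INDEX transfer_time_idx   ON transfer (time);
CREATE INDEX transfer_user_idx   ON transfer (username);
CREATE INDEX transfer_client_idx ON transfer (client);

-- Example report: users on one day, sorted by data written.
-- :day_start and :day_end are placeholder bind parameters.
SELECT username, SUM(bytes) AS written
FROM transfer
WHERE direction = 'write' AND time BETWEEN :day_start AND :day_end
GROUP BY username
ORDER BY written DESC;
```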
Paul
3:59am, Kevin Shaum wrote:
> On Monday 27 March 2006 5:13 pm, Paul Archer wrote:
>> I'm writing a log analyzer (a la Webalyzer) to analyze Solaris' nfslog
>> files. They're in the same format as wu-ftpd xferlog files. I'd use an
>> existing solution, but I can't find anything that keeps track of reads vs
>> writes, which is critical for us.
>> Anyway, I need to be able to sort by filesystem, client machine, user, time
>> (with a one-hour base period), and read, write, or total usage.
>> Can anyone suggest a data structure (or pointers to same) that will allow
>> me to pull data out in an arbitrary fashion (ie users on X day sorted by
>> data written)?
>> Once I have the structure, I can deal with doing the reports, but I want to
>> make sure I don't shoot myself in the foot with the structure.
>>
>> I was thinking of a hash of hashes, where the keys are filesystems pointing
>> to hashes where the keys are client machines, etc, etc. But it seems that
>> approach would be inefficient for lookups based on times or users (for
>> example).
>
> The simplest thing to do would be to store it all as a simple list of
> (references to) lists, then 'grep' and 'sort' the big list as the query
> requires.
>
> @result = sort { $a->[1] cmp $b->[1] }
> grep { $_->[2] >= $time0 and $_->[2] <= $time1 }
> grep { $_->[0] eq 'myhost' }
> @dataset;
>
> A more readable (but possibly less efficient) version would store each entry
> in the big list as (a reference to) a hash:
>
> @result = sort { $a->{username} cmp $b->{username} }
> grep { $_->{time} >= $time0 and $_->{time} <= $time1 }
> grep { $_->{hostname} eq 'myhost' }
> @dataset;
>
> If the data set is large enough that that's not practical, then the suggestion
> to go to a relational database (e.g., SQLite) makes sense. But it sounds like
> you're thinking of keeping it all in RAM anyway.
>
> Hope this helps.
>
> Kevin
> _______________________________________________
> Houston mailing list
> Houston at pm.org
> http://mail.pm.org/mailman/listinfo/houston
>
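[Kevin's hash-of-hashrefs pipeline above can be run as-is against a small
dataset. Here's a self-contained sketch; the three sample records, field
values, and time window are all invented for illustration.]

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Invented sample dataset: one hashref per log entry.
my @dataset = (
    { hostname => 'myhost',    username => 'carol', time => 150 },
    { hostname => 'myhost',    username => 'alice', time => 120 },
    { hostname => 'otherhost', username => 'bob',   time => 130 },
);

my ( $time0, $time1 ) = ( 100, 200 );

# Filter by host, then by time window, then sort by username.
# Note: the sort block must use cmp (returns -1/0/1), not lt (boolean).
my @result = sort { $a->{username} cmp $b->{username} }
             grep { $_->{time} >= $time0 and $_->{time} <= $time1 }
             grep { $_->{hostname} eq 'myhost' }
             @dataset;

print join( ', ', map { $_->{username} } @result ), "\n";  # prints "alice, carol"
```

Since grep and sort both copy the list, a large dataset pays for a full pass
(and sort's N log N) on every query, which is where the database starts to win.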
-----------------------------------------------
"Working with babies had its problems...
but then I tried working with chickens."
Jim Henson, talking about making "Labyrinth"
-----------------------------------------------