[pm-h] [PBML] complex data structure help
Paul Archer
tigger at io.com
Tue Mar 28 07:32:40 PST 2006
Thanks for the suggestions. I'm torn between using Perl structures (easier
in the short term) and a database (harder in the short term, but better for
long-term storage). Since we're planning on being able to store anywhere
from months' to years' worth of data, a database is probably my best bet.
Now I just gotta put on my (tiny, ill-fitting) DBA hat, and scratch out a
schema. 8-)
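[For what it's worth, here's one possible first cut at a schema: one row per
transfer, with indexes on the columns the reports would filter or group on.
All table and column names here are invented for illustration, not a
worked-out design.]

```sql
-- Sketch only: names, types, and the hour-bucketing choice are assumptions.
CREATE TABLE transfer (
    id         INTEGER PRIMARY KEY,
    time       INTEGER NOT NULL,   -- epoch seconds; bucket into hours at query time
    filesystem TEXT    NOT NULL,
    client     TEXT    NOT NULL,
    username   TEXT    NOT NULL,
    direction  TEXT    NOT NULL CHECK (direction IN ('read', 'write')),
    bytes      INTEGER NOT NULL
);

-- Index the columns the reports will filter and group on.
CREATE INDEX transfer_time_idx   ON transfer (time);
CREATE INDEX transfer_user_idx   ON transfer (username);
CREATE INDEX transfer_client_idx ON transfer (client);

-- Example report: users on one day, sorted by data written.
-- :day_start and :day_end are placeholder bind parameters.
SELECT username, SUM(bytes) AS written
FROM transfer
WHERE direction = 'write' AND time BETWEEN :day_start AND :day_end
GROUP BY username
ORDER BY written DESC;
```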
Paul
3:59am, Kevin Shaum wrote:
> On Monday 27 March 2006 5:13 pm, Paul Archer wrote:
>> I'm writing a log analyzer (a la Webalyzer) to analyze Solaris' nfslog
>> files. They're in the same format as wu-ftpd xferlog files. I'd use an
>> existing solution, but I can't find anything that keeps track of reads vs
>> writes, which is critical for us.
>> Anyway, I need to be able to sort by filesystem, client machine, user, time
>> (with a one-hour base period), and read, write, or total usage.
>> Can anyone suggest a data structure (or pointers to same) that will allow
>> me to pull data out in an arbitrary fashion (ie users on X day sorted by
>> data written)?
>> Once I have the structure, I can deal with doing the reports, but I want to
>> make sure I don't shoot myself in the foot with the structure.
>>
>> I was thinking of a hash of hashes, where the keys are filesystems pointing
>> to hashes where the keys are client machines, etc, etc. But it seems that
>> approach would be inefficient for lookups based on times or users (for
>> example).
>
> The simplest thing to do would be to store it all as a simple list of
> (references to) lists, then 'grep' and 'sort' the big list as the query
> requires.
>
> @result = sort { $a->[1] cmp $b->[1] }
> grep { $_->[2] >= $time0 and $_->[2] <= $time1 }
> grep { $_->[0] eq 'myhost' }
> @dataset;
>
> A more readable (but possibly less efficient) version would store each entry
> in the big list as (a reference to) a hash:
>
> @result = sort { $a->{username} cmp $b->{username} }
> grep { $_->{time} >= $time0 and $_->{time} <= $time1 }
> grep { $_->{hostname} eq 'myhost' }
> @dataset;
>
> If the data set is large enough that that's not practical, then the suggestion
> to go to a relational database (e.g., SQLite) makes sense. But it sounds like
> you're thinking of keeping it all in RAM anyway.
>
> Hope this helps.
>
> Kevin
> _______________________________________________
> Houston mailing list
> Houston at pm.org
> http://mail.pm.org/mailman/listinfo/houston
>
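[Kevin's hash-of-hashrefs pipeline above can be run as-is against a small
dataset. Here's a self-contained sketch; the three sample records, field
values, and time window are all invented for illustration.]

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Invented sample dataset: one hashref per log entry.
my @dataset = (
    { hostname => 'myhost',    username => 'carol', time => 150 },
    { hostname => 'myhost',    username => 'alice', time => 120 },
    { hostname => 'otherhost', username => 'bob',   time => 130 },
);

my ( $time0, $time1 ) = ( 100, 200 );

# Filter by host, then by time window, then sort by username.
# Note: the sort block must use cmp (returns -1/0/1), not lt (boolean).
my @result = sort { $a->{username} cmp $b->{username} }
             grep { $_->{time} >= $time0 and $_->{time} <= $time1 }
             grep { $_->{hostname} eq 'myhost' }
             @dataset;

print join( ', ', map { $_->{username} } @result ), "\n";  # prints "alice, carol"
```

Since grep and sort both copy the list, a large dataset pays for a full pass
(and sort's N log N) on every query, which is where the database starts to win.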
-----------------------------------------------
"Working with babies had its problems...
but then I tried working with chickens."
Jim Henson, talking about making "Labyrinth"
-----------------------------------------------