[Phoenix-pm] Hash performance

Metz, Bobby W, WWCS bwmetz at att.com
Mon Jun 12 11:22:43 PDT 2006


Just finished my dbm test.  Not sure, but it seems to afford a minor
decrease in load time.  I'll have to use Benchmark to be sure.  Anyway,
the memory usage has been cut by 1/3, which is perfect.
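
Probably something along these lines, just to put numbers on it -- the
file names, the tab-separated record layout, and the lookup key below
are placeholders rather than my real data:

    use strict;
    use warnings;
    use Benchmark qw(timethese);
    use GDBM_File;

    timethese(10, {
        'parse text file' => sub {
            # Method #1: build the hash from the raw data file each time.
            my %h;
            open my $fh, '<', 'records.txt' or die $!;
            while (my $line = <$fh>) {
                chomp $line;
                my ($k, $v) = split /\t/, $line, 2;
                $h{$k} = $v;
            }
            close $fh;
        },
        'tie dbm file' => sub {
            # New approach: tie the pre-built dbm; records stay on disk
            # until they're looked up, hence the memory savings.
            tie my %h, 'GDBM_File', 'records.dbm', &GDBM_READER, 0640
                or die $!;
            my $v = $h{'example key'};   # placeholder lookup
            untie %h;
        },
    });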

Thanks guys, should have just sucked it up and tried that too before
posting.

Bobby

-----Original Message-----
From: Scott Walters [mailto:scott at illogics.org]
Sent: Monday, June 12, 2006 9:39 AM
To: Metz, Bobby W, WWCS
Cc: Michael Friedman; phoenix-pm at pm.org
Subject: Re: [Phoenix-pm] Hash performance


Yeah.  I've got to go with Michael on this one.  You omitted an
important detail that Michael picked up on... that you're sharing data
by way of Perl's parser rather than any sort of database format.  Use a
binary format, not text, and certainly not Perl source code as a text
format.  Parsing code is a thousand times slower than navigating a
binary format.  Use a dbm if nothing else.  Michael suggested one in
particular.
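
For instance, a one-time conversion script along these lines (the file
names and the tab-separated record layout are assumptions for the sake
of illustration), after which the main program ties the dbm read-only
instead of parsing Perl source:

    use strict;
    use warnings;
    use GDBM_File;

    # Build the dbm once, offline, from the existing text dump.
    tie my %db, 'GDBM_File', 'records.dbm', &GDBM_WRCREAT, 0640
        or die "can't create records.dbm: $!";

    open my $fh, '<', 'records.txt' or die "can't read records.txt: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;   # assumed layout
        $db{$key} = $value;
    }
    close $fh;
    untie %db;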

-scott

"Metz, Bobby W, WWCS" <bwmetz at att.com> wrote:
> Scott, 
> 	yes, single level hash only.
> 
> Michael,
> 	correct...#2 uses require to open a file generated by another
> script which basically looks like:
> 
> $hash{'key'} = 'value';
> 
> I had considered disk I/O but disregarded it, since the #1 method
> reads the source file from disk to populate the hash; that is the same
> source file the secondary script reads to generate the files read via
> require in method #2.  I guess there is the difference of a few
> characters per line in the hash notation, but I wouldn't have thought
> that would nearly triple the memory usage to store the hash.
> 
> Thanks,
> 
> Bobby
> 
> 
> 
> -----Original Message-----
> From: Michael Friedman [mailto:friedman at highwire.stanford.edu]
> Sent: Friday, June 09, 2006 8:11 PM
> To: Metz, Bobby W, WWCS
> Cc: phoenix-pm at pm.org
> Subject: Re: [Phoenix-pm] Hash performance
> 
> 
> Wait -- for #2 it sounds like you build the hashes and then write
> them out to a file (in some other script, I assume), followed by this
> script using 'require' to load the previously written files. Is that
> right?
> 
> I would bet, then, that the extra memory and slowness come from
> accessing the filesystem. Once you require a file, perl basically
> eval()s it into the current context -- doing just about exactly what
> you do when you build the hash in the first place. :-) Scott could
> probably explain the details; I'm just going on a guess.
> 
> Personally, I think you did the right thing by benchmarking it. Now  
> you know for sure which way is better and you can just rebuild it  
> when you use it.
> 
> If you want to save the hashes in a file, you might want to check out
> the GDBM, MLDBM, or TDB (a new really fast one) database modules.
> (GDBM_File, MLDBM + Tie::MLDBM, TDB_File)  They all tie to a hash and
> let you manage the persistent storage without any effort whatsoever.
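> 
> For example, MLDBM ties about like this -- I'm picking the DB_File and
> Storable back ends arbitrarily here, and the file name is made up:
> 
>     use strict;
>     use warnings;
>     use Fcntl;                        # for O_CREAT and O_RDWR
>     use MLDBM qw(DB_File Storable);   # back-end dbm + serializer
> 
>     tie my %h, 'MLDBM', 'records.db', O_CREAT|O_RDWR, 0640
>         or die "can't tie records.db: $!";
> 
>     # Values can be whole structures, which helps if you go back to
>     # multi-level hashes.  (Caveat: fetch a copy, modify it, and
>     # reassign -- MLDBM can't update nested data in place.)
>     $h{'key'} = { value => 'value' };
> 
>     untie %h;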
> 
> -- Mike
> 
> On Jun 9, 2006, at 5:38 PM, Metz, Bobby W, WWCS wrote:
> 
> > 	This is kind of a follow-up question to my multi-level hash
> > post.  Everything I've been reading on-line about how hashes work
> > leads me to conclusions that don't seem to pan out in reality, e.g.
> > pre-defining the # of hash buckets to increase performance on large
> > data sets.  At least, I thought +40K records would be considered
> > large...no jokes please.
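> > The bucket pre-defining I mean is just the documented lvalue keys()
> > form, something like:
> > 
> >     my %hash;
> >     keys(%hash) = 40_000;   # hint perl to pre-allocate ~40K buckets
> >     # ...then fill %hash as usual.
> > 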
> > 	So, here's what I've observed using two methods to load +40K
> > records into a single-level hash.  I have always used method #1, as
> > I learned it that way years ago, but would love some thoughts on
> > whether method #2 might be superior somehow, as I know a lot of
> > folks who do it that way instead.
> >
> > Method 1
> > + Dynamically build hash from data file at run time.
> > + Program load is consistently 3 seconds faster than Method 2.
> > + Used 13M of memory to hold the records.
> >
> > Method 2
> > + Used pre-built hashes loaded via "require".
> > + Program load is consistently 3 seconds slower than Method 1.
> > + Used 36M of memory to hold the records.
> >
> > 	Any of you know the inner workings of hashes enough to explain
> > the difference?  I think the memory increase might have something to
> > do with "require" mucking with the usual shared hash table used by
> > perl, possibly forcing two copies.  But, that's just an uneducated
> > guess.  There was no discernible difference in output performance
> > using a small test set against the +40K records, only the initial
> > program load and total memory consumption.
> >
> > Thoughts?
> >
> > Thanks,
> >
> > Bobby
> > _______________________________________________
> > Phoenix-pm mailing list
> > Phoenix-pm at pm.org
> > http://mail.pm.org/mailman/listinfo/phoenix-pm
> 
> ---------------------------------------------------------------------
> Michael Friedman                     HighWire Press
> Phone: 650-725-1974                  Stanford University
> FAX:   270-721-8034                  <friedman at highwire.stanford.edu>
> ---------------------------------------------------------------------
> 
> 
> _______________________________________________
> Phoenix-pm mailing list
> Phoenix-pm at pm.org
> http://mail.pm.org/mailman/listinfo/phoenix-pm

