[Phoenix-pm] Hash performance

Metz, Bobby W, WWCS bwmetz at att.com
Fri Jun 9 17:38:24 PDT 2006


	This is kind of a follow-up question to my multi-level hash
post.  Everything I've been reading online about how hashes work leads
me to conclusions that don't seem to pan out in reality, e.g.
pre-defining the number of hash buckets to increase performance on
large data sets.  At least, I thought 40K+ records would be considered
large...no jokes please.
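	For reference, this is the pre-sizing idiom I was testing.  It's
only a minimal sketch; the hash name %records and the 40_000 figure just
stand in for my real setup:

    # Pre-extend the hash to at least 40K buckets before loading.
    # Assigning to keys() as an lvalue is the documented way to do this.
    my %records;
    keys(%records) = 40_000;
    # (Once the hash has entries, evaluating %records in scalar context
    #  reports "used/total" buckets on the perls of this era, if you
    #  want to confirm the allocation took.)
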
	So, here's what I've observed using two methods to load 40K+
records into a single-level hash.  I have always used Method 1, since
that's how I learned it years ago, but I'd love some thoughts on whether
Method 2 might be superior somehow, as I know a lot of folks who do it
that way instead.

Method 1 (sketch below)
+ Dynamically build hash from data file at run time.
+ Program load is consistently 3 seconds faster than Method 2.
+ Used 13M of memory to hold the records.
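
	Roughly, Method 1 looks like the sketch below.  The file name and
the key|value layout are placeholders, not my real data:

    # Method 1: build the hash at run time from a flat data file.
    my %records;
    open my $fh, '<', 'records.dat' or die "Can't open records.dat: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split /\|/, $line, 2;
        $records{$key} = $value;
    }
    close $fh;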

Method 2 (sketch below)
+ Used pre-built hashes loaded via "require".
+ Program load is consistently 3 seconds slower than Method 1.
+ Used 36M of memory to hold the records.
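
	And Method 2 in a nutshell.  Again just a sketch; the file name
is a placeholder, and the pre-built file has to end with a true value so
require is satisfied:

    # Method 2: the hash literal lives in a pre-built file, e.g.
    #
    #   # records.pl, generated ahead of time
    #   %main::records = (
    #       'key1' => 'value1',
    #       # ... 40K+ more entries ...
    #   );
    #   1;
    #
    # and the main program just pulls it in (records.pl must be
    # somewhere in @INC):
    our %records;
    require 'records.pl';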

	Do any of you know the inner workings of hashes well enough to
explain the difference?  I think the memory increase might have
something to do with "require" mucking with the usual shared hash table
used by Perl, possibly forcing two copies.  But that's just an
uneducated guess.  There was no discernible difference in output
performance using a small test set against the 40K+ records, only in
the initial program load and total memory consumption.
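	If anyone wants to poke at the memory side directly, something
like Devel::Size from CPAN should do it; %records here is just the
placeholder hash from the sketches above:

    use Devel::Size qw(total_size);
    # total_size() walks the keys and values, so it approximates the
    # footprint of the whole structure, not just the top-level hash.
    printf "hash uses %d bytes\n", total_size(\%records);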

Thoughts?

Thanks,

Bobby

