SPUG: general hash performance questions

Matt Tucker tuck at whistlingfish.net
Wed Feb 27 20:27:51 CST 2002


-- Benjamin Franks <benjamin at dzhan.com> spake thusly:

> I was just messing about with some test code to gauge execution times
> and came up with 2 general questions:
> 
> (1)  While iterating over the keys of a large hash, I notice speed
> increase if I do the following:
> 
> 	foreach (keys %hash) {
> 		$hash{$_}++;
> 	}
> 
> instead of doing:
> 
> 	foreach (keys %hash) {
> 		if (exists $hash{$_}) {$hash{$_}++;}
> 		else {$hash{$_}=1;}
> 	}
> 
> This intuitively makes sense because I'm cutting out the conditional
> statement and test for existence.  But why does it work?  If I
> increment a non-existent hash/key value, how does it get to
> (correctly) 1?

Because empty strings and undefined values, when interpreted as
numbers, evaluate to 0. Doing $hash{$_}++ the first time gives you the
equivalent of $hash{$_} = undef + 1, which is interpreted as $hash{$_}
= 0 + 1.


> (2)  Let's say I build a hash of 1 million keys, each having a value
> of 1.
> 
> 		foreach (1..1000000) {
> 			$hash{$_}=1;
> 		}
> 
> If I watch the memory consumption with "top," I can see the memory
> footprint grow pretty big. However, if I make a complex data
> structure of hashes like:
> 
> 	foreach $a (1..100) {
> 		foreach $b (1..100) {
> 			foreach $c (1..100) {
> 				$hash{$a}{$b}{$c}=1;
> 				...
> 
> , I don't see the memory footprint grow nearly as large as for simpler
> case.  Why is this?  I understand "top" is probably not the greatest
> tool for this; can anyone recommend better perl memory profiling
> tools?

I suspect this is because hash keys are shared among all hashes, so
instead of having 1000000 strings representing the numbers 1 - 1000000,
you've got 100 representing the numbers 1 - 100.


> (3)  And even though this is a perl list, can complex structures like
> hash tables of hash tables be done in C?  Any sample code sources?

If I were going to do something like that, I'd use C++ and STL.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 232 bytes
Desc: not available
Url : http://mail.pm.org/archives/spug-list/attachments/20020227/0748d495/attachment.bin


More information about the spug-list mailing list