SPUG: general hash performance questions

Wed Feb 27 22:14:19 CST 2002

At 3:25 pm -0800 2/27/02, Benjamin Franks wrote:
>I was just messing about with some test code to gauge execution times and
>came up with 2 general questions:
>
>(1)  While iterating over the keys of a large hash, I notice speed
>increase if I do the following:
>
>	foreach (keys %hash) {
>		$hash{$_}++;
>	}
>
>instead of doing:
>
>	foreach (keys %hash) {
>		if (exists $hash{$_}) {$hash{$_}++;}
>		else {$hash{$_}=1;}
>	}
>
>This intuitively makes sense because I'm cutting out the conditional
>statement and test for existence.  But why does it work?  If I increment a
>non-existent hash/key value, how does it get to (correctly) 1?

It's not clear from your question *exactly* which part of Perl's 
behaviour you're asking about: there are several possible sources of 
confusion here, so I'll just go over all of them.  My apologies of 
some of this you already know...

First, understand that
     exists $hash{foo}
is not the same thing as
     defined $hash{foo}

exists $hash{foo} checks to see whether %hash already has a key 
'foo'.  The *value* associated with that key could still be undefined.

defined $hash{foo} checks to see whether the *value* associated with 
the key 'foo' is defined.

Also, understand that evaluating $hash{foo} will automatically create 
an entry for key 'foo' in %hash (the only exception is when 
$hash{foo} is the argument to the exists function).  This can 
sometimes create counter-intuitive results.

Consider this example:

     undef %hash;         # %hash now contains no keys or values
     exists $hash{foo};   # evaluates to FALSE
     defined $hash{foo};  # evaluates to FALSE
     exists $hash{foo};   # now evaluates to TRUE because the previous
                          # line, in evaluating $hash{foo}, added
                          # key 'foo' to %hash
     defined $hash{foo};  # still evaluates to FALSE, because the
                          # *value* associated with $hash{foo} is still
                          # undefined.

This is why you'll often see expressions like this in Perl code:

     if ( exists $hash{$key} and $hash{$key} eq $some_specific_string ) { ...

In terms of boolean logic, this is redundant:  obviously $hash{$key} 
isn't going to equal $some_specific_string if $hash{$key} doesn't 
even exist!  But it's necessary to write the expression this way, 
otherwise $hash{$key} will get created if it didn't exist already.

As for why $hash{foo}++ will turn an undefined value into 1, that's 
just the way Perl handles "undefined" values:  when an undefined 
value is encountered in a numeric context, Perl interprets the 
undefined value as zero.

If your question is:  why is the second loop so much slower, when 
it's doing essentially the same thing?  Then you need to remember 
that in the second loop, you're forcing Perl to do its existence test 
*twice*: the expression $hash{$_}++ is implicitly testing for the 
existence of the key $_.  It doesn't matter that you just explicitly 
did the same test yourself: the interpreter still needs to do the 
test internally to know how to handle the expression $hash{$_}++.

>
>(3)  And even though this is a perl list, can complex structures like hash
>tables of hash tables be done in C?  Any sample code sources?

I'm pretty sure that Perl itself is written in C, so the answer is, 
of course: yes, you can make any Perl data structures in C -- but it 
takes a lot more work.

You might want to check out the STL (Standard Template Library) for 
C++ -- the STL has templates for many different data structures, 
including associative arrays (aka hashes).  Here's one place to start:

http://www.cs.rpi.edu/projects/STL/htdocs/stl.html

A google search for "C++" and "STL" should turn up plenty more.

-jason

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
      Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
  Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
 For daily traffic, use spug-list for LIST ;  for weekly, spug-list-digest
     Seattle Perl Users Group (SPUG) Home Page: http://seattleperl.org