SPUG: general hash performance questions
dancerboy
dancerboy at strangelight.com
Wed Feb 27 22:14:19 CST 2002
At 3:25 pm -0800 2/27/02, Benjamin Franks wrote:
>I was just messing about with some test code to gauge execution times and
>came up with 2 general questions:
>
>(1) While iterating over the keys of a large hash, I notice speed
>increase if I do the following:
>
> foreach (keys %hash) {
> $hash{$_}++;
> }
>
>instead of doing:
>
> foreach (keys %hash) {
> if (exists $hash{$_}) {$hash{$_}++;}
> else {$hash{$_}=1;}
> }
>
>This intuitively makes sense because I'm cutting out the conditional
>statement and test for existence. But why does it work? If I increment a
>non-existent hash/key value, how does it get to (correctly) 1?
It's not clear from your question *exactly* which part of Perl's
behaviour you're asking about: there are several possible sources of
confusion here, so I'll just go over all of them. My apologies of
some of this you already know...
First, understand that
exists $hash{foo}
is not the same thing as
defined $hash{foo}
exists $hash{foo} checks to see whether %hash already has a key
'foo'. The *value* associated with that key could still be undefined.
defined $hash{foo} checks to see whether the *value* associated with
the key 'foo' is defined.
Also, understand that evaluating $hash{foo} will automatically create
an entry for key 'foo' in %hash (the only exception is when
$hash{foo} is the argument to the exists function). This can
sometimes create counter-intuitive results.
Consider this example:
undef %hash; # %hash now contains no keys or values
exists $hash{foo}; # evaluates to FALSE
defined $hash{foo}; # evaluates to FALSE
exists $hash{foo}; # now evaluates to TRUE because the previous
# line, in evaluating $hash{foo}, added
# key 'foo' to %hash
defined $hash{foo}; # still evaluates to FALSE, because the
# *value* associated with $hash{foo} is still
# undefined.
This is why you'll often see expressions like this in Perl code:
if ( exists $hash{$key} and $hash{$key} eq $some_specific_string ) { ...
In terms of boolean logic, this is redundant: obviously $hash{$key}
isn't going to equal $some_specific_string if $hash{$key} doesn't
even exist! But it's necessary to write the expression this way,
otherwise $hash{$key} will get created if it didn't exist already.
As for why $hash{foo}++ will turn an undefined value into 1, that's
just the way Perl handles "undefined" values: when an undefined
value is encountered in a numeric context, Perl interprets the
undefined value as zero.
If your question is: why is the second loop so much slower, when
it's doing essentially the same thing? Then you need to remember
that in the second loop, you're forcing Perl to do its existence test
*twice*: the expression $hash{$_}++ is implicitly testing for the
existence of the key $_. It doesn't matter that you just explicitly
did the same test yourself: the interpreter still needs to do the
test internally to know how to handle the expression $hash{$_}++.
>
>(3) And even though this is a perl list, can complex structures like hash
>tables of hash tables be done in C? Any sample code sources?
I'm pretty sure that Perl itself is written in C, so the answer is,
of course: yes, you can make any Perl data structures in C -- but it
takes a lot more work.
You might want to check out the STL (Standard Template Library) for
C++ -- the STL has templates for many different data structures,
including associative arrays (aka hashes). Here's one place to start:
http://www.cs.rpi.edu/projects/STL/htdocs/stl.html
A google search for "C++" and "STL" should turn up plenty more.
-jason
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
POST TO: spug-list at pm.org PROBLEMS: owner-spug-list at pm.org
Subscriptions; Email to majordomo at pm.org: ACTION LIST EMAIL
Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
For daily traffic, use spug-list for LIST ; for weekly, spug-list-digest
Seattle Perl Users Group (SPUG) Home Page: http://seattleperl.org
More information about the spug-list
mailing list