[kw-pm] Pseudo Hashes (Was Re: slides)
Matt Sergeant
matt at sergeant.org
Fri Jun 17 07:36:11 PDT 2005
OK, so I'll repeat the pseudohashes thingy here:
Problem statement: You have an object as a hashref, but there's a
performance penalty for looking up values in a hash (compute the hash,
lookup the entry in the hash buckets, including cascading into overflow
buckets if required).
Now you can have your objects as blessed arrays no problem:
package Person;
use constant NAME => 0;
use constant AGE => 1;
use constant SEX => 2;
sub new {
bless [], shift;
}
And the access is fast, keyed off constants, and thus key-safe. But
this is truly horrible for subclassing - you have to remember to add
entries starting at 3, and if the base class gets extended it all
breaks really badly. Note this is how I wrote XML::XPath. XML::Twig
uses a similar scheme, but the source code uses hashrefs, and mirod
magically filters it all into arrayrefs when he runs Makefile.PL. Kinda
sick, but works.
So to solve this in a more elegant (*cough*) way, the perl5-porters
came up with the idea of pseudohashes. The basic premise is the
construction of the pseudohash hashref is really an arrayref that looks
like this:
[ { name => 1, age => 2, sex => 3 }, "John Doe", 22, "Male" ]
And when you treat it like a hash perl automatically does the right
thing in order to find your elements:
$self->{name} becomes $self->[$self->[0]{name}]
This isn't much of an optimisation (it's slower than an ordinary hash
lookup), so in order to make this really fast perl has to be able to
precompute the hash entries. You can do this with the fields pragma and
by specifying the class of the $self var.
package Person;
use fields qw(name age sex);
sub new {
my Person $self = shift;
$self = fields::new($self) unless ref($self);
return $self;
}
Now the important thing to remember is that everywhere you use $self
(i.e. in all your methods) you MUST declare it as "my Person $self"
otherwise you don't get the optimisation - you get the slow version.
This all sounds great, but what are the downsides?
- It's a mess. It's a really ugly hack. You can accidentally use $self
as an arrayref instead of a hashref and perl won't complain.
- You can't delete() keys. In fact doing so results in some really
weird bugs under older versions of perl. You have to undef() keys.
- exists() doesn't work right either.
- local $r->{key} doesn't work right either.
- Imagine you have $r->[ { 1 => \$y }, { 2 => \$z } ] (or something
like that). Now you accidentally type $r->{1} in your program instead
of $r->[1] - what happens? Perl thinks you want a pseudo hash. So it
looks up the key 1 in your 0'th element, finds \$y. Since it's a
reference, it's something like SCALAR(0x809f0c). But if you treat it as
a number it's 8429324. So perl happily goes and tries to look up an
array element nearly 8 and a half million entries away. And expands the
array to accomodate. Yow!
If you're careful they can actually give you a small performance boost
(I use them in the high_perf branch of qpsmtpd which is what we run on
our spamtrap), but if you're not careful they get really horrid really
fast :-)
More information about the kw-pm
mailing list