[ABE.pm] perl 5.10 rules! - field hashes

Ricardo SIGNES rjbs-perl-abe at lists.manxome.org
Fri May 25 07:25:10 PDT 2007


It's not that one post per day was too ambitious.  I've had time.  I just keep
forgetting!

Here's another awesome little something from perl 5.10:  field hashes.

If you've been keeping up with the ever-changing fads of Perl object design,
you will have heard of inside-out objects.  If you haven't, here's a too-short
primer:

--- BEGIN PRIMER ---

The most common way to make an object in Perl is to bless a hashref.  That
means you have a constructor that does something like this:

  sub new {
    my ($class, $host, $port) = @_;

    my $self = {
      host => $host,
      port => $port,
      created => time,
    };

    return bless $self => $class;
  }

Then you have methods to do things like see when an object was created:

  sub created {
    my ($self) = @_;
    return $self->{created};
  }

The problem is that anybody who feels like examining your objects can see that
they're hashes, and they can do something asinine, like this:

  $object->{created} = 0; # make the object seem really old!

Well, this might be useful now and then, in times of great need, but normally
it's a terrible idea.  After all, what if, in the future, the author of this
class changes the way that created times are stored?  You've just shot yourself
in the foot.

Many languages have "strong encapsulation" for objects -- you can only get at
the objects via their public methods, not via their private instance data.
Perl, not so much.  The famous quote from Tom C. is, roughly, "Perl wants you
to behave nicely because you're polite, not because it's got a shotgun."

Inside-out objects give Perl a shotgun.

  my %created;
  my %host;
  my %port;

  sub new {
    my ($class, $host, $port) = @_;

    # bless some reference; which kind doesn't matter, since it holds no data
    my $self = bless {} => $class;

    $created{ $self + 0 } = time;
    $host   { $self + 0 } = $host;
    $port   { $self + 0 } = $port;

    return $self;
  }

  sub created {
    my ($self) = @_;

    return $created{ $self + 0 };
  }

Now, because the data is in a lexical variable and not in the object, nobody
outside the file that declares the class can get at an object's data.  The
object's internal data is outside the object, so we call the object
"inside-out."  The object's reference address (position in memory) is used as a
key to the hash.

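Unfortunately, there are LOADS of problems with this: it doesn't work well
under threads (an object cloned into a new thread gets a new address, so its
old keys no longer match), it requires careful manual garbage collection (the
class needs a DESTROY method to delete each object's entries), and there are
other problems, as well.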
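
To make the garbage collection chore concrete, here is a minimal sketch of the
cleanup a hand-rolled inside-out class has to do.  The class name Connection
and the live_count helper are made up for illustration, and I've used
Scalar::Util's refaddr in place of the "$self + 0" trick.

```perl
package Connection;    # hypothetical class name
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my %created;

sub new {
  my ($class) = @_;

  my $self = bless {} => $class;
  $created{ refaddr($self) } = time;

  return $self;
}

sub created {
  my ($self) = @_;
  return $created{ refaddr($self) };
}

# Without this, %created keeps an entry for every object ever made, and a
# later object that happens to reuse the address could see stale data.
sub DESTROY {
  my ($self) = @_;
  delete $created{ refaddr($self) };
}

# Hypothetical helper, only here to make the cleanup visible.
sub live_count { return scalar keys %created }

package main;

my $conn = Connection->new;
print "live: ", Connection->live_count, "\n";   # live: 1

undef $conn;                                    # triggers DESTROY
print "live: ", Connection->live_count, "\n";   # live: 0
```

Every attribute hash needs its own delete in DESTROY, and forgetting one is
a silent leak.  This sketch does nothing about threads.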

--- END PRIMER ---

So, inside-out objects are a pain to use, but provide some real benefits (more,
too, than just the one I sketched out).  What's a Perl hacker to do?  Well,
hack perl itself, of course.  Hash::Util::FieldHash lets you declare a special
kind of hash, a fieldhash, which is built for taking care of the obnoxious
problems above.

If you try to use an object as a hash key in perl 5.8, you get the object's
stringy form instead.  So, if you do this:  $sender{ $email } = 'rjbs'

You will get an entry for something like: Email=HASH(0xDEADBEEF)

If the object is overloaded to stringify to something else, you could be in for
real problems.  What if your Email object stringified to the sender?  Now every
time you tried to use the object as a hash key, you'd replace the previous
entry for a distinct email from the same sender.
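
Here is a small sketch of that collision with an ordinary hash.  The Email
class below is a made-up stand-in (its constructor, its sender field, and the
%subject hash are all assumptions for the example), overloaded to stringify to
the sender.

```perl
package Email;    # hypothetical class, for illustration only
use strict;
use warnings;

# Stringify to the sender, as in the scenario described above.
use overload '""' => sub { $_[0]{sender} };

sub new {
  my ($class, $sender) = @_;
  return bless { sender => $sender } => $class;
}

package main;

my %subject;    # an ordinary hash, not a fieldhash

my $first  = Email->new('rjbs');
my $second = Email->new('rjbs');

# Both objects stringify to 'rjbs', so they share one hash key.
$subject{ $first }  = 'hello';
$subject{ $second } = 'goodbye';

print scalar(keys %subject), "\n";   # 1
print $subject{ $first }, "\n";      # goodbye
```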

With a fieldhash, objects (or any reference) used as keys work properly: the
object's unique id is used instead.  If the process threads, the entry is
updated for the object's new id.  (Objects get new ids in threads.)  If the
object is garbage collected, the hash entry goes away.
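
As a rough sketch, here is how an inside-out class might look with
fieldhashes.  The class name Connection and the live_count helper are made up
for the example; fieldhash is the function Hash::Util::FieldHash exports for
turning a hash into a field hash.

```perl
package Connection;    # hypothetical class name
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

# One field hash per attribute.
fieldhash my %created;
fieldhash my %host;
fieldhash my %port;

sub new {
  my ($class, $host, $port) = @_;

  my $self = bless {} => $class;

  # The object itself is the key: no "+ 0", no refaddr.
  $created{ $self } = time;
  $host{ $self }    = $host;
  $port{ $self }    = $port;

  return $self;
}

sub host {
  my ($self) = @_;
  return $host{ $self };
}

# Hypothetical helper, only here to show entries going away.
sub live_count { return scalar keys %host }

package main;

my $conn = Connection->new('example.com', 80);
print $conn->host, "\n";                        # example.com
print "live: ", Connection->live_count, "\n";   # live: 1

undef $conn;    # no DESTROY method needed
print "live: ", Connection->live_count, "\n";   # live: 0
```

There is no DESTROY method here: the entries disappear when the object is
garbage collected.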

I think that within a year or two of 5.10's release, we'll start seeing far more
inside-out objects as a result of Hash::Util::FieldHash.  For more on how
inside-out objects can make life easier, see Perl Best Practices,
Class::InsideOut, and Object::InsideOut.

-- 
rjbs

