[Phoenix-pm] inside out objects

Mon Nov 28 11:45:17 PST 2005

Hi Michael,

Sorry for the slow reply. slowass.net was down for a while. Short
story long -- Google somewhere found links to //reverse.cgi and
//diff.cgi on http://perldesignpatterns.com. /reverse.cgi and /diff.cgi
are CPU-intensive, and the server is slow, so they're listed in
/robots.txt. But //reverse.cgi and /reverse.cgi are not the same
thing. So GoogleBot merrily beat the snot out of the server. This
wouldn't normally be a problem, but I had just changed thttpd's
bandwidth allocation to be much higher, and I thought I had changed
the CGI cost to match, but I made a math error (who said programmers
don't need math? How many K*bytes* per second is a T1 again? Duh?).
So the hundreds of running CGIs ran the machine out of
memory and killed everything else, including inetd and sshd.
Eventually pings stopped coming back too. So, now my thttpd has been
modded to watch the machine load average, and it's running with
ulimits. So, I was without email for the entire holiday weekend.
Teh suck.

Oh -- another lesson for you all -- if your server ever crashes
while CVS has a lock out, and attempts to access the repository
fail because of a lock, rm -rf .#* in the repository directory --
including the .#lock _directory_. Yes, the directory is a lockfile.
I had to rtfs for that bone.

Okay, as to your actual question...

Closures don't exactly copy methods. Hrm, this calls for some stats.
Creating 1000 subs using eval versus closures:

eval:     1088k on startup.... 3664k after execution. delta: 2576k
closures: 1100k on startup.... 1924k after execution. delta: 1814k

The code is below. This isn't a very good example -- normally
methods would have much more data in them, and eval would lose
spectacularly rather than badly.
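
For anyone who wants to reproduce the comparison, here is a minimal sketch of
the two ways of creating 1000 subs. It is a reconstruction under assumptions,
not the original benchmark script: the variable names are made up, and it does
not measure memory itself (watch the process size with ps or top while it
runs).

```perl
use strict;
use warnings;

my $count = 1000;

# Variant 1: string eval. Every iteration compiles a brand new sub,
# so each one carries its own copy of the compiled code.
my @evaled;
for my $i (1 .. $count) {
    my $sub = eval "sub { return $i * 2 }" or die $@;
    push @evaled, $sub;
}

# Variant 2: closures. The sub body is compiled once; each iteration
# clones it and binds the clone to the current $i.
my @closed;
for my $i (1 .. $count) {
    push @closed, sub { return $i * 2 };
}

# Both variants behave the same when called.
print $evaled[9]->(), " ", $closed[9]->(), "\n";
```

The longer the sub body, the more the eval variant has to duplicate per sub,
while the closure variant keeps paying only for the per-closure structures.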

Creating a closure certainly costs some memory... here, that's 
about 2.5K each. That includes, in this case:

* AV structure (arrayvalue -- just one to hold the pad for the 
  lexicals that each closure binds to)
* CV structure (codevalue -- one for each closure)
* GV structure (globvalue created when the closure is stored
  in the symbol table with *name = sub { ... } -- one for each)
* AV structure (padlist -- one for each)
* AV structure (pad -- one for each)
* SV/RV structure (scalar reference -- one for each variable
  closed by each closure)

If this were actually used to construct an object, you'd also
get:

* PVMG (blessed object)
* hash keys (shared string table)
* SV/RVs for hash values to references to the CV closures 

Off the top of my head, that accounts for close to 2k right there. 
I'm sure I'm forgetting a few things. (See perlguts illustrated
for more information on these datastructures.) This is pretty
low-fat, but ideally, the two AVs would be created per-object
rather than per-method in this arrangement. That would save
most of the overhead. 

Anyway, it isn't the "methods" that are being copied -- it's the
lexical environment that's constructed, for the most part, that
you're paying for. The actual bytecode is shared. The anonymous
syntax of sub { } returns a new reference, but that reference
has its own reference to the bytecode in question, though it has
different references to the padlist and such.
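
A small sketch of how one might check this with the core B module. The
make_counter factory and the variable names are invented for the illustration;
the only calls used are B::svref_2object and the ROOT and PADLIST methods of
the resulting B::CV objects.

```perl
use strict;
use warnings;
use B ();

# Factory whose returned sub closes over $count, so every call
# produces a separate closure.
sub make_counter {
    my $count = shift;
    return sub { return $count++ };
}

my $first  = make_counter(1);
my $second = make_counter(100);

# B objects are blessed references holding the address of the
# underlying structure, so dereferencing gives a comparable number.
my $cv1 = B::svref_2object($first);
my $cv2 = B::svref_2object($second);

my $distinct_refs = $first != $second                          ? 1 : 0;
my $same_optree   = ${ $cv1->ROOT }    == ${ $cv2->ROOT }    ? 1 : 0;
my $same_padlist  = ${ $cv1->PADLIST } == ${ $cv2->PADLIST } ? 1 : 0;

print "distinct code refs: $distinct_refs\n";
print "shared op tree:     $same_optree\n";
print "shared padlist:     $same_padlist\n";
```

The two code references should come out distinct, with the same op tree but
separate padlists, which is the split described above.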

Padlists need some explanation here... each CV might be called 
recursively, so it doesn't just need a pad to hold lexical
variables (of its own and inherited, both) -- it needs a stack
of these, so if it happens to call itself, directly or indirectly,
the new invocation has its own private "scratchpad" for lexical
variables. So each code reference, no matter how generated,
has its own stack, built-in. Worse, Perl doesn't free
stack frames after the code has returned -- it's assumed that
if a method recurses to one depth once, it probably will again,
so it's best to keep the memory cached. In some situations, this
results in pathological memory usage. The Towers of Hanoi in
Perl is ugly, memory-usage wise.

Numbers are: after bytecode is compiled; after one object is created; after
1,000 objects are created; the delta between the first and last numbers:

1216... 1228... 4156... 2940   the Person hashclosure example from below
1112... 1152... 1324...  212   normal object idiom (modified Person.pm example)

Here's that again with 100,000 objects created:

1112... 1152... 18100k   normal object idiom
1216... 1228... 286m     the Person hashclosure example

Okay, that's a brutal beating. The normal one ran instantly; the second
one ground the harddrive for close to a minute (this system has 128
megs of RAM). The number just kept climbing and climbing...
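
For reference, the "normal object idiom" side of that comparison looks roughly
like the sketch below. This is not the modified Person.pm that produced the
numbers above, just an assumed equivalent: one hash per object, with the
methods compiled once and shared by every instance.

```perl
package Person;
use strict;
use warnings;

# One small hash per object; no per-object code references.
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, age => $args{age} }, $class;
}

# These subs exist once, in the Person symbol table, no matter how
# many objects are created.
sub name { my $self = shift; return $self->{name} }
sub age  { my $self = shift; return $self->{age} }

sub get_older {
    my $self = shift;
    return ++$self->{age};
}

package main;

my @people = map { Person->new(name => "person $_", age => $_) } 1 .. 1000;
$people[0]->get_older;
print $people[0]->name, " is now ", $people[0]->age, "\n";
```

Each additional object here costs one hash and its values, while each
additional hashclosure object costs a hash plus a fresh set of closures.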

But there are a few bad things about this example because the methods are so short:

1. The traditional style pays a tax every time a hash is indexed --
   the padhv instruction must reference an entry in the shared string
   table to get at the hash key, and it must reference the pad entry
   that represents the hash. That's two 32 bit quantities. This happens 
   every time you write $self->{whatever}. The closure version only
   does a padsv operation with one 32 bit index into the pad.
2. The pad/padlist overhead the closure version pays is constant,
   so it's more noticeable with lots of instances of small methods.
   2.5k each overhead is a lot to pay for a small method but little
   for a large method. The small methods in this example make it seem
   really obnoxious, which it may well be... but the example doesn't
   help ;)

Conclusion: hashclosure is *fine* for small programs and programs with few
object instances =P Hundreds of objects is no problem, but thousands is.

I've been toying with the idea of making another little B hack.
This time, I want to make something like this work:

package Person;

use ourmeansinstance;  # this hypothetical module has not yet been rated.. er, named

our $this;   # our means instance now!
our $name;
our $age;
our @scars;

sub new {
    bless { }, $this;
}

sub name :lvalue { $name }
sub age :lvalue { $age }
sub get_older { $age++ }
sub new_scar { my $scar = shift; push @scars, $scar; }

This module would have to find the starts of all function/method definitions,
insert a few opcodes to shift @_ into $this, and then insert opcodes to
alias each instance variable ("our" variable) to $this->{whatever}. E.g.,
$whatever would be aliased to $this->{whatever}. This would be done using
Data::Alias. Likewise, @scars could be aliased to $this->{scars}, and
Data::Alias does the right thing.

Disadvantages: longer start-up times for programs; more overhead for
method calls, especially in objects that have lots of instance fields.
I still haven't fixed B::Generate to build cleanly.
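
To make the idea concrete, here is a hand-written sketch of the prologue such
a module would have to splice into each method. It is only an approximation
under assumptions: it uses plain glob assignment instead of Data::Alias, does
no opcode rewriting at all, and the _bind and _restore helpers are
hypothetical names invented for this illustration.

```perl
package Person;
use strict;
use warnings;

our ($this, $name, $age, @scars);

sub new { return bless { name => undef, age => 0, scars => [] }, shift }

# Hypothetical helper: point the package variables at one instance's
# fields and return the previous bindings so they can be restored.
sub _bind {
    my ($self) = @_;
    my @previous = ($this, \$name, \$age, \@scars);
    $this   = $self;
    *name   = \$self->{name};     # $name is now an alias for the hash element
    *age    = \$self->{age};
    *scars  = $self->{scars};     # @scars is now the instance's array
    return @previous;
}

# Hypothetical helper: put the earlier bindings back.
sub _restore {
    my ($old_this, $old_name, $old_age, $old_scars) = @_;
    $this  = $old_this;
    *name  = $old_name;
    *age   = $old_age;
    *scars = $old_scars;
    return;
}

# Method bodies can then use the plain variables.
sub get_older {
    my @previous = _bind(shift);
    my $result = $age++;
    _restore(@previous);
    return $result;
}

sub new_scar {
    my @previous = _bind(shift);
    push @scars, shift;
    _restore(@previous);
    return;
}

package main;

my $fred = Person->new;
$fred->get_older for 1 .. 2;
$fred->new_scar('left knee');
print "age: $fred->{age}, scars: @{ $fred->{scars} }\n";
```

The extra bind and restore work on every call is the per-method overhead
mentioned above, and it grows with the number of instance fields.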

-scott

On  0, Michael Friedman <friedman at highwire.stanford.edu> wrote:
> Scott,
> 
> Thanks! I'm curious, though, about the memory footprint that using  
> hashclosure would have. Since each method becomes a closure, I assume  
> that means that each method would end up being instantiated in a  
> brand new space in memory for each new object you create. Thus, if  
> you had 10 objects, you'd have 10 different copies of each method as  
> well as 10 copies of the field values. In either the single-hash- 
> based object or the inside-out object, the methods are shared between  
> all instances of the class, so they only have to go into memory once.
> 
> So if you were using 100s of objects, that memory could be  
> problematic. :-)
> 
> Still, it's a cool way to avoid having to use object syntax within  
> the object itself.
> 
> The link I included originally is to a new module, Class::Std, which  
> is supposed to make using inside out objects much easier. I hope to  
> try it out later this week, so I'll let everyone know how it goes.
> 
> -- Mike
> 
> 
> On Nov 23, 2005, at 9:06 AM, Scott Walters wrote:
> 
> > Hi Michael,
> >
> > I thought it was a cute trick, but I'm surprised it's recommended.
> > What I do depends on the situation -- the client, the style of the code
> > I'm adding to, etc.
> >
> > When I don't necessarily care about encapsulation, I use an excellent
> > package by Juerd called Attribute::Property:
> >
> >        package Person;
> >        use Attribute::Property;
> >
> >        sub new  : New;
> >        sub name : Property;
> >        sub age  : Property { /^\d+\z/ and $_ > 0 }
> >
> >        sub print_stats {
> >            my $self = shift;
> >            print "name: ", $self->name, "\n";
> >            print "age: ", $self->age, "\n";
> >        }
> >
> >        sub get_older {
> >            my $self = shift;
> >            $self->age++;
> >        }
> >
> >        package main;
> >
> >        my $person = Person->new(name => 'Fred', age => 23);
> >        $person->get_older;
> >        $person->name = "Fred Worth";
> >        $person->print_stats;
> >
> > A::P creates new methods for you that initialize instance variables from
> > arguments (just like in Perl 6 -- hence _Perl 6 Now_ including discussion
> > of it), and it also creates lvalue accessors for the instance data
> > when you use the :Property attribute of subroutines. It's a lot less
> > code to write and the code looks a lot better.
> >
> > For stuff I use internally, when I want some encapsulation and not to
> > have to shift $this, I often use a little package called hashclosure.pm.
> > hashclosure basically tells Perl that the object's methods are code
> > references inside of the hash. The hash contains methods as code references
> > rather than instance data. Instance data is "my" variables lexically closed
> > over by the methods.
> >
> >   package hashclosure;
> >
> >   sub import {
> >       my $caller = caller;
> >       *{$caller.'::AUTOLOAD'} = sub {
> >           my $method = $AUTOLOAD; $method =~ s/.*:://;
> >           return if $method eq 'DESTROY';
> >           my $this = shift;
> >           local *{$caller.'::this'} = \$this;  # scalar ref, so $this (not %this) is aliased
> >           if(! exists $this->{$method}) {
> >               my $super = "SUPER::$method";
> >               return $this->$super(@_);
> >           }
> >           $this->{$method}->(@_);
> >       };
> >   }
> >
> >   1;
> >
> > This is the AUTOLOAD glue needed so that objects created as
> > follows work:
> >
> >   package Person;
> >   use hashclosure;
> >
> >   our $this;
> >
> >   sub new {
> >       my $class = shift;
> >       my $name;
> >       my $age;
> >       bless {
> >           name => sub { $name },
> >           set_name => sub { $name = shift },
> >           age  => sub { $age },
> >           set_age => sub { $age = shift },
> >           print_stats => sub {
> >               print "name: ", $name, "\n";
> >               print "age: ", $age, "\n";
> >           },
> >           get_older => sub {
> >               $age++;
> >           },
> >       }, $class;
> >   }
> >
> > Since the "instance data" is lexically closed over by the methods and
> > scoped to the new { } block, encapsulation is pretty good (PadWalker
> > and such can still get to it, but XS can do anything).
> >
> > Best of all, you don't have to write that annoying $self->{foo} crap --
> > just $foo will do. That's an improvement even over $self->foo as in A::P.
> >
> > The AUTOLOAD logic shifts $this for us and puts it into
> > the $this defined by 'our $this'.
> >
> > Downsides are lack of lvalue accessors (so you can't do
> > $person->name = "Fred") and the ugly initialization syntax.
> >
> > This might be workable with lvalue, but I'd have to do a much larger
> > and messier AUTOLOAD to make it happen and my first attempt didn't
> > pan out.
> >
> > At the last meeting, I got out my Object::Lexical on the projector and
> > shoved that in people's faces. It also makes instance data into
> > lexicals, but the implementation is completely different (it creates
> > a new stash for each object created, stuffs closures into it, and
> > blesses it -- it's probably the strangest thing I've ever done in Perl).
> > Here's what code using it looks like:
> >
> >   use Object::Lexical;
> >   use Sub::Lexical;
> >
> >   sub new {
> >
> >     our $this;
> >     my $name;
> >     my $age;
> >
> >     my sub age { $age };
> >     my sub name { $name };
> >
> >     sub print_stats {
> >         print "name: ", $name, "\n";
> >         print "age: ", $age, "\n";
> >     }
> >
> >     sub get_older {
> >         $age++;
> >     }
> >
> >     instance();
> >
> >   }
> >
> > It just doesn't get any more clear or concise than that for creating
> > objects in Perl. instance() serves the same purpose as bless(),
> > but it does the actual stash-blessing and closure-generating.
> > To get this pretty syntax, you need a source filter -- that's what
> > Sub::Lexical does. For some other idioms that don't use a source
> > filter, see
> > http://search.cpan.org/~swalters/Object-Lexical-0.02/Lexical.pm.
> > This module is a bit buggy, by the way, but if anyone actually
> > has any interest in it, I'll fix the thing.
> >
> > Regards,
> > -scott
> >
> > On  0, Michael Friedman <friedman at highwire.stanford.edu> wrote:
> >> So, I just read through _Perl Best Practices_ and it's fantastic.
> >> It's even well written. And I agree with almost all of Damian's
> >> recommendations for making more maintainable Perl code... all except
> >> inside out objects.
> >>
> >> However, as I've already had one transformative religious experience
> >> from this book (I've changed my braces style), I'm willing to give
> >> the guy a chance on this one. But I need some more evidence.
> >>
> >> Has anyone used inside out objects before? I completely believe that
> >> they handle encapsulation much better, but what I don't buy is that
> >> they're actually easier to maintain and equivalently easy to
> >> understand as the "regular" hash-based objects are.
> >>
> >> So, anyone have real world advice on using them?
> >>
> >> -- Mike
> >>
> >> PS - For those not in the know, inside out objects are identified by
> >> a unique scalar that acts as an index into a list of hashes, one hash
> >> for each attribute/field of the object. The values are held in the
> >> set of hashes in a closure, so absolutely no one but the class itself
> >> can access them. There is an example in the code block on
> >> http://www.windley.com/archives/2005/08/best_practices.shtml, and other
> >> examples elsewhere that I can't seem to find at the moment. :-(
> >>
> >> ---------------------------------------------------------------------
> >> Michael Friedman                     HighWire Press
> >> Phone: 650-725-1974                  Stanford University
> >> FAX:   270-721-8034                  <friedman at highwire.stanford.edu>
> >> ---------------------------------------------------------------------
> >>
> >>
> >> _______________________________________________
> >> Phoenix-pm mailing list
> >> Phoenix-pm at pm.org
> >> http://mail.pm.org/mailman/listinfo/phoenix-pm