[Kc] what I submitted for consideration for the current round of TPF grants

David Nicol davidnicol at gmail.com
Tue Feb 8 16:12:19 PST 2005


This is what I had been planning to bring along to the meeting tonight,
so here it is on the list :)

Since sending this in a week ago, I keep being haunted by thoughts of
ways to improve it, like a way to associate a new method or methods with
a pre-existing back half or pool.  I want to not think about it -- at
all -- until
the grant is approved.

It's possible that having a grants committee may actually stifle innovation of
some types, as effort goes into polishing a proposal that otherwise
might go into
developing working systems.

Anyone for proposing to the TPF grant committee a framework for rewarding work
that has been done already?




--->   Title

Asynchronous pragma

--->   Synopsis


As Uri Guttman has said, "All blocking ops can run async by passing a
message to a blocking process."(1)  The Asynchronous pragma arranges this
with no configuration needed, on a per-namespace basis.

To simplify writing non-blocking programs in Perl, the proposed
asynchronous pragma wraps arbitrary object-oriented perl modules
with a non-blocking message passing interface.

Instead of waiting for the result of a method call on an object blessed into the
wrapped package, a placeholder object is returned immediately. Placeholders
get upgraded to scalar result values, or when references are returned,
continue to
represent "back half" objects.

Methods that return objects have their objects continue to live in the
"back half"
and methods called against proxy objects in the "front half" are
passed to the back
half. When a method is called on a proxy object that is still waiting
for the result
of an earlier method, the new method is added to that object's method
queue. Objects'
callback functions are run and their ready() methods return true only after all
methods have completed.


--->   Benefits to the Perl Community


Writing non-blocking servers in Perl will become easier. This
pragma takes care of the details of creating message-passing asynchronous
abstraction layers.

The asynchronous pragma hides the details of creating message queues behind a
general interface that is both easier to get right and harder to get wrong than
the problem-specific engineering which is currently required when modifying an
existing process to defer delivery of its result. 

The set of methods used by the asynchronous pragma may become a standard set of
methods to access objects holding deferred results, providing a reference
implementation of delivering deferred results.

Module authors wishing to support asynchronous modes for their modules will
be able to modify the back-half code provided in the asynchronous pragma to
deliver asynchronous modules adhering to the interface standard.

Server authors wishing to write selecting servers will have yet another
framework for doing so.

--->   Deliverables

Three modules, "asynchronous", "asynchronous::threaded" and
"asynchronous::threaded::full",
will be written and uploaded to CPAN, so that any object-oriented
module, such as DBI, can
be invoked with

  use asynchronous DBI;

rather than

  use DBI;

to create a message-passing asynchronous layer between the main execution
thread of the program and the back half in which the loaded module exists.
(this module invocation syntax has been demonstrated in the "aliased" module)

"asynchonous" uses socketpairs and forks to create the communications and
back-end.  

"asynchronous::threaded" uses ithreads, shared arrays, and queues
for decoupling operations in the back half and communications.

"asynchronous::threaded::full" maintains a thread in the front half
for locking and updating deferred proxy objects instead of requiring
calls to ready(), although they will still be reccommended.  This version 
is not altered by appearance of the MODE => FULL modifier. Emulated FULL
operation, using a signal, in the threaded version, is not known if it
will work or not at this time.

The following example code from the LWP documentation could be made
asynchronous as follows:

( before)

 # Create a user agent object
 use LWP::UserAgent;
 $ua = LWP::UserAgent->new;
 $ua->agent("MyApp/0.1 ");

 # Create a request
 my $req = HTTP::Request->new(POST => 'http://search.cpan.org/search');
 $req->content_type('application/x-www-form-urlencoded');
 $req->content('query=libwww-perl&mode=dist');

 # Pass request to the user agent and get a response back
 my $res = $ua->request($req);

 # Check the outcome of the response
 if ($res->is_success) {
     print $res->content;
 }
 else {
     print $res->status_line, "\n";
 }

( after)

 use HTTP::Request; # we will create this in the front half;
                    # see below for specifying which additional namespaces
                    # should be associated with a back half
 # Create a message passing system that will intercept method calls
 use asynchronous::threaded LWP::UserAgent;
 

             # the advantages of wrapping the module rather than
             # simply spawning a thread to run the request are:
             # using the asynchronous wrapper pragma hides the
             # wrapped module behind a well-defined message passing
             # interface, so (1) danger of bugs due to multiple simultaneous
             # method calls into non-thread-compliant code is mitigated
             # and (2) issues of data sharing are different

  # Create an (asynchronous) user agent object
  $ua = LWP::UserAgent->new;
  $ua->agent("MyApp/0.1 ");

  # Create a request
  my $req = HTTP::Request->new(POST => 'http://search.cpan.org/search');
  $req->content_type('application/x-www-form-urlencoded');
  $req->content('query=libwww-perl&mode=dist');

  # Pass request to the user agent and get a response back
  my $res = $ua->request($req);

              # at this point, $req lives in the front half, $ua
              # lives in the back half, and $res is a blessed
              # reference to an object of type asynchronous::threaded::object,
              # as is $ua.  Two methods have been enqueued for $ua. 
              

  my $success_value = $res->is_success;
              # instead of waiting for the result, we can poll for it later
              # ("1 until $success_value->ready;") or, as here, set a callback
              # to be performed when the object becomes ready:
  $success_value->set_callback( sub {
   # Check the outcome of the response
   if ($success_value) {
      my $content = $res->content;
      $content->set_callback( sub { print $content; } );
   }
   else {
      my $status_line = $res->status_line;
      $status_line->set_callback( sub { print $status_line, "\n" } );
   }
  });

  # to this point we've queued up a lot of deferred methods
  ...
  1 until asynchronous->ready(0.1); # wait until everything's resolved,
                                    # timing out each check after 100 ms.

For comparison, here's how to modify that example to use threads:

 use threads;   # provides async{}
 # Create a user agent object
 use LWP::UserAgent;
 $ua = LWP::UserAgent->new;
 $ua->agent("MyApp/0.1 ");

 async{
  # Create a request
  my $req = HTTP::Request->new(POST => 'http://search.cpan.org/search');
  $req->content_type('application/x-www-form-urlencoded');
  $req->content('query=libwww-perl&mode=dist');

  # Pass request to the user agent and get a response back
  my $res = $ua->request($req);
 
  # Check the outcome of the response
  if ($res->is_success) {
      print $res->content;
  }
  else {
      print $res->status_line, "\n";
  }
 }

For simply firing off an asynchronous routine, a simple async is the way to go.
When we want to use the results outside of the thread, more complex mechanisms,
such as declaring the result variable shared and checking the status of the
worker thread, are required. These more complex mechanisms are abstracted away
by the asynchronous pragma, replaced by a universally applicable ready() method
which performs i/o, updates variables as needed, and returns the readiness state
of the object it was called on.


Additionally, at least one simple DBI extension, possibly called
"DBIx::asynchronous", will be provided.  This extension will operate similarly
to wrapping DBI and working with wrapped DBI objects, but will attempt to load
packages from the DBIx::asynchronous::driver::XXX and
DBIx::asynchronous::driver::backhalf::XXX namespaces, where XXX is the initial
database signifier in a DBI DSN, to allow provision of database-specific
asynchronous back-halves that can do tricks such as out-of-order data return,
or complete replacement of the asynchronous wrapper system with a DBI-compliant
asynchronous interface.  No examples of either type of driver are contemplated
at this time.


An object-oriented interface is available for asynchronous wrapping.  In order
to provide multiple wrapped modules in the same program, each wrapping produces
an object in which the per-wrapping data, such as the file handles, queues,
buffers, and object tables, is kept.  A "new" method is provided that is
essentially the same as the "import" method.  By using the OO interface, it
will be possible to remove a wrapping instance from the list of wrappings
referred to by the global asynchronous->ready() method, which one might do to
prioritize different modules differently, or when using the communications
system directly without wrapping a name space.


Emulating fully asynchronous mode using signals

Fully asynchronous mode can be emulated in the forking asynchronous wrapper by
having back halves send the front half a signal after enqueueing a
response.  The
front half maintains a list of wrapper objects that are associated with each
signal (defaults to SIGUSR1) and performs READ operations, which can
upgrade deferred
objects, on the wrapper objects associated with the signal.  This way, deferred
asynchronous wrappers, which are not prone to having objects suddenly
change without
a ready() call being made, and fully asynchronous wrappers, which do not require
ready() to be called, can co-exist.  Garbage collection management
instructions will
not generate the signals. A signal might be sent the first time the back-half
experiences a select timeout after there has been some activity.
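
A rough sketch of what the front half's side of that emulation might look
like (wrappers_for_signal() is hypothetical shorthand for the per-signal list
of wrapper objects described above; only READ is taken from the proposal):

     $SIG{USR1} = sub {
         # perform READ on every wrapper object registered for this signal,
         # upgrading any deferred objects whose responses have arrived
         for my $wrapper ( asynchronous->wrappers_for_signal('USR1') ) {
             $wrapper->READ;
         }
     };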


back half pools

a pool size can be specified for a wrapper, in which that many
identical back halves
are created.  Each proxied object is associated with the back half in which it
lives, and all method calls on that object are queued to the correct back half.

methods on scalars rather than objects, such as

       my $QBsneak = AmericanFootballPlay->new(23,47,82,"hike!");

will be given to the back half instances in strict rotation.  If the
AmericanFootballPlay package maintains information about the game state
in package variables, only a pool size of 1 would be appropriate.  If
on the other hand the package uses objects for all state information 
about a game, like

       my $game = AmericanFootballGame->new('Patriots','Eagles');
       ...
       my $QBsneak = $game->attempt(23,47,82,"hike!");

this could be wrapped with any number of pools: each game object
will live in a single back half instance and all methods referring to that
object will be directed to that instance.


contrast with "lazy variables"

A "lazy" variable, as proposed to the perl 6 discussions five years ago,
defers evaluation of its contents as long as possible.  The deferred
objects discussed in this proposal evaluate as quickly as possible, but
in a program control flow separate from the one in which the method
being evaluated is called.


--->   Project Details

The asynchronous pragma wraps a package in a layer that provides immediate
return from all method calls.  Methods germane to the asynchronous
layer are handled directly, all other methods are queued for communication to
the synchronous back half.

Suppose we have a module called PokeyModule that does some heavy
calculations and takes an average of ten seconds to return from a
method call.  We could wrap PokeyModule like so:

    use asynchronous PokeyModule; #invocation style taken from "aliased"
    my $DeferredPokeyObject = new PokeyModule;

all method calls, and all subroutine calls within the PokeyModule:: package,
immediately return asynchronous::object objects.  When a method returns a
reference, the front half maintains a one-to-one mapping between
asynchronous::object objects and back half objects, and ties the front
half object in such a way that direct object member accesses will be passed
to the back half for resolution.

When a method returns a scalar, the asynchronous::object transforms itself
to the value when the value arrives.
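
To make the placeholder behavior concrete, a sketch continuing the PokeyModule
example (heavy_calculation() is a hypothetical method used only for
illustration):

    my $deferred = $DeferredPokeyObject->heavy_calculation(42); # returns at once
    # ... do other work while the back half grinds away ...
    1 until $deferred->ready(0.1);  # poll, timing out each check after 100 ms
    print "result: $deferred\n";    # the placeholder has upgraded to the scalar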

Code references passed as arguments, or returned as method results, are
wrapped in coderef proxy objects which, when invoked, become method messages
passed to side where the coderef originated.

asynchronous installs the front half into the name space matching the
package name,
and other name spaces that appear when the named module(s) is(are) used in the
back half.  A ${${NameSpace}::Asynchronous::Deferred} variable is set to
a true reference to the front half object handling this name space, in all 
wrapped name spaces.

At import time, a census is taken of name spaces before and after the wrapped
package(s) is(are) used, so proxies can be set up in the front half for all
new name spaces, similar to the operation of "Pollute."

when an ALIAS_TO import parameter is provided, asynchronous installs the front
half into the named name space instead, and does not perform the
namespace census.

A BACKHALF_INIT import parameter can be provided which will be eval'd in the
back half, after the fork, before using the target module.

an ARGS parameter specifies the import args given to the back half; if
it is missing, the args given to asynchronous are passed to the back half.
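
For illustration, the parameters above might be combined like this when
wrapping DBI (the particular values are illustrative only):

    use asynchronous 'DBI',
        ALIAS_TO      => 'AsyncDBI',            # install the front half here instead
        BACKHALF_INIT => 'use DBD::mysql ()',   # eval'd in the back half, after the fork
        ARGS          => [];                    # import args for the back half's "use DBI"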

Polling async objects can be done with their ready() method

a wait() method, analogous to the result() method from MJD's Async module,
may be provided as syntactic sugar, but is not needed, as tight loops that
check ready() are more flexible.

any number of callbacks and method calls can be stacked on an async object
and the object will not be ready until the last one has completed.

the ready() method can take a parameter which will be used as the timeout
in the select() call which occurs at the beginning of a (nonthreaded) ready.

threaded asynchronous is a special case which may not be as robust as forking
asynchronous.

for integration into coder-provided select() loops, fdnum and fdmask methods
are provided, as well as READ, WRITE  methods that do communication
to the back half, and a READY method that does not have the side effect of
promoting an async object to the return value when there is a return value.

when references are the result of methods, the async object will be promoted
to a reference to an object of the same kind of perl data, tied in
such a way that stores and fetches et cetera are sent to the back end.

a set_callback($) method can be used to associate a coderef with an unready
asynchronous::object object, which will run when the object becomes
ready. Callbacks are executed immediately when set on ready objects or
non-asynchronous objects, through

   sub UNIVERSAL::set_callback{
        my ($coderef) = splice (@_,1,1);
        goto &$coderef; # $_[1] has been spliced out
   }

   sub UNIVERSAL::ready{

       #avoid loop on self-tied objects
       if(tied($_[0])){
         my $noloop = $_[2] || {};
         $noloop->{ tied($_[0]) } and return 1;
         $noloop->{ tied($_[0]) } = 1 ;   
         return tied($_[0])->ready(undef, $noloop); 
       };                                          
       1;
   }
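
These universal fallbacks mean that calling code does not need to know whether
a value is an asynchronous placeholder or an ordinary object; for an ordinary
object the callback simply runs at once.  For example, with the LWP example
above:

   # $res may be a plain HTTP::Response or an unready placeholder; either way
   # the callback runs exactly once, immediately or when $res becomes ready
   $res->set_callback( sub { print $res->status_line, "\n" } );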

The asynchronous pragma offers a reference implementation of an asynchronous
interface to a perl module.  The default back-half provided by the
asynchronous pragma provides results in the order they are requested,
but the asynchronous interface front half / back half convention will
be easy to adapt to applications where the back half results may
return out of order, such as an asynchronous version of LWP.

Support for tie interfaces:

Tie interfaces have a few methods, plus a "tied" method that allows method calls
to the underlying object.  When a tie package is wrapped, the underlying object,
being returned from the back half's TIE* method, will be an asynchronous::object
object, which will send all method calls to the back half for processing.

Support for XS modules:

Asynchronous placeholders use an AUTOLOAD method to identify new methods
as they are called on objects.  The calling convention in the back half is
not altered at all, so the asynchronous system does not care what goes on
"under the hood" of a wrapped module, with the normal caveats applicable
to pool sizes greater than one.

Support for external select loops:

A set of accessors pertaining to file descriptors used is available,
also a mechanism for calling the reading and writing routines on a
per-file-descriptor basis.

Support for incorporating additional file descriptors into Asynchronous's
select loop:

an asynchronous->add_fd($fd, \&read, \&write, \&error) method is available,
as well as set_read, set_write, and set_error methods for setting the coderefs
responsible for handling a particular fd, and an asynchronous->close_fd($fd)
method that forgets a mapping.
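
A sketch of wiring an unrelated listening socket into the pragma's select loop
with those methods (the handler subroutine names are illustrative, not part of
the proposal):

  asynchronous->add_fd( fileno($listen_socket),
       \&accept_new_client,   # called when the fd is readable
       undef,                 # no writable handler needed here
       \&log_socket_error );  # called on error/exception conditions
  # ... and at shutdown:
  asynchronous->close_fd( fileno($listen_socket) );  # forget the mapping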

Support for DBI:

DBI, although pretty involved, provides a straightforward system of objects
returned from method calls.  The asynchronous front half / back half system will
accommodate DBI.

Asynchronous-wrapped DBI will work like any other wrapped module.

A DBIx::asynchronous package will be provided that will wrap a DBI
database handle object within the asynchronous layer rather than
wrapping the entire DBI system. The two invocations

       use asynchronous DBI;
       $dbh = DBI->connect("$dsn", $username, $password, \%attrs);

and

       use DBIx::async;
       $dbh = DBIx::async->connect("$dsn", $username, $password, \%attrs);

will be very nearly semantically equivalent.

If and when asynchronous back halves customized for the purposes of particular
databases that offer out-of-order response or other asynchronous modes become
available, the DBIx::async driver will know to look for and load those installed
back halves or entire asynchronous systems rather than wrapping the
standard handle
objects.

No accommodation for asynchronous operation need be made to standard DBI driver
systems, and no modification to the DBI standard is required.

The asynchronous wrapper will provide an in-order deferred
asynchronous wrapper around a synchronous module, providing a reference
implementation of a set of methods for accessing asynchronous data that
can be adapted to other asynchronous wrapper implementations and fully
asynchronous implementations.

The default back half operates in a child process, reading from the
communication socket, performing method calls, maintaining a table of objects,
returning results of method calls to the front half.

One object table is shared between the front and back halves.  At request time,
the front half provides an object table index number that is used to associate
the result of the request with its response. When an asynchronous object in the
front half is destroyed, a destruction sequence involving several messages is
initiated to safely identify the dead object so it can be cleared from
the object
tables.

No support is provided for propagating unforeseen side effects to the front half.
Package variables visible in the wrapped package immediately after the package
has been used become tied proxy objects.  Support will be available for
explicitly naming package variables to be shared between the front half and a
back half, and synchronization issues can be mitigated by calling the ready()
method on the variables in question.

Perl's closures provide adequate means of encapsulating current lexical
variables in callbacks.

Working examples will be provided, using the self-contained pure perl
asynchronous HTTP::Server::Singlethreaded web server as an example application.

internal communications protocol:

All requests and responses are encoded with a method similar to
netstrings or bencoding. An option will be available to use Storable::freeze
instead, depending on early feedback it may become the default.

     BEGIN{
        if ($Asynchronous::backhalf::storable){
           require Storable;   # loaded only when the Storable option is selected
           eval <<'EOB'
              sub encode_scalar($){
                 my $F = Storable::nfreeze(\ $_[0]);  # nfreeze wants a reference
                 sprintf "S%d:%s", length($F), $F
              }
EOB
        }else{
           eval <<'EOB'
              sub encode_scalar($){ sprintf "s%d:%s", length($_[0]), $_[0] }
EOB
        }
     } #BEGIN
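
With the Storable option off, for example, encode_scalar("hello") yields the
netstring-style payload "s5:hello".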

Requests and responses are each exactly one netstring, allowing the internal
communications layer in each half to pass only completely received requests and
responses up to higher levels. The threaded version(s) will avoid some of the
serialization overhead, replacing it with the overheads in Thread::Queue.

The front half makes requests and the back half replies with responses.

Method requests begin with a request index number, then a letter S, A,
or V indicating the context of the method call, followed by the index number
of the object on which the method is being called, followed by the name of the
method as a netstring, followed by a series of aencoded scalars or objects.

aencoding is similar to bittorrent bencoding, except that the requirement that
hash keys are provided in order is relaxed and a new data type, the object, is
provided.

Two new primitives are added to bencoding: 'o' for object and 't' for temporary
object originating in the back half and waiting for an entry in the front
half's object table.

Perl's undef is encoded as object zero, or o0.

References to indexed objects are encoded as "oNNNN:" where o is a literal
lower case o and NNNN is the index of the object in the tracked object table.

Responses begin with the request index number that is being replied to, followed
by the result. if a back half method throws an error (dies or croaks),
the result
begins with a capital E followed by the $@ thrown by the method.

methods called in void context result in very short responses: just
the request number, unless there is an error.

methods called in scalar context result in responses containing either
a netstring containing the stringified result or an object: either a new
object indicated by oNNNNX, where NNNN is the request index number and X is
one of S, A, H or F indicating that the underlying data type of the new object
is a reference to a scalar, array, hash or filehandle; or a pre-existing object
indicated by oNNNN, where NNNN indicates the object number of the
pre-existing object.

Object specifiers can be followed by an equal sign and an
initialization value instead
of a colon. The depth to which to pass initialization data is
specifiable and defaults
to 1, so a return value of [2,3,[4,5],6] in response to request 987 in
scalar context
might by default get encoded as

     987o987A=l1:21:35:t53A:1:6e

which would be responded to immediately by mapping t53 to o988 (assuming no
intervening methods), and then requests for array elements from o988 would
be sent to the back-half for resolution.

When depth is set higher, more data is returned, at the risk of
becoming unsynchronized
with back-half data.

     987o987A=l1:21:313:t53A=l1:41:5e1:6e

For safety at the cost of more communication, depth can be set to zero,
in which case the response to request 987 would be

     987o987A:

and all accesses into object 987 are deferred into their own remote fetch calls.

Essentially, the asynchronous pragma re-implements forks::shared for purposes
of synchronizing shared data between the front and back half, with
slightly different
synchronization semantics.  By default, we offer write-through
caching, with remote
objects reinitialized when they are passed forward in their entirety.
There will be
modifiers available to alter the data synchronization semantics.

In general, perl method calls effectively pass values by value even though the
language allows passing by reference.  Method arguments that are references
that are not tied, to methods that are not explicitly listed as using pass by
reference, are expanded into aencoded lists and dictionaries instead of getting
passed as objects. (aencoding may use Storable or might not.) A modifier is
available to use pass-by-value always.  Which way the default settings are
will be deferred until after testing and user feedback.

methods called in array context result in responses containing a
series of scalars,
either netstrings, complex values, or objects. An array result may contain
more than one new object, accommodated by the object specifier using the
letter 't' for temporary rather than 'o' for object, and a number that is
assigned in the back half.

on receiving a t object, the front half assigns a new object index number from
the main table and queues a mapping request of the form MNNNN:NNN where
M indicates Mapping, NNNN is the front-half index number, and NNN is the
temporary index number from the back half that is being indexed as NNNN.  The
front half maintains the temporary mapping entry until receiving a response for
the mapping request.

When a mapped object is destroyed, due to all references to local objects being
weakened, a destroy request of the form NNNN:DNNNN is sent. When the destroy
sequence (see below) is complete, the table entry can be reused. 

"t" objects cannot be destroyed until their assignment of front half numbers has
been acked by the back half, due to the t table references not being weakened.

Objects originating in the front side and passed to methods as
objects, the first
time they are passed, have their types indicated in the same way that back half
objects indicate their types.  A second channel is used for the back half to
query the front half about objects that live in the front side, using the same
request and response protocol.

memory management and proxy tables:
The front half is responsible for assigning object numbers.  The side
that did not originate an object maintains a proxy object for each
object that is
referred to in communications.  The originating side of an object, which can
be either front or back, keeps a strong link to the object in its object table,
until being informed that the receiving side has deleted its proxy object, at
which time the entry in the object table is recycled and the link is removed.
The receiving side keeps weakened references to proxy objects in its
object table,
so that when a proxy object goes out of scope a Destroy request can be sent back
to the originating side to indicate that the link in the originating
side's object
table can be removed.  A table of unacked Destroy requests is
maintained, so that
if a new reference to a received object that was slated for destruction appears,
the existing proxy object can still be used.

use Scalar::Util 'refaddr';

sub asynchronous::object::DESTROY{
       my $memloc = refaddr $_[0];
       $Limbo{$memloc} = $_[0]; # deliberately not weakened; held in Limbo
       EnqueueRequest "D$IndexByMemloc{$memloc}";
}

The situation the Limbo table is meant to avoid is as follows:

Normally, the originating side responds to a request with a reference to object
OBJ.  The receiving side does something with it (say, calls its value method,
prints the result, then throws it away.) and then requests another method that
results in return of the same object.  The originating side would receive and
act on the Destroy request before receiving and acting on the second
method call,
so there is no problem.

The Limbo table would be needed if the side that originated an object issues a
second result containing the object based on a second method call that was
queued and waiting when the Destroy was requested.

Request number reuse is being left out for clarity in the Destroy problem:

        FRONT                                     BACK

    method request 1
                                        response to 1: object 1
     method request 2
    method request 3
     method request 4
                                        response to 2: string "hello world"
     delete request 5:  object 1
                                        response to 3: object 1 (again)
     restore request 6: object 1
                                        response to 4: object 4
     method request 7
                                        response to 5:
                                object 1 provisionally deleted from table.
     method request 8

It seems that there are two possibilities.  One is to require all response
objects to use the request codes of the requests they are responses to as
their object numbers, even when returning an alias.  The second possibility is
to make the proxy object destruction process a multi-step handshake (similar
to closing a TCP connection, in that it requires far more interaction than
seems necessary, yet is hard to simplify without introducing new problems):

The receiving side requests deletion when it has no more non-interface
references to a proxy object, but does not recycle the object slot,

                                                 issues object
             delete object
                                                 issues object again
             restore object
                                                 acks deletion
                                                 acks restore
             delete object
                                                 acks deletion
             purge object
                                                 acks purge

until the deletion gets acked without the object being returned again.  At
that point the receiving side can issue a Purge request WRT the object, and
bless the proxy into something other than asynchronous::object, and delete
the slot in the limbo (aka zombie) table, and recycle the object number.

On the originating side of an object, when you receive a D request for an
object in the table, you don't want to issue that number for that
object any more -- issue a new object number for that object, but
keep the association between number and object until a purge request
comes for that number. A restore request returns the object to where
its earlier number can be given out again.

That brings the kinds of requests up to:

Void context method call
Scalar context method call
Array context method call
Tied value data access
Map temporary object number to front-half-controlled object number
Destroy
Restore
Purge

and the kind of data that can be in both request parameters and
responses to them:

netstrings (scalars)
literal lists
literal associative arrays
proxies that can refer to hashrefs, arrayrefs, scalarrefs
proxies representing tied scalars and return-by-reference parameters

if a module uses return-by-reference extensively, a RETURN_BY_REFERENCE_ALWAYS
import parameter is provided; when set to true, all requests will pass proxy
objects even for plain scalars, requiring the back half to query the front
half over the second channel for their values, or to set them if needed.

When the back half is a complex system of objects, a list of package names,
rather than a single package name, can be provided in the INSTALL_TO import
parameter.

The data structures pertaining to a communication channel with a back half
live as a wrapping instance object, so multiple packages can be each
asynchronized with their own independent channels. The asynchronous->fdnum() and
fdmask() methods will return a list of all file descriptor numbers
involved in asynchronous operations on all channels, and the WRITE($)
and READ($) methods in the asynchronous package will know which package,
or which channel object,  is using a particular fd.  All communication
buffers will
live in the instance-specific objects.

Data are represented within requests and responses using netstring encoding.


Names of parameters are subject to change.


Unready objects are of type asynchronous::object::H::D::T where H refers to
the half they are in, D refers to whether an object originated in this half or
the other half (disposition is Local or Remote), and T is type, which can be
Regular, Temporary, or Zombie. (see DESTROY PROTOCOL section)

Attempts to print out, or work with in other ways, objects that have not yet
achieved readiness will result in printing the reference signature.  

Back-half objects are represented in the front half
as blessed references to tied objects, so that direct access to object member
data will operate as just another passed and deferred method call, immediately
returning a proxy object which will have a false ready() method until
it upgrades to the returned value. (this is a similar approach to Thread::Tie.)

Not all the types are present in threaded asynchronous, due to the simpler
management of a single (per use) shared proxy object table and the :shared
attribute layer taking care of visibility and garbage collection issues.

In terms of asynchronous theory, the forking implementation is
"deferred-asynchronous" since the front half must explicitly receive
some cycles in order to work with the queues.  The threaded implementation
will be deferred-asynchronous as well. An experimental fully-asynchronous
version creates two threads, one to handle the back half and the
other to handle the receiving portion of the front half and upgrade
proxy variables immediately rather than waiting for the main thread to
call ready().
The fully-asynchronous threaded version will be limited by the details of
shared data in ways that are not fully understood at this time, and the full
mode in the standard version emulates fully asynchronous operation by
signalling the front half after the back half has issued a response, or when
the back half has issued a request on the auxiliary channel.

any number of modules can be wrapped, in which case each gets its own
front half state object and its own thread(s) or process(es) to handle
its back half (or halves).

A "POOLSIZE" parameter governs creation of multiple back-halves
associated with a single front-half. Proxied objects live in only one back-half,
so all methods on one object will be handled by the same back-half.  Bareword
method calls (such as C< $obj = new MyObject > ) are issued to worker threads
(back halves) in strict rotation, although a modifier to issue them to
the back-half
with the fewest objects in it, or the one with the fewest outstanding
method calls,
will be available.
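
For example, returning to the football illustration above, a pool of four
identical back halves might be requested like this (the pool size is
arbitrary):

     use asynchronous 'AmericanFootballGame', POOLSIZE => 4;
     my $game = AmericanFootballGame->new('Patriots','Eagles'); # lives in one back half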

To avoid deadlocks, back-half access to front-half objects is accomplished
through the auxiliary queue.  In the LWP example,

  # Pass request to the user agent and get a response back
  my $res = $ua->request($req);

$ua is a remote object and $req is a local object.  The a10s system does not
have an entry in the object table for $req yet, so the first thing is
to allocate
an object table entry for $req.  Then the remote method call is composed, 
including a request number, which is the same as the object table number for
the proxy object that gets returned and assigned to $res, followed by the method
name ('request'), followed by a notation signifying a numbered
front-half object,
possibly including initialization data.

When the request method in the back half accesses the proxy variable it is
working with as the argument to the request method call, data access
requests are
passed back to the front half, and the back half blocks for return of data from
the front half.  This blocking, in the LWP example, can be eliminated by giving
the back half more of the system:

     use asynchronous USE => 'LWP::UserAgent', MORENAMESPACES => 'HTTP::Request';

If for some reason we want to differentiate between deferred and blocking
calls to LWP, we could designate the wrapped package to alias into a
different local namespace:

     use asynchronous LWP::UserAgent, NAMESPACE => 'Async::LWP::UserAgent';

which lets us do

     $aua = new Async::LWP::UserAgent;
     $deferred_response = $aua->get('http://www.example.com');


At import time, a mechanism similar to the Pollute module loads the 
name spaces in the front half with methods that will refer to the methods
exported by the wrapped module, and tied variables that provide deferred access
to package globals that are declared after the packages have been used, or
which are named in a MORETIES parameter. Additional methods are taken care
of with an AUTOLOAD method. 

This means that a well-behaved module that defines default values for all
externally visible package variables will have tied aliases to them in
the front-half name space.  Optional write-through caching can be turned on
and reset with cache control methods, on a system-wide, per-wrapper, or
per-object basis. When the pool size is greater than one, writes will go through
to all pools.  Tied package aliases can be suppressed with a NOPOLLUTE
parameter.

The asynchronous pragma returns results in LIFO order, but the protocol used
for the communication between the front and back will work with out-of-order
results, and in the case of a pool of backhalves order can be lost.

Generally, method calls are queued as they are invoked, and the methods are
actually run in the back half, with scalar results passed to the front half,
and object results referred to in the front half by proxy objects.

There are a variety of proxy objects, depending on whether an object is
in the front or back half, whether the object is remote or local, whether an
object is "temporary", which means that a method has been called in array
context and the back half has created a temporary entry for the object that
does not yet have a front-half table index associated with it, and whether it
is a "zombie" object, which has been destroyed in the back half but which is
being kept temporarily in the object table in case another reference to a
zombie object is waiting in the unprocessed method queue.  Each kind of proxy
object has its own package which it is blessed into (see above.)

Two queue-pairs run between the halves: the main queue-pair for
the front half to pass method requests to the back half, and the auxiliary
queue-pair for the back half to query the front half about objects listed in
method arguments, when necessary, or for side effects due to methods returning
values by altering values passed by reference. Pass-by-reference of scalar
values in method arguments includes their values.  The overhead of creating
proxy objects for everything that gets passed as a method argument can
be eliminated with the NOSCALARPBR => 1 parameter.

new methods are introduced to all wrapped objects and all wrapped name
spaces, including:

  ready
  fdset_r, fdset_w, fdset_e
  fdmask_r, fdmask_w, fdmask_e
  READ, WRITE, READY
  set_callback

in the forking version, ready() amounts to a call to select(), followed by
the WRITE and READ methods to copy queued message objects into and
out of the sockets, and update front-half objects as appropriate depending on
information received. Authors of asynchronous applications managing a
select loop may prefer to incorporate the results of the fdnum or fdmask methods
into their select arguments, and call the WRITE and READ methods directly,
preferring the READY method, which simply reports on the readiness flag of an
asynchronous object, to the ready method, which attempts communications first.
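
A sketch of such a coder-managed loop (the exact signatures of the fdmask,
fdnum, READ, WRITE, and READY accessors are assumptions based on the
descriptions above):

  my $rin = asynchronous->fdmask_r;
  my $win = asynchronous->fdmask_w;
  if ( select( my $rout = $rin, my $wout = $win, undef, 0.1 ) ) {
      asynchronous->READ($_)  for grep { vec($rout, $_, 1) } asynchronous->fdnum;
      asynchronous->WRITE($_) for grep { vec($wout, $_, 1) } asynchronous->fdnum;
  }
  print "response ready\n" if $res->READY;  # readiness flag only, no communication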

Or they may prefer to use a10s's select loop, adding their open file descriptors
and handler routines with the add_fd method.

Even with a coder-managed select loop, the communications channels are checked
whenever a deferred method is invoked.  This behavior may be turned off by
setting a per-use option, "managed_select", to a true value.

In the threaded version, queues and shared variables are used, simplifying
the communications. (Not all variables are shared; only the variables
pertaining to the message passing layer and the proxy object tables are.)
The threaded version does not have a select loop.

a "method" method in case the wrapped class has a method called
"ready" or "set_callback" etc that you want to call through to

an "array" method that is used when you want to call something in array context:

Array context calls can't be non-blocking because we need to know how many
arguments are coming back to know how many proxy-objects to make, and we don't
know that at calling time, so asynchronous method calls in array
contexts will throw a run-time exception suggesting the "array" method:

instead of

    @array = $object->method_that_returns_a_list(@args);

the user of the asynchronous pragma would have to do

   $deferred_arrayref_object->array->method_that_returns_a_list(@args);

furthermore,

   $deferred_arrayref_object->some_other_method(); #BAD

will croak immediately, but this

   $defarr = $object->array->method_that_returns_a_list(@args);
   $defarr->set_callback(sub{@array = @$defarr});

will work, as long as we don't look for results in @array until $defarr->ready()
is true.  


To make this work, the  'array' method returns a reference to the invocant
blessed into a handler class that does array-context calls.  Stacking methods
onto something that is going to eventually be a plain arrayref is nonsensical,
so that will throw an exception too.

forking fully-asynchronous mode:

In a deferred-asynchronous system, there is no danger of a callback occurring
until the front-half communication system is given a timeslice by calling the
ready() method, or at least the READ method.

In fully asynchronous mode, the callback may get invoked
at any time, so the caveats associated with working with data modified within
signal handlers would apply. The *full* system relies on the back half
signaling the front half with a SIGUSR1 (or maybe a SIGCONT, or perhaps SIGIO
is trappable) after writing, which would cause the front half to run the top
ready() method. The exact signal used would be configurable.



Out_of_order methods

Methods can be declared "out of order", meaning that they are guaranteed
to return a scalar that we care about and that they take only scalars as
arguments, in which case the asynchronous pragma will create a new
process or thread for each call to that method.  A maximum number of
worker threads can be defined, as well as a maximum number of methods
that will be called on a worker before it retires.  When out-of-order
methods are declared, the context in which the ooo methods are run is
independent.  LWP::Simple's "get" method is a good example of a method
that could be requested to be run out of order.
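
The declaration syntax is not fixed by this proposal; a sketch of what it
might look like, with OUT_OF_ORDER and MAXWORKERS as hypothetical parameter
names:

   use asynchronous 'LWP::Simple',
       OUT_OF_ORDER => [ 'get' ],  # each get() call may run in its own worker
       MAXWORKERS   => 8;          # cap on simultaneous workers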


Custom back halves

Out-of-order, or otherwise custom
back halves may be implemented directly against the asynchronous module's front
half by declaring a BACKHALF import parameter, which will override the default
back half. The exact syntax of specifying a custom back half is undetermined
at this time, probably a coderef which will be executed immediately after
the fork or within the thread.  Methods from the standard back half will
be available in custom back halves, so that the standard (module wrapping)
back half will be easily modifiable.

return value of ref($deferred_object) and ref($proxied_object)

Front-half name spaces are merely stand-ins for the action that occurs
in the back half, so @WrappedModule::ISA is set to
(asynchronous::object::back::remote) to allow blessing of proxy objects
into namespaces which will look right to code that inspects
ref($deferred_object).

Likewise, the packages that front-half references are blessed into 
can be passed to the back half.  A parameter must be set to en(dis?)able these
name space conversions.


Distributed processing

No support for running the back half on a different machine than the
front half is planned in the current iteration, but trading the socketpair
connections for full stream connections to a remote machine where worker
threads will run would not be difficult to add as an enhancement.


--->   Project Schedule

The project will take approximately two months.  Notes for the detailed
implementation are complete and work can begin on approval. 

An asynchronous version of Test::Harness will be developed, and dummy modules
exhibiting all supported wrappable behaviors will be produced to test
asynchronous against.

When asynchronous has been completed, asynchronous::threaded will be
written by replacing the communications layer with Thread::Queue queues.

default settings will be tuned based on user feedback for at least three
years after publication.


--->   Bio

David Nicol, who received a B.A. in Computer Science from University of
Missouri - Kansas City in 2001, has been writing in Perl since 1996. He
has a variety of work on CPAN, including the Pollute reimportation mechanism,
the Tie::Alias suite and the Array::Frugal data structure, which will be
incorporated into this project.  He has been frustrated by the lack of
non-blocking versions of Perl modules while writing single-threaded servers,
a frustration which, if this proposal is approved, will no longer be felt by
anyone.

--->   Amount Requested

$NNNN

References:

(1) http://stemsystems.com/sessions/slides/slide-0118.html

MJD's Async module provides a ready() method.  Unlike the Async module,
asynchronous wraps an entire package rather than a single backgrounded
subroutine, and asynchronous objects promote themselves to their result
values rather than requiring a result() method.

DJHD's Festival::Client::Async module provides a fh method to identify the
file handle number being used for communicating with a Festival voice
synthesis server.

DDUMONT's Async::Group module deals with grouping callback functions associated
with completion of asynchronous methods, implying the concept of a
callback function
associated with completion of an asynchronous method.

the MOP module provides mappings of objects on two sides of a
communication channel.

Discussion on the dbi-dev mailing list in 2004 yielded the following
requirements:
    the system needs to provide facility for back ends with multiple file handles
    the system needs to allow all select() calls to be handled externally

Elizabeth Mattijsen's "forks" thread replacement modules demonstrate
the viability of parts of the proposed approach.






-- 
David L Nicol
Communication is neither obvious nor inscrutable.

