[Pdx-pm] Designing a C library specifically for HLL binding

paull at peak.org paull at peak.org
Thu Oct 30 22:16:30 PDT 2008


On Thu, 2008-10-30 at 13:07 -0700, Eric Wilhelm wrote:

> I'm wondering if someone can give me some feedback on how this would 
> work and/or point me at some examples or reading material.

Hey Eric

I have written maybe half a dozen very similar modules for various EDA
formats like GDSII, Lef, Def, Edif, Verilog, and spice.  These formats
have nice grammars, so the typical approach is a flex/bison pair where
major (interesting) grammatical productions brew up a HV/AV complex data
structure and pass it to a user defined (Perl) callback.

In effect, each one is a C library built specifically for binding to an
HLL interpreter.

This architecture makes it very easy to implement "filter" scripts that
grep out salient bits or create new content.  The closest analog I have
seen in CPAN modules is that of HTML::Parser.  You set up callbacks for
what you want to catch and press GO.  Mucho gusto.

The best in class of my modules parses Cadence Design Exchange Format
(Def).  Def is a textual description of a place and route database for
an integrated circuit.  The files are honking big and just complex
enough to make parsing with regexps uncomfortable.  I can't release the
full code but maybe these snippets will be useful.

-Paul

-------------- next part --------------
NAME
        Local::Def - Parser for Cadence DEF files


SYNOPSIS
        use Local::Def;

        $obj = new Local::Def;
        $obj->callback( <Production> => codeRef, ... );
        $obj->parse( "def_filename" );
        $obj->terminate;
        $was = $obj->select( [ globRef ] );
        $was = $obj->autoPrint( [ Boolean ] );
        $obj->autoFormat;
        $obj->p<Production>( args );


DESCRIPTION
       This module implements a parser for Cadence DEF files.  As the parser
       works its way through a file it recognizes various grammatical
       productions.  Each recognition can trigger a callback to a user defined
       (Perl) sub.  In order for the callback to occur, the sub (coderef) must
       be registered with callback.

       When the parser encounters an unregistered production, the production
       is skipped.  autoPrint is a mode where these unregistered productions
       are copied to the select'ed filehandle instead of being skipped.

       The module provides print methods for each production.  The print
       methods have the same argument signature as the callback for the
       corresponding production.  The print methods all target the select'ed
       filehandle, but differ from the autoPrint output in that they reformat
       the data.  Creation of DEF filters can be accomplished by intercepting,
       manipulating, and then printing productions of interest.

PRODUCTIONS
       The DEF being parsed is broken up into bite sized chunks known as
       productions.  For example, each instance in a design (like NAND2FF) is
       detailed in the "components" section.  The DEF grammar describes a
       "components" section as having three parts: a Begin production, any
       number of Body productions, and an End production.

           COMPONENTS 3 ;                        #    Begin
           - clkgen_/U124 NINVEE ;               # -+
           - clkgen_/U125 THIGH                  #  |
             + FIXED ( 123780 633650 ) FS        #  + Body
             + WEIGHT 10 ;                       #  |
           - clkgen_/U126 NINVEE ;               # -+
           END COMPONENTS                        #    End

       The parser considers this snippet to contain five productions.  The
       callback for BeginComponents is invoked once.  As BeginComponents only
       has one piece of data (the number of components, "3" in this case) it
       is referred to as a scalar callback.

       The Components callback is invoked three times, once for each
       component.  Note that each component brings along several pieces of
       data.  Some of these data fields are optional and will not be specified
       for every component.  What we need is a dynamic data structure to house
       this data.  A hash is a good choice.  The parser builds up a hash, with
       whatever fields are present in the production, and provides the
       callback with a reference to this hash.  The possible keys of the hash,
       and their meanings, are detailed below.  This type is referred to as a
       hashref callback.

       Finally, EndComponents gets called once.  There is no ancillary data
       with any EndComponents production, so it is referred to as a null
       callback.
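
       The five invocations described above can be imitated without the
       parser.  This is only a sketch: the @events list is built by hand
       (the real module derives it from the DEF file), the %handler table
       and the placeholder $self are hypothetical stand-ins, and the mapping
       of "+ FIXED ( x y ) FS" onto the Placement* keys is an assumption
       based on the key list given later in this document.

```perl
use strict;
use warnings;

# Hand-built stand-in for what the parser would recognize in the snippet.
my @events = (
    [ BeginComponents => 3 ],                                  # scalar
    [ Components => { Component => 'clkgen_/U124', Model => 'NINVEE' } ],
    [ Components => { Component => 'clkgen_/U125', Model => 'THIGH',
                      PlacementType   => 'FIXED',
                      PlacementX      => 123780,
                      PlacementY      => 633650,
                      PlacementOrient => 'FS' } ],              # hashref
    [ Components => { Component => 'clkgen_/U126', Model => 'NINVEE' } ],
    [ 'EndComponents' ],                                       # null
);

my @log;
my %handler = (
    BeginComponents => sub { my ( $self, $count ) = @_;
                             push @log, "begin $count" },
    Components      => sub { my ( $self, $comp ) = @_;
                             push @log, "$comp->{Component} is a $comp->{Model}" },
    EndComponents   => sub { my ( $self ) = @_;
                             push @log, "end" },
);

my $self = 'parser-object-placeholder';
for my $event (@events) {
    my ( $production, @args ) = @$event;
    my $code = $handler{$production} or next;   # unregistered: skipped
    $code->( $self, @args );
}
print "$_\n" for @log;
```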

       Begin-Body-End productions

       Each of the following names a production.  For each name there also
       exist Begin<Name> and End<Name> productions.  So, for "Components",
       there exist three productions: BeginComponents, Components and
       EndComponents.

           Components
           Constraints
           DefaultCap
           Groups
           IOTimings
           Nets
           PinProperties
           Pins
           PropertyDefinitions
           Regions
           ScanChains
           SpecialNets
           Vias


       Stand alone productions

       Each of these productions is a single entity.  Unlike the Begin-Body-
       End productions they do not have an ordering relative to other
       productions.

           BusBitChars
           Design
           DieArea
           DividerChar
           EndDesign
           GCellGrid
           History
           NamesCaseSensitive
           Row
           Technology
           Tracks
           Units
           Version

       This parser is built for version 5.1 of the DEF grammar.  An earlier
       version of the DEF grammar included a SITE production.  This module
       supports the deprecated "SITE" production as a degenerate case of a ROW
       - a Row without a row-name.  If a SITE is encountered it will trigger
       the Row callback.

CALLBACKS
       A callback is a Perl sub (code reference) that you "register" with the
       parser by associating it with a production.  When the parser has
       completed the recognition of a production it invokes your sub, passing
       it any arguments relevant to the production.

       The parser always provides callbacks with the Local::Def object itself,
       as the first argument.  Having this object reference provides callbacks
       access to object methods.

       Here is an example that counts the number of instances for each type of
       cell in a design:

           use Local::Def;

           $obj = new Local::Def;
           $obj->callback(
               Components    => \&countModels,
               EndComponents => \&showCounts,
               );
           $obj->parse( $ARGV[0] );

           sub countModels
           {
               my( $self, $comp ) = @_;
               my( $model );

               $model = $comp->{ Model };
               ++$counts{ $model };
           }

           sub showCounts
           {
               my( $self ) = @_;
               my( $key );

               foreach $key (sort keys %counts ) {
                   printf("\t%-23s %5d\n", $key, $counts{$key} );
               }
               $self->terminate();
           }


METHODS
       Once a Local::Def object has been created (using new()) you can begin
       using any of the methods.

       Local::Def::new()

       new() creates a Local::Def object.  This object contains state
       information relevant to the parsing process such as callback
       registration, selected output file handle etc.  It does NOT include
       flex(1) input buffers or yacc(1) stack, so re-entrancy is NOT
       supported.  You may create multiple Local::Def objects in a single
       program; they will maintain separate lists of callbacks etc., but you
       cannot have the callback of one invoke the parse action of another.

       $obj->callback()

       The callback method takes any number of Production => codeRef pairs as
       arguments.  Callback registration can be changed on-the-fly (from
       within a callback) if desired.  An active callback can be disabled by
       registering undef as the codeRef.  A subroutine name (string scalar)
       can be used in lieu of a codeRef.
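
       The registration rules above (pairs, undef to disable, a sub name in
       place of a codeRef) can be sketched in plain Perl.  The %registry
       hash and the register() helper are hypothetical; in particular,
       resolving a name in package main is an assumption, since the text
       does not say which package the real module searches.

```perl
use strict;
use warnings;

# Hypothetical registry imitating the documented callback() semantics.
my %registry;

sub register {
    my (%pairs) = @_;
    while ( my ( $production, $code ) = each %pairs ) {
        if ( !defined $code ) {
            delete $registry{$production};            # undef disables
        }
        elsif ( ref $code eq 'CODE' ) {
            $registry{$production} = $code;           # ordinary codeRef
        }
        else {
            # A plain string is taken as a sub name (assumed: in main).
            $registry{$production} = \&{"main::$code"};
        }
    }
    return;
}

sub hello { return "hello from $_[0]" }

register( Pins => \&hello, Nets => 'hello' );
my $by_ref  = $registry{Pins}->('Pins');
my $by_name = $registry{Nets}->('Nets');

register( Pins => undef );                            # switch Pins off again
```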

       $obj->parse()

       parse() requires a file name for input.  It cannot use Perl
       filehandles, or read from stdin.  parse() may be called repeatedly (on
       different files), but not recursively (from a callback).

       $obj->terminate()

       By calling $obj->terminate from within a callback, you are sending a
       message to the parser telling it to forego processing of any further
       input.  After a callback has completed, the parser checks to see if
       termination has been requested and returns from $obj->parse() if it
       has.

       If you are extracting data from just the Pins productions (for
       example), you could call $obj->terminate from within EndPins in order
       to speed up overall processing by skipping the remainder of the file:

           $obj->callback( EndPins => sub { $_[0]->terminate; } );
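
       The check-after-each-callback behaviour can be pictured with a small
       loop.  This is a sketch of the idea only, not the module's code: the
       %state hash stands in for the object, and the event list is made up.

```perl
use strict;
use warnings;

my %state = ( terminate => 0 );     # stand-in for the parser object
my @seen;

my %handler = (
    Pins    => sub { my ( $self, $name ) = @_; push @seen, $name },
    EndPins => sub { my ( $self ) = @_; $self->{terminate} = 1 },
);

# Made-up input: two pins, the end of the section, then more data.
my @events = ( [ Pins => 'a' ], [ Pins => 'b' ], ['EndPins'], [ Nets => 'n1' ] );

my $processed = 0;
for my $event (@events) {
    my ( $production, @args ) = @$event;
    $processed++;
    if ( my $code = $handler{$production} ) {
        $code->( \%state, @args );
    }
    last if $state{terminate};      # checked once the callback has returned
}
print "stopped after $processed productions\n";
```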


       $obj->autoPrint()

       Sometimes it is desired to filter a DEF file, changing only certain
       sections.  autoPrint() is provided as an efficient mechanism to print
       all portions of the file that are not otherwise being recognized for
       construction of callback arguments.  Here is a slow cat(1):

           $obj = new Local::Def;
           $obj->autoPrint(1);
           $obj->parse( $file );

       Here is a much slower one:

           $obj = new Local::Def;
           $obj->callback(
               BeginComponents  => \&Local::Def::pBeginComponents,
               BeginConstraints => \&Local::Def::pBeginConstraints,
               ...
               Tracks           => \&Local::Def::pTracks,
               Units            => \&Local::Def::pUnits,
               Version          => \&Local::Def::pVersion,
               Vias             => \&Local::Def::pVias
               );
           $obj->parse( $file );

       Note that this second version, while logically equivalent to the
       original file, may have different ordering for optional constructs, and
       different indentation and use of whitespace.

       autoPrint() without any arguments returns the current state;
       autoPrint() with an argument sets the new state and returns the old
       one.

       $obj->autoFormat()

       It turns out that the "much slower cat(1)" is quite handy.  A shorthand
       for making all the callback assignments at once is:

           $obj->autoFormat;

       This overwrites all existing callbacks, setting them to the
       corresponding print methods.  The common use is to invoke autoFormat to
       initialize all the methods, then set selective intercepts using
       callback.
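
       The "initialize everything to print, then intercept a few" pattern
       can be sketched without the module.  Everything here is hypothetical:
       %printer stands in for the p<Production> methods, %callback for the
       registration table, the object argument is left out for brevity, and
       the output lines are only a rough imitation of DEF syntax.

```perl
use strict;
use warnings;

my @out;

# Stand-ins for pBeginPins, pPins and pEndPins.
my %printer = (
    BeginPins => sub { my ($count) = @_; push @out, "PINS $count ;" },
    Pins      => sub { my ($pin) = @_;
                       push @out, "- $pin->{Pin} + NET $pin->{Net} ;" },
    EndPins   => sub { push @out, "END PINS" },
);

# What autoFormat is described as doing: every callback prints.
my %callback = %printer;

# Selective intercept: edit the hash, then hand it to the printer.
$callback{Pins} = sub {
    my ($pin) = @_;
    my %edited = ( %$pin, Net => uc $pin->{Net} );
    $printer{Pins}->( \%edited );
};

$callback{BeginPins}->(2);
$callback{Pins}->( { Pin => 'clk', Net => 'clk' } );
$callback{Pins}->( { Pin => 'rst', Net => 'rst_n' } );
$callback{EndPins}->();
print "$_\n" for @out;
```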

       $obj->select()

       When autoPrint is enabled, or when you invoke one of the p<Production>
       methods, your output is directed to a filehandle.  The output
       filehandle is initialized to STDOUT when the object is constructed.  By
       providing select() with a filehandle globRef you redirect the autoPrint
       and p<Production> output.

           $outFile = "output.def";
           open( DEFOUT, ">$outFile" ) or die "$outFile: $!\n";
           $previous = $obj->select( \*DEFOUT );

       select() without any arguments returns a globRef to the current
       filehandle.  select() with an argument sets the new filehandle and
       returns the previous filehandle's globRef.

       Note that to use the returned globRef as a filehandle for print you
       will need to wrap it in a block.  See the documentation on print in
       perlfunc(1) for more details.

           if ( $obj->autoPrint ) {

               print { $obj->select } "\n# Adding comments here";
           }
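
       The get/set convention and the print block can be tried out in plain
       Perl.  In this sketch select_fh() is a hypothetical stand-in for
       $obj->select, and an in-memory filehandle replaces the output file.

```perl
use strict;
use warnings;

my $current = \*STDOUT;             # initial output, as documented

# No argument: return the current globRef.
# With an argument: install it and return the previous one.
sub select_fh {
    my $previous = $current;
    $current = $_[0] if @_;
    return $previous;
}

open( my $mem, '>', \my $buffer ) or die "in-memory file: $!\n";

my $was = select_fh($mem);          # redirect, remember the old handle
print { select_fh() } "# goes to the buffer\n";
select_fh($was);                    # restore the old handle
close $mem;
```

       The block after print is needed because print only accepts a bare
       handle or a simple scalar variable in that position, not a general
       expression such as a method call.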


       $obj->p<Production>()

       In our "much slower cat(1)" example we registered "print" actions for
       every callback.  Just as every production in the Def grammar has a
       callback, every production also has a corresponding "print" method that
       takes the same argument list as the callback.

       These print methods are named by prepending a "p" to the production.
       Consider the "BeginNets" callback, for example.  When the parser
       invokes the callback associated with the BeginNets production, it
       passes two arguments: an object reference and a scalar.  The
       corresponding method, Local::Def::pBeginNets(), exists and expects an
       object reference and a scalar as arguments.

       Here is a snippet that does no parsing at all.  It uses only the
       printing methods to create a Def file that will have consistent syntax.

           $obj = new Local::Def;
           $obj->pDesign( "Venom" );
           $obj->pBeginPins( scalar( @ports ) );
           foreach $pin ( @ports ) {
               $obj->pPins( { Pin => $pin, Net => $pin } );
           }
           $obj->pEndPins;


CALLBACK and pPRODUCTION ARGUMENTS
       As mentioned earlier, there are three different argument conventions
       used by the parser when invoking callbacks:

           $obj                # Null, no additional data
           $obj, $scalar       # A single piece of information
           $obj, $hashRef      # A complex data structure

       Many aspects in the DEF grammar are optional.  A Component, for
       example, may or may not have been assigned a location (placed).  If an
       optional attribute is not present then the hash passed to the callback
       will not have the corresponding key.
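
       Because an absent attribute means an absent key, a callback can test
       with exists before using an optional field.  A minimal sketch, using
       the Components keys listed further down; describe_component() is a
       made-up helper and the hashes are built by hand.

```perl
use strict;
use warnings;

sub describe_component {
    my ( $self, $comp ) = @_;

    # Component and Model are Required, so they can be used directly.
    my $text = "$comp->{Component} ($comp->{Model})";

    # The placement keys are Optional: check before touching them.
    if ( exists $comp->{PlacementX} && exists $comp->{PlacementY} ) {
        $text .= " at ($comp->{PlacementX}, $comp->{PlacementY})";
    }
    else {
        $text .= " with no location";
    }
    return $text;
}

my $placed = describe_component( undef, {
    Component     => 'clkgen_/U125', Model      => 'THIGH',
    PlacementType => 'FIXED',        PlacementX => 123780,
    PlacementY    => 633650,         PlacementOrient => 'FS',
} );
my $unplaced = describe_component( undef, {
    Component => 'clkgen_/U124', Model => 'NINVEE',
} );
print "$placed\n$unplaced\n";
```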

       DieArea : HashRef


           LLX               => Number, Required
           LLY               => Number, Required
           URX               => Number, Required
           URY               => Number, Required


       GCellGrid : HashRef


           X                 => -+
           Y                 => -+- HashRef, Optional
             {
               Origin        => Number, Required
               Num           => Number, Required
               Step          => Number, Required
             }

       "X" and "Y" keys have identical HashRef structures.

       History : scalar


       Row : HashRef


           RowName           => String, Required
           RowType           => String, Required
           OriginX           => Number, Required
           OriginY           => Number, Required
           Orient            => String, Required
           NumX              => Number, Required
           NumY              => Number, Required
           StepX             => Number, Required
           StepY             => Number, Required
           Properties        => HashRef, Optional
             {
               "prop"        => "value",
               ...
             }

       Note that if the deprecated SITE production is encountered then this
       Row callback will be triggered, but the RowName key will contain the
       string "SITE".  The pRow will output the older SITE form if the rowname
       is "SITE".

       Technology : scalar


       Tracks : HashRef


           Direction         => String, Required
           Origin            => Number, Required
           Num               => Number, Required
           Step              => Number, Required
           Layer             => ArrayRef, Required
             [
               "layer",
               ...
             ]

       The Direction String will be either "X" or "Y".

       Units : scalar

       This is the UNITS DISTANCE MICRONS setting, typically 100.

       Design : scalar


       EndDesign : NULL


       BeginComponents : scalar


       EndComponents : NULL


       Components : HashRef


           Component         => String, Required
           Model             => String, Required
           Connections       => ArrayRef, Optional
             [
               String              # NetName
               ...
             ]
           Generator         => String, Optional
           Parameters        => ArrayRef, Optional
             [
               String              # Parameter
               ...
             ]
           Source            => String, Optional
           ForeignName       => String, Optional
           ForeignX          => Number, Optional
           ForeignY          => Number, Optional
           ForeignOrient     => String, Optional
           PlacementType     => String, Optional
           PlacementX        => Number, Optional
           PlacementY        => Number, Optional
           PlacementOrient   => String, Optional
           RegionName        => String, Optional
           Region            => HashRef, Optional
             {
               LLX           => Number, Required
               LLY           => Number, Required
               URX           => Number, Required
               URY           => Number, Required
             }
           Properties        => HashRef, Optional
             {
               "prop"        => "value",
               ...
             }

       The "Connections" NetNames will be in implicit port order.

       The "Parameters" are parameters to the generator.

       The "Source" String will be either "NETLIST", "DIST", "USER" or
       "TIMING".

       The "ForeignOrient" and "PlacementOrient" Strings will be one of
       "N", "S", "E", "W", "FN", "FS", "FE" or "FW".

       The "PlacementType" String will be either "FIXED", "COVER", "PLACED" or
       "UNPLACED".

       BeginConstraints : scalar


       EndConstraints : NULL


       Constraints : HashRef


           WiredLogic        => String, Optional
           MaxDist           => Number, Optional
           RiseMin           => Number, Optional
           RiseMax           => Number, Optional
           FallMin           => Number, Optional
           FallMax           => Number, Optional
           Net               => String, Optional
           Path              => HashRef, Optional
             {
               FromComponent => String, Required
               FromPin       => String, Required
               ToComponent   => String, Required
               ToPin         => String, Required
             }
           Sum               => ArrayRef, Optional
             [
               # One or more of any of these...
               [ "Net", String ]
               [ "Path", HashRef ]
               [ "Sum", ArrayRef ]
             ]
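
       Since a "Sum" entry may itself contain further "Sum" entries, a
       recursive walk is the natural way to read it.  This sketch assumes
       the nested ArrayRef has the same [ kind, value ] layout as the top
       level, which is how the listing above reads; sum_operands() and the
       sample data are made up for illustration.

```perl
use strict;
use warnings;

# Flatten a Sum structure into a list of readable operand descriptions.
sub sum_operands {
    my ($sum) = @_;
    my @out;
    for my $term (@$sum) {
        my ( $kind, $value ) = @$term;
        if ( $kind eq 'Net' ) {
            push @out, "net $value";
        }
        elsif ( $kind eq 'Path' ) {
            push @out, "path $value->{FromComponent}/$value->{FromPin}"
                     . " to $value->{ToComponent}/$value->{ToPin}";
        }
        elsif ( $kind eq 'Sum' ) {
            push @out, sum_operands($value);    # nested sum: recurse
        }
    }
    return @out;
}

my $sum = [
    [ Net => 'n1' ],
    [ Sum => [
        [ Net  => 'n2' ],
        [ Path => { FromComponent => 'U1', FromPin => 'Z',
                    ToComponent   => 'U2', ToPin   => 'A' } ],
    ] ],
];
my @operands = sum_operands($sum);
print "$_\n" for @operands;
```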


       BeginDefaultCap : scalar


       EndDefaultCap : NULL


       DefaultCap : HashRef


           MinPins           => Number, Required
           WireCap           => Number, Required


       BeginGroups : scalar


       EndGroups : NULL


       Groups : HashRef


           Name              => String, Required
           Component         => ArrayRef, Required
             [
               String,
               ...
             ]
           Soft              => HashRef
             {
               MaxHalfPerimeter=> Number, Optional
               MaxX          => Number, Optional
               MaxY          => Number, Optional
             }
           Region            => HashRef, Optional
             {
               LLX           => Number, Required
               LLY           => Number, Required
               URX           => Number, Required
               URY           => Number, Required
             }
           RegionName        => String, Optional
           Properties        => HashRef, Optional
             {
               "prop"        => "value",
               ...
             }


       BeginIOTimings : scalar


       EndIOTimings : NULL


       IOTimings : HashRef


           Component         => String, Required
           Pin               => String, Required
           RiseVariable      => Number, Optional
           FallVariable      => Number, Optional
           RiseSlewRate      => Number, Optional
           FallSlewRate      => Number, Optional
           Capacitance       => Number, Optional


       BeginNets : scalar


       EndNets : NULL


       Nets : HashRef


           Name              => String, Required
           Connections       => ArrayRef
             [
               [ String, String, Flag ]
               ...
             ]
           Xtalk             => Number, Optional
           NonDefaultRule    => String, Optional
           Source            => String, Optional
           Original          => String, Optional
           Use               => String, Optional
           Pattern           => String, Optional
           EstCap            => String, Optional
           Weight            => String, Optional
           Fixed             => -+
           Routed            =>  +- ArrayRef, Optional
           Cover             => -+
             [                # Each Branch in the net
               [                # Each layer change in the branch
                 {
                   Layer     => String, Required
                   Taper     => Flag, Optional
                   TaperRule => String, Optional
                   Path      => ArrayRef
                     [              # Each turn on the layer
                       [ Number, Number, Number ],   # X, Y, Ext starting
                       # one or more of the following
                       [ Number, Number, Number ]    # X, Y, Ext
                       [ Number, String, Number ]    # X, "STAR", Ext
                       [ String, Number, Number ]    # "STAR", Y, Ext
                       [ String ]            # Via
                     ]
                 }
               ]
             ]
           Properties        => HashRef, Optional
             {
               "prop"        => "value",
               ...
             }

       The [ String, String, Flag ] sub-arrays contain the Component as the
       first element and Pin-name as the second element.  If a third element
       is present, no matter the contents, then "Synthesized" is implied.  The
       Component can be "PIN" or "VPIN".

       "Routed", "Fixed" and "Cover" keys all have similar ArrayRef
       structures.

       The third element in the path sub-arrays "Ext" is an optional wire
       extension; arrays will commonly have only two elements.  If either of
       the first two is the string "STAR" then the corresponding X or Y
       coordinate from the previous point is to be replicated - an orthogonal
       turn is implied.
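
       Resolving "STAR" entries into absolute coordinates might look like
       the sketch below.  resolve_path() and its output format are made up
       for illustration, and attaching a via to the most recent point is an
       assumption about how the via entries are meant to be read.

```perl
use strict;
use warnings;

# Turn a Path ArrayRef into a list of hashes with absolute X and Y.
sub resolve_path {
    my ($path) = @_;
    my ( $x, $y );
    my @resolved;
    for my $point (@$path) {
        if ( @$point == 1 ) {               # [ String ]: a via name
            push @resolved, { Via => $point->[0], X => $x, Y => $y };
            next;
        }
        my ( $px, $py, $ext ) = @$point;
        $x = $px unless $px eq 'STAR';      # "STAR": keep the previous X
        $y = $py unless $py eq 'STAR';      # "STAR": keep the previous Y
        push @resolved,
            { X => $x, Y => $y, ( defined $ext ? ( Ext => $ext ) : () ) };
    }
    return \@resolved;
}

my $points = resolve_path( [
    [ 100, 200 ],           # starting point, no extension
    [ 300, 'STAR' ],        # horizontal move, Y replicated
    [ 'STAR', 500, 10 ],    # vertical move with a wire extension
    [ 'VIA12' ],            # via name (hypothetical)
] );

for my $p (@$points) {
    print join( ' ', map { "$_=$p->{$_}" } sort keys %$p ), "\n";
}
```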

       Existence of the "Taper" key implies a tapered path, regardless of the
       value.

       "TaperRule" tags a tapering rule name.

       BeginPinProperties : scalar


       EndPinProperties : NULL


       PinProperties : HashRef


           Component         => String, Required
           Pin               => String, Required
           Properties        => HashRef, Required
             {
               "prop"        => "value",
               ...
             }


       BeginPins : scalar


       EndPins : NULL


       Pins : HashRef


           Pin               => String, Required
           Net               => String, Required
           Special           => Flag, Optional
           Direction         => String, Optional
           Use               => String, Optional
           PlacementType     => String, Optional
           PlacementX        => Number, Optional
           PlacementY        => Number, Optional
           PlacementOrient   => String, Optional
           Layer             => String, Optional
           LLX               => Number, Optional
           LLY               => Number, Optional
           URX               => Number, Optional
           URY               => Number, Optional

       "Direction" is one of "INPUT", "OUTPUT", "INOUT" or "FEEDTHRU".

       "PlacementOrient" will be one of "N","S", "E","W","FN","FS","FE" or
       "FW".

       "PlacementType" will be one of "FIXED", "PLACED" or "COVER".

       "Use" will be one of "SIGNAL", "POWER", "GROUND", "CLOCK" "TIEOFF" or
       "ANALOG".

       Existence of the "Special" key implies SPECIAL, regardless of value.

       BeginPropertyDefinitions : NULL


       EndPropertyDefinitions : NULL


       PropertyDefinitions : HashRef


           ObjectType        => String, Required
           PropName          => String, Required
           PropType          => String, Required
           Range             => ArrayRef, Optional
             [
               Number,     # Min
               Number      # Max
             ]
           Value             => Number or String, Optional

       "ObjectType" is one of "DESIGN", "COMPONENT", "NET", "SPECIALNET",
       "GROUP", "ROW", "PIN" or "REGION".

       "PropType" is one of "INTEGER", "REAL", "STRING" or "NAMEMAPSTRING".

       BeginRegions : scalar


       EndRegions : NULL


       Regions : HashRef


           Name              => String, Required
           Regions           => ArrayRef, Required
             [
               {
                 LLX         => Number, Required
                 LLY         => Number, Required
                 URX         => Number, Required
                 URY         => Number, Required
               }
               ...
             ]


       BeginScanChains : scalar


       EndScanChains : NULL


       ScanChains : HashRef


           Name              => String, Required
           CommonScanPins    => HashRef
             {
               In            => String, Optional
               Out           => String, Optional
             }
           Start             => HashRef, Optional
             {
               Component     => String, Required
               Out           => String, Optional
             }
           Ordered           => -+
           Floating          => -+- ArrayRef, Optional
             [
               {
                 Component   => String, Required
                 In          => String, Optional
                 Out         => String, Optional
               }
               ...
             ]
           Stop              => HashRef, Optional
             {
               Component     => String, Required
               In            => String, Optional
             }

       The "Ordered" and "Floating" keys have identical ArrayRef structures.

       The "Component" in either "Start" or "Stop" can be "PIN" to imply a
       pin.

       BeginSpecialNets : scalar


       EndSpecialNets : NULL


       SpecialNets : HashRef


           Name              => String, Required
           Connections       => ArrayRef
             [
               [ String, String, Flag ]
               ...
             ]
           Width             => Number, Optional
           Voltage           => Number, Optional
           Spacing           => HashRef, Optional
             {
               "layer"       => HashRef, Required
                 {
                   Spacing   => Number, Required
                   Range     => ArrayRef, Optional
                     [
                       Number,
                       Number
                     ]
                 }
               ...
             }
           Source            => String, Optional
           Original          => String, Optional
           Use               => String, Optional
           Pattern           => String, Optional
           EstCap            => String, Optional
           Weight            => String, Optional
           Fixed             => -+
           Routed            =>  +- ArrayRef, Optional
           Cover             => -+
             [                # Each Branch in the net
               [                # Each layer change in the branch
                 {
                   Layer     => String, Required
                   Width     => Number, Required
                   Shape     => String, Optional
                   Path      => ArrayRef, Required
                     [              # Each turn on the layer
                       [ Number, Number, Number ],   # X, Y, Ext starting
                       # one or more of the following
                       [ Number, Number, Number ]    # X, Y, Ext
                       [ Number, String, Number ]    # X, "STAR", Ext
                       [ String, Number, Number ]    # "STAR", Y, Ext
                       [ String ]            # Via
                     ]
                 }
               ]
             ]
           Properties        => HashRef, Optional
             {
               "prop"        => "value",
               ...
             }

       In "Connections" the [ String, String, Flag ] sub-arrays contain
       Component and Pin names in the first two elements.  If a third element
       is present, no matter the contents, then the "Synthesized" attribute is
       implied.

       The third element in the path sub-arrays "Ext" is an optional wire
       extension; arrays will commonly have only two elements.  If either of
       the first two is the string "STAR" then the corresponding X or Y
       coordinate from the previous point is to be replicated - an orthogonal
       turn is implied.

       The "Spacing" hash is indexed by layer name.

       "Routed", "Fixed" and "Cover" keys all have similar ArrayRef
       structures.

       "Use" will be "SIGNAL", "POWER", "GROUND", "CLOCK", "TIEOFF" or
       "ANALOG".

       "Pattern" will be "STEINER", "BALANCED", "WIREDLOGIC" or "TRUNK".

       "Shape" will be "RING", "STRIPE", "FOLLOWPIN", "IOWIRE", "COREWIRE",
       "BLOCKWIRE", "FILLWIRE" or "BLOCKAGEWIRE".

       BeginVias : scalar


       EndVias : NULL


       Vias : HashRef


           Name              => String, Required
           Patterns          => ArrayRef, Optional
             [
               {
                 Pattern     => String, Optional
                 Rects       => ArrayRef, Optional
                   [
                     {
                       Layer => String, Optional
                       LLX   => Number, Optional
                       LLY   => Number, Optional
                       URX   => Number, Optional
                       URY   => Number, Optional
                     }
                   ]
               }
             ]


       NamesCaseSensitive : scalar


       Version : scalar


       BusBitChars : scalar


       DividerChar : scalar


SEE ALSO
       LEF/DEF Language Reference, perllol(1), perldsc(1), perlmod(1), perl(1)
-------------- next part --------------
I like to create one large typedef that contains all the callbacks and
any parser state.  This will become my Perl object.  Something like:

      typedef struct {
          /* callbacks */
          SV          *cbBeginComponents;
          SV          *cbBeginConstraints;
          SV          *cbBeginDefaultCap;
          SV          *cbBeginGroups;
          SV          *cbBeginIOTimings;
          SV          *cbBeginNets;
          SV          *cbBeginPinProperties;
          ...
          int         switchToAnyText;
          int         switchToName;
          int         switchToNotNewName;
          int         switchToPropDef;
          int         switchToValue;
          int         print;
          int         terminate;
          int         callbackOnDefMinus;
      
          SV          *self;
          SV          *gv;
          PerlIO      *ofp;
          int         lineno;
          char        hold[ 256 ];
          char        *filename;
      
      } DefObj;

new() looks something like this:

      static SV *
      new(
          SV *class
          )
      {
          DefObj      init;
          DefObj      *obj;
          size_t      len;
      
          SV          *self;
      
          init.cbBeginComponents          = &sv_undef;
          init.cbBeginConstraints         = &sv_undef;
          init.cbBeginDefaultCap          = &sv_undef;
          init.cbBeginGroups              = &sv_undef;
          init.cbBeginIOTimings           = &sv_undef;
          init.cbBeginNets                = &sv_undef;
          init.cbBeginPinProperties       = &sv_undef;
          ...
          init.switchToAnyText            = 0;  /* False */
          init.switchToName               = 0;  /* False */
          init.switchToNotNewName         = 0;  /* False */
          init.switchToPropDef            = 0;  /* False */
          init.switchToValue              = 0;  /* False */
          init.print                      = 0;  /* False */
          init.terminate                  = 0;  /* False */
          init.callbackOnDefMinus         = 0;  /* False */
      
      
          init.ofp                        = PerlIO_stdout();
          init.lineno                     = 1;
          init.filename                   = (char*) 0;
          init.hold[0]                    = '\0';
      
          init.gv = newRV( *hv_fetch( gv_stashpv( "main", 0 ),
                                      "STDOUT", 6, 0 ));
      
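          /* newRV_noinc: the reference takes over the string's only
             refcount, so the string is freed along with the object */
          self = newRV_noinc( newSVpv( (char *)&init, sizeof( DefObj )));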
          obj  = (DefObj*) SvPV( SvRV( self ), len );
          obj->self = self;
      
          return sv_bless( self, gv_stashsv( class, 0 ));
      }

What I did here was initialize a C auto variable (init) and then make
a Perl string (newSVpv) to hold a copy of it.  Casting the string's
buffer back to a DefObj* only works if Perl aligns string storage to
worst-case boundaries (doubles on 8-byte boundaries on my platforms at
work), which it appears to do.  I bless a reference to this string and
it becomes my object.
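
To make the alignment caveat concrete, here is a minimal sketch in plain C
(C11, no Perl headers).  A byte buffer stands in for the Perl string, and
Blob, blob_is_aligned, blob_store and blob_load are made-up names for
illustration, not part of the module.  It shows how one could test whether a
buffer is safe to cast, and a memcpy-based fallback that works at any address
at the cost of copying the struct out and back instead of updating it in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parser object.  The double member
 * gives the struct the strictest alignment of its members. */
typedef struct {
    int    lineno;
    double scale;
    char   hold[16];
} Blob;

/* Nonzero if p may be cast to Blob* and dereferenced directly. */
static int blob_is_aligned(const void *p)
{
    return ((uintptr_t)p % _Alignof(Blob)) == 0;
}

/* Copy a Blob into raw bytes; valid for any destination address. */
static void blob_store(unsigned char *bytes, const Blob *src)
{
    assert(bytes != NULL && src != NULL);
    memcpy(bytes, src, sizeof *src);
}

/* Copy a Blob back out of raw bytes; valid for any source address. */
static Blob blob_load(const unsigned char *bytes)
{
    Blob out;

    assert(bytes != NULL);
    memcpy(&out, bytes, sizeof out);
    return out;
}
```

A buffer straight from malloc() passes blob_is_aligned(), while the same
buffer offset by one byte does not, yet blob_store()/blob_load() still
round-trip the struct through it.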

I generally put a lot of effort into making the scanner skip over large
chunks of input when there is no callback that needs to be fed.
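
The skipping idea can be sketched in plain C.  This is only an illustration
under simplifying assumptions: a real flex scanner would more likely switch
start conditions, and searching for the end marker with strstr() could be
fooled by the marker text appearing inside the section body.  SectionScanner,
scan_section and count_item are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*ItemCb)(const char *item, size_t len, void *user);

typedef struct {
    ItemCb  on_item;        /* NULL means nobody wants this section */
    void   *user;           /* passed through to on_item            */
    size_t  items_scanned;  /* how many items were actually split   */
} SectionScanner;

/* Example callback: bumps the int that `user` points at. */
static void count_item(const char *item, size_t len, void *user)
{
    (void)item;
    (void)len;
    ++*(int *)user;
}

/* Scan one section body starting at p.  Returns a pointer just past
 * end_marker, or NULL if the marker is missing.  Without a callback
 * the body is never split into items; we jump straight to the end. */
static const char *scan_section(SectionScanner *s, const char *p,
                                const char *end_marker)
{
    const char *end;

    assert(s != NULL && p != NULL && end_marker != NULL);
    end = strstr(p, end_marker);
    if (end == NULL)
        return NULL;

    if (s->on_item != NULL) {
        /* Slow path: hand each ';'-terminated item to the callback. */
        while (p < end) {
            const char *semi = memchr(p, ';', (size_t)(end - p));
            if (semi == NULL)
                break;
            s->on_item(p, (size_t)(semi - p), s->user);
            s->items_scanned++;
            p = semi + 1;
        }
    }
    return end + strlen(end_marker);
}
```

Both paths leave the scanner at the same position; only the one with a
registered callback pays for splitting the body.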

Most of the XS is dirt simple because I only pass Perl types back and forth.

      void
      parse(self,fname)
              SV              *self
              SV              *fname
              PROTOTYPE:      $$
      
      void
      autoFormat(self)
              SV              *self
              PROTOTYPE:      $
      
      SV *
      new(class)
              SV              *class
              PROTOTYPE:      $

      void
      callback(self, ...)
              SV              *self
              PREINIT:
                  DefObj      *dg;
                  size_t      i;
                  size_t      len;
              CODE:
                  if ( 1 != ( items % 2 )) {  /* 1 for self */
                      croak( "callback(): Odd number of arguments" ); }
                  dg = (DefObj*)SvPV( SvRV( self ), len );
                  for( i = 1;   i < items;   i += 2 ) {
                      insertCallback( dg, ST( i ), ST( i + 1 ));
                  }
              PROTOTYPE:      $@
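
insertCallback() is not shown in these snippets.  One way such a helper
could map a production name onto a slot in the object is a small
name/offset table, sketched below in plain C.  This is a guess at the shape,
not the real code: function pointers stand in for the SV* callbacks, and
Callbacks, slots, insert_callback and noop are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*Callback)(void *arg);

/* Cut-down stand-in for the callback part of the parser object. */
typedef struct {
    Callback cbBeginComponents;
    Callback cbBeginNets;
    Callback cbPropertyDefinitions;
} Callbacks;

/* Maps a production name to the offset of its slot in Callbacks. */
typedef struct {
    const char *name;
    size_t      offset;
} Slot;

static const Slot slots[] = {
    { "BeginComponents",     offsetof(Callbacks, cbBeginComponents)     },
    { "BeginNets",           offsetof(Callbacks, cbBeginNets)           },
    { "PropertyDefinitions", offsetof(Callbacks, cbPropertyDefinitions) },
};

/* Example callback that does nothing. */
static void noop(void *arg)
{
    (void)arg;
}

/* Store cb in the slot named by `name`.
 * Returns 1 on success, 0 if the production name is unknown. */
static int insert_callback(Callbacks *obj, const char *name, Callback cb)
{
    size_t i;

    assert(obj != NULL && name != NULL);
    for (i = 0; i < sizeof slots / sizeof slots[0]; i++) {
        if (strcmp(slots[i].name, name) == 0) {
            Callback *slot = (Callback *)((char *)obj + slots[i].offset);
            *slot = cb;
            return 1;
        }
    }
    return 0;
}
```

An unknown production name returns 0, which a caller in the XS layer could
turn into a croak().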
      

Manufacturing all the HV/AV stuff, though, is lotsa work.  I found that
once I settled on a set of conventions for how I passed complex data
back to the user, I could encapsulate most of the work in a mini-language
built out of macros.  Here is a typical production from the parser:

      propDefs                        /* %type <void> */
          :
              /* Empty */
          |
              propDefs                /*  $1 */
              objectType              /*  $2 */
              { BeginNAME; }
              DEF_NAME                /*  $4 */
              propType                /*  $5 */
              range                   /*  $6 */
              propValue               /*  $7 */
              DEF_SEMICOLON           /*  $8 */
              {
                  DeclareArgs;
      
                  TagStore( args, TAG_OBJECTTYPE, $2 );
                  TagStore( args, TAG_PROPNAME,   $4 );
                  TagStore( args, TAG_PROPTYPE,   $5 );
      
                  ErtStore( args, TAG_RANGE, $6 );
      
                  if ( $7 ) { TagStore( args, TAG_VALUE, $7 ); }
      
                  CallBack1Arg( PropertyDefinitions, argsRef );
      
                  BeginPROPDEF;
              }
          ;
      
      
The helper macros look like this:

      #define DeclareArgs                                                     \
              HV      *args     = newHV();                                    \
              SV      *argsRef  = newRV( (SV*)args );                         \
              SvREFCNT_dec( (SV*)args )
      
      #define TagStore(h,t,v)                                                 \
              hv_store( h, TagName(t), TagLen(t), v, 0 )
      
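      /* ENTER/SAVETMPS ... FREETMPS/LEAVE release any temporaries the
         callback creates, so they do not pile up over a long parse */
      #define CallBack1Arg(CB,arg1)                                           \
              if ( &sv_undef != defGlobal->cb##CB ) {                         \
                  dSP;                                                        \
                  ENTER;                                                      \
                  SAVETMPS;                                                   \
                  PUSHMARK(sp);                                               \
                  XPUSHs( defGlobal->self );                                  \
                  XPUSHs( arg1 );                                             \
                  PUTBACK;                                                    \
                  perl_call_sv( defGlobal->cb##CB, G_DISCARD | G_SCALAR );    \
                  FREETMPS;                                                   \
                  LEAVE;                                                      \
              }                                                               \
              SvREFCNT_dec( arg1 );                                           \
              if ( defGlobal->terminate ) return 0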
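
The cb##CB token-pasting trick does not depend on Perl.  Below is a minimal
sketch of the same dispatch pattern in plain C, with function pointers
standing in for the SV* callbacks.  Dispatcher, CALLBACK_1ARG, request_stop
and run_action are hypothetical names.  Unlike the macros above, this one is
wrapped in do { ... } while (0) so that it behaves as a single statement,
which matters as soon as it is used under an unbraced if/else.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the parser object. */
typedef struct {
    void (*cbPropertyDefinitions)(void *self, void *arg);
    void (*cbBeginNets)(void *self, void *arg);
    int   terminate;  /* set by a callback to stop the parse */
    int   calls;      /* number of callbacks actually fired  */
} Dispatcher;

/* CB is pasted onto "cb" to select the slot, as in cb##CB above. */
#define CALLBACK_1ARG(d, CB, arg)                   \
    do {                                            \
        if ((d)->cb##CB != NULL) {                  \
            (d)->calls++;                           \
            (d)->cb##CB((d), (arg));                \
        }                                           \
    } while (0)

/* Example callback: asks the parser to stop. */
static void request_stop(void *self, void *arg)
{
    (void)arg;
    ((Dispatcher *)self)->terminate = 1;
}

/* Stand-in for a grammar action: fires one callback, then returns
 * 0 if a callback requested termination and 1 otherwise. */
static int run_action(Dispatcher *d, int use_nets)
{
    assert(d != NULL);
    if (use_nets)
        CALLBACK_1ARG(d, BeginNets, NULL);
    else
        CALLBACK_1ARG(d, PropertyDefinitions, NULL);
    return d->terminate ? 0 : 1;
}
```

With only cbPropertyDefinitions set to request_stop, run_action(&d, 1) fires
nothing and returns 1, while run_action(&d, 0) fires the callback and
returns 0.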



