[Dahut-pm] [RFC] XML Compiler Compiler

James G Smith JGSmith at TAMU.Edu
Mon Feb 21 09:11:20 PST 2005


Due to issues at work (new manager :/), I haven't been able to work
on the Gestinanna framework since OSCon last year.  But things have
settled down a bit and I'm wanting to get back to it some.

The core piece that seemed most interesting to others was the XML
compiler that could generate Perl code given a description of an
application.  As with XSP, taglibs extended the language and provided
the glue between the controller and the model.

Part of the problem was that the taglibs were hand-coded as part of a
SAX framework.  This is nice for those of us who understand such
things, but for the average programmer who just wants to create an
extension, it might be a bit daunting.  If nothing else, there's a
fairly steep learning curve (XSP solved a lot of this with its base
taglib package, which works well for that environment).

Since taglibs extend the language (or can provide the language) and
they are XML, I figured I should be able to describe the XML schema
that the taglib defines, what the tags can be compiled to, and let an
XML Compiler Compiler generate the actual compiler for the taglib.

We're familiar with Yacc and its relatives and the BNF they consume.
The equivalent in the XML world seems to be RelaxNG with a few
modifications.

One aspect of XML that is different from the languages Yacc works
with is that XML elements can be in one of at least two contexts:
block and expression.  An expression results in a value that is used
by the parent element.  A block either does not produce a value or is
not expected to.  This distinction doesn't matter when writing XML;
it matters only when compiling into a language whose grammar
distinguishes between the two.

For example, if we have the fragment 

     <for-each select="/people/*"> ... </for-each>
     
with some children, then in a block context, it would be translated to

     foreach my $x ( @people ) { ... }

but in an expression context, it would be translated to

     map { ... } @people

since the result of a map is a value that can be used again.
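
A minimal sketch of that choice in Perl.  The helper name
compile_for_each() and its argument layout are hypothetical; they are
not taken from Gestinanna.  It assumes the select expression and the
children have already been compiled to Perl source strings.

```perl
use strict;
use warnings;

# Hypothetical helper: emit Perl source for <for-each/> depending on
# the context the element is compiled in.
#   $list - compiled select expression (e.g. '@people')
#   $body - compiled children
sub compile_for_each {
    my ($context, $list, $body) = @_;
    if ($context eq 'block') {
        return "foreach my \$x ( $list ) { $body }";
    }
    if ($context eq 'expression') {
        return "map { $body } $list";
    }
    die "unknown context: $context";
}

print compile_for_each('block',      '@people', '...'), "\n";
print compile_for_each('expression', '@people', '...'), "\n";
```

The two calls print the two translations shown above.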

This complicates the compiler definition since it must keep track of
which context a given element is in -- and that context is dependent
on the element's ancestors.
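
One way to picture this bookkeeping is a stack of contexts that is
pushed when an element starts and popped when it ends, in the style
of SAX callbacks.  This is only a sketch: the %child_context table,
the handler names, and the assumption that the root is a block are
all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical table: the context an element imposes on its children.
# undef means the children inherit the element's own context.
my %child_context = (
    'variable' => 'expression',
    'for-each' => undef,
);

my @context = ('block');    # assumption: the document root is a block

# Returns the context the element itself is compiled in, then records
# the context its children will see.
sub start_element {
    my ($name) = @_;
    my $seen    = $context[-1];
    my $imposed = $child_context{$name};
    push @context, defined $imposed ? $imposed : $seen;
    return $seen;
}

sub end_element {
    pop @context;
    return;
}

# <for-each><variable><for-each/></variable></for-each>
print start_element('for-each'), "\n";    # block
print start_element('variable'), "\n";    # block
print start_element('for-each'), "\n";    # expression
end_element() for 1 .. 3;
```

The inner <for-each/> is compiled as an expression only because of
its <variable/> ancestor; the outer one is compiled as a block.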

There are other dependencies on the context as well.  For example, in
Perl, the statement separator is a semicolon in a block context, but
a comma in an expression context.
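
As a small illustration, a compiler could pick the separator when it
joins the compiled children of an element.  The helper name
join_statements() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: join already-compiled child fragments with the
# separator that suits the context.
sub join_statements {
    my ($context, @fragments) = @_;
    my $sep = $context eq 'expression' ? ', ' : '; ';
    return join $sep, @fragments;
}

print join_statements('block',      'foo()', 'bar()'), "\n";   # foo(); bar()
print join_statements('expression', 'foo()', 'bar()'), "\n";   # foo(), bar()
```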

This dependence on context makes writing taglibs a bit more
complicated than you might expect based on XSP.

Because of this dependence, I am thinking about making the
compilation instructions a bit more abstract than verbatim Perl.
This has two benefits: the taglib writer doesn't have to worry about
context, and the compiler compiler can write compilers targeting
different languages, as long as the compiler definition (not the
compiled compiler) contains no verbatim code.

As an example:

<grammar datatypeLibrary="http://...">

  <define name="Variable">
    <element name="variable">
      <zeroOrMore cc:id="content">
        <ref name="Elements" />
      </zeroOrMore>
      <attribute name="name">
        <data type="path"/>
      </attribute>
      <cc:compile context="expression">
        <cc:result>
          <cc:assign lhs="attribute::name" rhs="content"/>
        </cc:result>
      </cc:compile>
    </element>
  </define>

</grammar>

In this fragment, we define an element <variable/> that takes an
attribute `name' naming the variable.  The element creates an
expression context for any enclosed elements since it is assigning a
value to a variable.

I'm still not sure whether I want the <cc:assign/> element, or what
form it should take.  The compiler compiler would create a compiler
that could work with whatever context was imposed on the <variable/>
element, whether the assignment happens in an expression context or
not.
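
A rough sketch of what such a generated compiler might emit for
<variable/>.  Everything here is an assumption: the helper name
compile_variable(), the idea that the `name' attribute maps directly
onto a Perl scalar, and the choice to parenthesize the assignment in
an expression context.  It relies only on the fact that assignment in
Perl is itself an expression.

```perl
use strict;
use warnings;

# Hypothetical output routine of a generated compiler for <variable/>.
#   $name - value of the name attribute (assumed to be a plain scalar name)
#   $rhs  - compiled content of the element
sub compile_variable {
    my ($context, $name, $rhs) = @_;
    my $assign = "\$$name = $rhs";
    return $context eq 'expression' ? "( $assign )" : "$assign;";
}

print compile_variable('block',      'total', 'compute()'), "\n";
print compile_variable('expression', 'total', 'compute()'), "\n";
```

The taglib writer would only describe the assignment; the terminator
or the parentheses would come from the imposed context.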

The ultimate goal is creating a description of the compiler compiler
taglib and the RelaxNG schema and using those to generate the
compiler compiler =)
--
James Smith <JGSmith at TAMU.Edu>, 979-862-3725
Texas A&M CIS Operating Systems Group, Unix