[sf-perl] XML <-> SQL with Chris Mungall

Mon Jun 27 21:06:58 PDT 2005

Executive Summary

   Transliterating from SQL into XML (and back again) is useful, if only because
   it allows SQL databases to be serialized, saved, reloaded, etc.  Translating
   SQL into idiomatic XML (or vice versa) is a much harder nut to crack, but
   it's also more interesting, at least to me.

   My intuition is that many data structure formats (e.g., GXL, RDBMS tables,
   RDF, XML, YAML) are "equivalent" in the manner that assorted computer
   languages are (i.e., "Turing equivalent").  That said, transforming from
   idiomatic RDBMS to idiomatic XML is going to be about as easy as
   transforming from idiomatic Lisp to idiomatic Fortran.  Nonetheless, that
   is the problem we increasingly face.

Background

I've imported SQL, XML, and assorted other formats into a Perl/YAML-based model
(i.e., hashes, lists, and references).  My take is that each input format has
its own "idioms", including:

   *  XML is structured as a "list of lists", where each node may have an
      attached hash.  More specifically, any given node type may have a
      number of defining characteristics, including:

      +  list arity limits (e.g., min, max)

         Common list arities include

           N == 0 (empty node)
           N >= 0 (optional list)
           N >= 1 (non-empty list)

      +  assorted list constraints

         Is the order significant, or is this just a set?  Will there be raw
         text mixed in with tagged nodes, at the same level?  Will there be
         significant intervening white space between tagged nodes?  Etc.

      +  hash limits

         Certain keys may be required.

  *  RDBMS tables can be modeled as sets of hashes:

        (A, B, C):  (I, A), (I, B), (I, C)

     In fact, there's a DBMS that exploits this property.  That said,
     the SELECT statement can do things with tables that require
     quite a bit of Perl to simulate (e.g., compound keys, joins).

     I strongly suspect that there are SQL "idioms" that use tables
     in characteristic ways.  In fact, I've asked in the past for
     suggestions, but I'm still waiting for responses.
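
To make the two models above concrete, here is a minimal Perl sketch.  The
node layout ([tag, {attributes}, children...]), the rule keys (min, max,
required), the helper names (node_ok, inner_join), and the sample data are
all illustrative assumptions, not part of any existing module.  It shows an
arity/required-key check on an XML-style node, and a hand-rolled join over
tables held as lists of hashes.

```perl
use strict;
use warnings;

# XML idiom: each node is [ tag, { attributes }, child, child, ... ];
# a child is either another node (array ref) or raw text (plain scalar).
my $doc = [ 'order', { id => 'o1' },
    [ 'item', { sku => 'A1' }, 'widget' ],
    [ 'item', { sku => 'B2' }, 'gadget' ],
];

# Check a node's list arity (min, max) and its required hash keys.
# Returns 1 if the node satisfies the (hypothetical) rule, 0 otherwise.
sub node_ok {
    my ($node, $rule) = @_;
    my ($tag, $attrs, @kids) = @$node;
    return 0 if @kids < ($rule->{min} || 0);
    return 0 if defined $rule->{max} && @kids > $rule->{max};
    for my $key (@{ $rule->{required} || [] }) {
        return 0 unless exists $attrs->{$key};
    }
    return 1;
}

# RDBMS idiom: each table is a list of hashes, one hash per row.
my @orders = ( { order_id => 'o1', customer => 'c7' } );
my @items  = (
    { order_id => 'o1', sku => 'A1' },
    { order_id => 'o1', sku => 'B2' },
    { order_id => 'o9', sku => 'Z0' },
);

# Simulate an inner join on a single shared column by hand.
sub inner_join {
    my ($left, $right, $key) = @_;
    my %index;                      # key value => matching right-hand rows
    push @{ $index{ $_->{$key} } }, $_ for @$right;
    my @rows;
    for my $l (@$left) {
        for my $r (@{ $index{ $l->{$key} } || [] }) {
            push @rows, { %$l, %$r };   # merge the two row hashes
        }
    }
    return @rows;
}

my @joined = inner_join(\@orders, \@items, 'order_id');
print node_ok($doc, { min => 1, required => ['id'] }) ? "ok\n" : "bad\n";
print scalar(@joined), " joined rows\n";
```

This prints "ok" and "2 joined rows".  Even this single-column join takes a
dozen lines; compound keys would need a composite index key on top of that.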

I suspect that it should be possible to do mechanized conversions of data
structures in one of two ways:

  *  Compile down to some "atomic" form (e.g., RDF tuples, as used in TAP
     - tap.stanford.edu), then "decompile" back into a "higher-order" form.

  *  Encode the data structure as a set of idioms, then perform idiom-to-
     idiom transformations to the target representation.
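
The first approach can at least be sketched for the trivial case.  The code
below is an assumption-laden illustration, not how TAP or any RDF tool does
it: rows_to_triples and triples_to_tree are hypothetical helpers, and the
sample rows are made up.  Rows are "compiled" into (subject, predicate,
object) triples, then "decompiled" by regrouping on the subject.

```perl
use strict;
use warnings;

# "Compile": flatten rows into [ subject, predicate, object ] triples.
# The subject is taken from the column named by $id_key.
sub rows_to_triples {
    my ($rows, $id_key) = @_;
    my @triples;
    for my $row (@$rows) {
        my $id = $row->{$id_key};
        for my $col (sort keys %$row) {
            next if $col eq $id_key;
            push @triples, [ $id, $col, $row->{$col} ];
        }
    }
    return @triples;
}

# "Decompile": regroup triples by subject into a hash of hashes.
# A repeated (subject, predicate) pair overwrites the earlier value.
sub triples_to_tree {
    my (@triples) = @_;
    my %tree;
    for my $t (@triples) {
        my ($s, $p, $o) = @$t;
        $tree{$s}{$p} = $o;
    }
    return \%tree;
}

my @rows = (
    { id => 'p1', name => 'Ada',   lang => 'Perl' },
    { id => 'p2', name => 'Brian', lang => 'SQL'  },
);
my @triples = rows_to_triples(\@rows, 'id');
my $tree    = triples_to_tree(@triples);
print scalar(@triples), " triples\n";
print $tree->{p1}{lang}, "\n";
```

This prints "4 triples" and "Perl".  The hard part is untouched: the
decompiler here always emits the same hash-of-hashes shape, whereas a real
one would have to choose the idiom that suits the target format.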

Of course, I don't know how to DO any of this, but it doesn't hurt (much)
to think about the problem.  Anyway, these are some topics I may try to
bring up after Chris has made his presentation, etc.

-r
-- 
email: rdm at cfcl.com; phone: +1 650-873-7841
http://www.cfcl.com        - Canta Forda Computer Laboratory
http://www.cfcl.com/Meta   - The FreeBSD Browser, Meta Project, etc.