[sf-perl] XML <-> SQL with Chris Mungall
rdm at cfcl.com
Mon Jun 27 21:06:58 PDT 2005
Transliterating from SQL into XML (and back again) is useful, if only because
it allows SQL databases to be serialized, saved, reloaded, etc. Translating
SQL into idiomatic XML (or vice versa) is a much harder nut to crack, but it's
also more interesting, at least to me.
My intuition is that many data structure formats (e.g., GXL, RDBMS, XML,
YAML) are "equivalent" in the manner that assorted computer languages are
(i.e., "Turing equivalent"). That said, transforming from idiomatic RDBMS to
idiomatic XML is going to be about as easy as transforming from
to idiomatic Fortran. Nonetheless, that is the problem we increasingly face.
I've imported SQL, XML, and assorted other formats into a Perl/YAML-based model
(i.e., hashes, lists, and references). My take is that each input format has
its own "idioms", including:
* XML is structured as a "list of lists", where each node may have an
  attached hash. More specifically, any given node type may have a
  number of defining characteristics, including:

  + list arity limits (e.g., min, max)

    Common list arities include

      N == 0  (empty node)
      N >= 0  (optional list)
      N >= 1  (non-empty list)

  + assorted list constraints

    Is the order significant, or is this just a set? Will there be raw
    text mixed in with tagged nodes, at the same level? Will there be
    significant intervening white space between tagged nodes? Etc.

  + hash limits

    Certain keys may be required.

* RDBMS tables can be modeled as sets of hashes:

    (A, B, C):  (I, A), (I, B), (I, C)

  In fact, there's a DBMS that exploits this fact. That said,
  the SELECT statement can do things with tables that require
  quite a bit of Perl to simulate (e.g., compound keys, joins).
  I strongly suspect that there are SQL "idioms" that use tables
  in characteristic ways. In fact, I've asked in the past for
  suggestions, but I'm still waiting for responses.
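
To make the XML "list of lists with an attached hash" idea concrete, here is
a minimal sketch in Python, with dicts and lists standing in for Perl hashes
and lists. The function names (node_to_model, check) and the RULES table are
hypothetical, invented for this illustration; only xml.etree.ElementTree is a
real library. Each node becomes a (tag, hash, list) triple, and each node type
gets arity limits and required keys of the kind listed above.

```python
import xml.etree.ElementTree as ET

def node_to_model(elem):
    """Turn an XML element into (tag, attribute hash, child list).

    Raw text mixed in with tagged nodes is kept as plain strings in the
    child list; whitespace-only text is dropped (an assumption of this
    sketch, not something every XML dialect allows).
    """
    children = []
    if elem.text and elem.text.strip():
        children.append(elem.text.strip())
    for child in elem:
        children.append(node_to_model(child))
        if child.tail and child.tail.strip():
            children.append(child.tail.strip())
    return (elem.tag, dict(elem.attrib), children)

# Hypothetical per-node-type rules: (min arity, max arity, required keys).
# A max of None means "no upper limit".
RULES = {
    "br":   (0, 0, set()),        # N == 0 (empty node)
    "item": (0, None, set()),     # N >= 0 (optional list)
    "list": (1, None, {"id"}),    # N >= 1 (non-empty list), "id" required
}

def check(node, rules=RULES):
    """Return a list of rule violations for a node and its descendants."""
    tag, attrs, children = node
    lo, hi, required = rules.get(tag, (0, None, set()))
    problems = []
    n = len(children)
    if n < lo or (hi is not None and n > hi):
        problems.append(f"{tag}: arity {n} outside limits")
    missing = required - set(attrs)
    if missing:
        problems.append(f"{tag}: missing keys {sorted(missing)}")
    for child in children:
        if isinstance(child, tuple):      # skip raw text
            problems.extend(check(child, rules))
    return problems

good = node_to_model(ET.fromstring('<list id="a"><item>one</item><br/></list>'))
bad = node_to_model(ET.fromstring('<list><br>x</br></list>'))
print(good)
print(check(good))
print(check(bad))
```

The sketch ignores ordering constraints and significant white space, which
are exactly the parts that vary most from one XML dialect to the next.
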
I suspect that it should be possible to do mechanized conversions of data
structures in one of two ways:
* Compile down to some "atomic" form (e.g., RDF tuples, as used in
  TAP - tap.stanford.edu), then "decompile" back into a "higher-order"
  form.
* Encode the data structure as a set of idioms, then perform
  idiom-to-idiom transformations to the target representation.
Of course, I don't know how to DO any of this, but it doesn't hurt (much)
to think about the problem. Anyway, these are some topics I may try to
bring up after Chris has made his presentation, etc.
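
For the trivial case, the first approach (compile down to an atomic form,
then decompile) can be sketched as below. This is a toy under stated
assumptions, not a solution to the idiom problem: the table is a list of
Python dicts (standing in for a set of Perl hashes), every row has a
single-column key, and the names compile_rows and decompile are hypothetical.
The atomic form is a set of (row id, column, value) triples, in the spirit of
the (I, A), (I, B), (I, C) decomposition above.

```python
def compile_rows(rows, key):
    """Flatten row hashes into a set of (row id, column, value) triples."""
    triples = set()
    for row in rows:
        for col, val in row.items():
            if col != key:
                triples.add((row[key], col, val))
    return triples

def decompile(triples, key):
    """Rebuild row hashes from triples, ordered by row id.

    A row that had only its key column produces no triples and is lost;
    this sketch accepts that limitation.
    """
    rows = {}
    for rid, col, val in triples:
        rows.setdefault(rid, {key: rid})[col] = val
    return [rows[rid] for rid in sorted(rows)]

table = [
    {"id": 1, "name": "Ann", "dept": "ops"},
    {"id": 2, "name": "Bob", "dept": "dev"},
]
triples = compile_rows(table, "id")
restored = decompile(triples, "id")
print(sorted(triples))
print(restored == table)
```

The round trip is the easy part. Decompiling the same triples into an
idiomatic form of a different representation (say, nested XML) is where the
real work would lie.
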
email: rdm at cfcl.com; phone: +1 650-873-7841
http://www.cfcl.com - Canta Forda Computer Laboratory
http://www.cfcl.com/Meta - The FreeBSD Browser, Meta Project, etc.