[VPM] new info for Nov meet - written module description
Darren Duncan
darren at DarrenDuncan.net
Tue Nov 16 01:01:46 CST 2004
Below is a rewritten DESCRIPTION for my "Rosetta" module, which will
appear in the next release, to be uploaded after the meeting. It
should be much better than the one currently on CPAN, and may make
the module a lot easier to understand.
Note that since I am coming down to the wire and still don't have
working demo code, I may have to sacrifice printed hand-outs, if
there were ever going to be any. The following may take the place of
one.
-- Darren Duncan
---------------
The Rosetta Perl 5 module defines a complete and rigorous API for
database access that provides hassle-free portability between many
dozens of database products. It is intended for database-using
applications of any size and complexity, including those that
leverage all sorts of advanced database product features. The Rosetta
Native Interface (RNI) allows you to create specifications for any
type of database task or activity (e.g. queries, DML, DDL, connection
management) that look like ordinary routines (procedures or
functions) to your programs, and to execute them as such; all routine
arguments are named.
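Here is a minimal Python sketch of the idea of a task specification that is called like an ordinary routine taking only named arguments. It is not the RNI. The names make_routine, PEOPLE and names_at_least are hypothetical, invented for this example, and a Python list stands in for a database table.

```python
from typing import Any, Callable, Dict, Sequence


def make_routine(name: str, params: Sequence[str],
                 body: Callable[[Dict[str, Any]], Any]) -> Callable[..., Any]:
    """Hypothetical helper: wrap a task specification so it is called like
    an ordinary function whose arguments are all named (keyword-only)."""
    allowed = frozenset(params)

    def routine(**kwargs: Any) -> Any:
        unknown = set(kwargs) - allowed
        missing = allowed - set(kwargs)
        if unknown:
            raise TypeError(f"{name}: unknown argument(s) {sorted(unknown)}")
        if missing:
            raise TypeError(f"{name}: missing argument(s) {sorted(missing)}")
        return body(dict(kwargs))

    routine.__name__ = name
    return routine


# Hypothetical in-memory data standing in for a database table.
PEOPLE = [{"name": "Ann", "age": 34}, {"name": "Bo", "age": 19}]

# A "query" task, specified once, then invoked like any other routine.
names_at_least = make_routine(
    "names_at_least", ["min_age"],
    lambda args: [p["name"] for p in PEOPLE if p["age"] >= args["min_age"]],
)

print(names_at_least(min_age=21))  # ['Ann']
```

Positional calls are rejected because the wrapper accepts only keyword arguments. This mirrors the "all routine arguments are named" rule.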
Rosetta is trivially easy to install, since it is written in pure
Perl and its whole dependency chain consists of just 2 other pure
Perl modules.
One of the main goals of Rosetta is similar to that of the Java
platform, namely "write once, run anywhere". Code written against
the RNI will run identically, with zero changes, regardless of which
underlying database product is in use. Rosetta is intended to help
free users and developers from database vendor lock-in, such as that
caused by investing in large quantities of vendor-specific code. It
also comes with a comprehensive validation suite that proves it
provides identical behaviour no matter what the underlying database
vendor is.
The RNI is structured in a loosely similar fashion to the DBI
module's API, and it should be possible to adapt applications written
to use the DBI, or one of its many wrapper modules, without too much
trouble, if not directly then by way of an emulation layer. One
aspect of this similarity is the hierarchy of interface objects: you
start with a root, which spawns objects that represent database
connections, each of which spawns objects representing queries or
statements run against a database through that connection. Another
similarity, more specific to the DBI itself, is that the API
definition is decoupled from any particular implementation, so that
many specialized implementations can exist and be distributed
separately. Also, multiple implementations can be used in parallel
by the same application through a common interface. Where the DBI
calls each implementation a 'driver', Rosetta calls it an 'Engine',
which may be more descriptive, as Engines sit "beneath" the
interface; in some cases, an Engine can even be fully
self-contained, rather than mediating with an external database.
A further similarity is that the preparation and execution (with
place-holder substitution) of database instructions are distinct
activities, and you can reuse a prepared instruction for multiple
executions to gain performance.
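The sketch below models three ideas in Python: the root/connection/statement hierarchy, the separation of the interface from pluggable Engines, and prepare-once/execute-many. It is hypothetical and is not Rosetta's or the DBI's actual API. Root, Engine, Connection, Statement, MemoryEngine and the simplified prepare(table, column) signature are all assumptions made for illustration. MemoryEngine plays the part of a fully self-contained Engine that needs no external database.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Row = Dict[str, Any]


class Statement(ABC):
    """Spawned by a Connection; may be executed many times."""

    @abstractmethod
    def execute(self, **args: Any) -> List[Row]: ...


class Connection(ABC):
    """Spawned by the root; spawns prepared Statements."""

    @abstractmethod
    def prepare(self, table: str, column: str) -> Statement: ...


class Engine(ABC):
    """Hypothetical contract each separately distributed back-end implements."""

    @abstractmethod
    def connect(self) -> Connection: ...


class Root:
    """Application-facing root: one interface over any number of Engines."""

    def __init__(self, engines: Dict[str, Engine]) -> None:
        self._engines = engines

    def connect(self, engine_name: str) -> Connection:
        return self._engines[engine_name].connect()


class MemoryEngine(Engine):
    """A fully self-contained Engine that needs no external database."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self.tables = tables
        self.prepare_count = 0  # lets us observe how often prepare ran

    def connect(self) -> Connection:
        return MemoryConnection(self)


class MemoryConnection(Connection):
    def __init__(self, engine: MemoryEngine) -> None:
        self.engine = engine

    def prepare(self, table: str, column: str) -> Statement:
        # One-time preparation work: resolve the table before any execution.
        self.engine.prepare_count += 1
        return MemoryStatement(self.engine.tables[table], column)


class MemoryStatement(Statement):
    def __init__(self, rows: List[Row], column: str) -> None:
        self.rows = rows
        self.column = column

    def execute(self, **args: Any) -> List[Row]:
        # Place-holder substitution: the named argument 'value' fills the slot.
        return [r for r in self.rows if r[self.column] == args["value"]]


engine = MemoryEngine({"person": [
    {"id": 1, "city": "Victoria"},
    {"id": 2, "city": "Vancouver"},
    {"id": 3, "city": "Victoria"},
]})
root = Root({"memory": engine})
conn = root.connect("memory")
stmt = conn.prepare("person", "city")  # prepare once ...
for city in ("Victoria", "Vancouver"):  # ... execute many times
    print(city, [r["id"] for r in stmt.execute(value=city)])
```

A second Engine (for example one wrapping a real database driver) could be registered under another name in the same Root. The application code calling connect, prepare and execute would not change.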
The Rosetta module does not talk to or implement any databases by
itself; it is up to separately distributed Engine modules to do this.
You can see a reference implementation of one in the
Rosetta::Engine::Generic module.
The main difference between Rosetta and the DBI is that Rosetta takes
its input primarily as SQL::Routine (SRT) objects, where DBI takes
SQL strings. See the documentation for SQL::Routine (distributed
separately) for details on how to define those objects. Also, when
Rosetta dumps a scanned database schema, it does so as SRT objects,
while DBI dumps as either SQL strings or simple Perl arrays,
depending on the schema object type. Each 'routine' that Rosetta
takes as input is equivalent to one or more SQL statements, where
later statements can use the results of earlier ones as their input.
The named argument list of a 'routine' is analogous to the bind
variable list of the DBI; each one defines what values can be given
to the statements at "execute" time.
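For comparison with the DBI side, here is one multi-statement "routine" with named arguments written by hand with SQL strings. It uses Python's standard sqlite3 module and its named placeholders. The function add_order and the two tables are hypothetical examples; Rosetta would instead take an SRT object describing the same work.

```python
import sqlite3


def add_order(conn: sqlite3.Connection, *, customer: str, item: str) -> int:
    """Hypothetical 'routine' built from two SQL statements. Its named
    arguments play the role of bind variables, and the second statement
    consumes the result (the new key) produced by the first."""
    cur = conn.execute(
        "INSERT INTO customer (name) VALUES (:customer)",
        {"customer": customer},
    )
    customer_id = cur.lastrowid  # result of statement 1 ...
    conn.execute(
        "INSERT INTO orders (customer_id, item) VALUES (:cid, :item)",
        {"cid": customer_id, "item": item},  # ... is input to statement 2
    )
    return customer_id


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (id),
        item TEXT NOT NULL
    );
""")
cid = add_order(conn, customer="Ann", item="widget")
rows = conn.execute(
    "SELECT c.name, o.item FROM orders o JOIN customer c ON c.id = o.customer_id"
).fetchall()
print(cid, rows)  # 1 [('Ann', 'widget')]
```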
Unlike SQL strings, SRT objects have very little redundancy, and
their parts are linked by references rather than by name. The
spelling of each SQL identifier (such as a table or column name) is
stored exactly once; if you change that single copy, then all code
that refers to the entity updates at once. SRT objects can also store
meta-data that SQL strings can't accommodate, and you define database
actions with the objects in exactly the same way regardless of the
database product in use; you do not write slightly different versions
for each product, as you do with SQL strings. Developers don't have
to restrict their conceptual processes to the limits or dialect of a
single product, or spend time worrying about how to express the same
idea against different products.
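The following is a toy model of reference-linked identifiers. The Column, Table and Select classes and the two-entry DIALECTS table are hypothetical and far cruder than SQL::Routine. The point is only that each identifier is spelled once, a query holds references to it rather than copies of its name, and the same object can be rendered for different products. The per-product rules shown (MySQL backtick quoting with `?` placeholders, Oracle double-quote quoting with `:name` placeholders) are deliberately simplified.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Column:
    name: str  # the one and only place this identifier is spelled


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)


@dataclass
class Select:
    table: Table          # linked by object reference, not by name
    columns: List[Column]
    where: Column         # compared for equality with a named argument
    arg_name: str


# Simplified per-product rendering rules: (identifier quote, placeholder).
DIALECTS: Dict[str, Tuple[str, str]] = {
    "mysql": ("`", "?"),
    "oracle": ('"', ":{arg}"),
}


def render(q: Select, dialect: str) -> str:
    quote, placeholder = DIALECTS[dialect]

    def ident(s: str) -> str:
        return f"{quote}{s}{quote}"

    cols = ", ".join(ident(c.name) for c in q.columns)
    ph = placeholder.format(arg=q.arg_name)
    return (f"SELECT {cols} FROM {ident(q.table.name)} "
            f"WHERE {ident(q.where.name)} = {ph}")


name = Column("name")
city = Column("city")
person = Table("person", [name, city])
query = Select(person, [name], where=city, arg_name="city")

print(render(query, "mysql"))   # SELECT `name` FROM `person` WHERE `city` = ?
print(render(query, "oracle"))  # SELECT "name" FROM "person" WHERE "city" = :city

city.name = "town"              # rename the single copy; every reference follows
print(render(query, "mysql"))   # SELECT `name` FROM `person` WHERE `town` = ?
```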
Rosetta is especially suited for data-driven applications, since the
composite scalar values in their data dictionaries can often be
copied directly to RNI structures, saving applications the tedious
work of generating SQL themselves.
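As a sketch of the data-driven case, the values from a hypothetical data dictionary are copied straight into specification objects. A separate function, standing in for an Engine, then generates the SQL. The DATA_DICTIONARY layout, ColumnSpec, TableSpec, spec_from_dictionary and render_create are all invented for illustration, and SQLite is used only to check that the generated statement is accepted.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical application data dictionary: plain data, not SQL.
DATA_DICTIONARY: Dict[str, Any] = {
    "table": "person",
    "columns": [
        {"name": "id", "sql_type": "INTEGER", "nullable": False},
        {"name": "name", "sql_type": "VARCHAR(100)", "nullable": False},
        {"name": "email", "sql_type": "VARCHAR(255)", "nullable": True},
    ],
    "primary_key": ["id"],
}


@dataclass
class ColumnSpec:
    name: str
    sql_type: str
    nullable: bool


@dataclass
class TableSpec:
    name: str
    columns: List[ColumnSpec]
    primary_key: List[str]


def spec_from_dictionary(entry: Dict[str, Any]) -> TableSpec:
    # The dictionary's values are copied straight across; no SQL is built here.
    return TableSpec(
        name=entry["table"],
        columns=[ColumnSpec(**c) for c in entry["columns"]],
        primary_key=list(entry["primary_key"]),
    )


def render_create(spec: TableSpec) -> str:
    # Stand-in for the SQL generation an Engine would do for the application.
    parts = [f"{c.name} {c.sql_type}{'' if c.nullable else ' NOT NULL'}"
             for c in spec.columns]
    parts.append(f"PRIMARY KEY ({', '.join(spec.primary_key)})")
    return f"CREATE TABLE {spec.name} ({', '.join(parts)})"


spec = spec_from_dictionary(DATA_DICTIONARY)
ddl = render_create(spec)
conn = sqlite3.connect(":memory:")
conn.execute(ddl)  # check that the generated statement is accepted
print(ddl)
```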
Rosetta is conceptually a DBI wrapper, whose strongest addition is
SQL generation, but it also works without the DBI, and with non-SQL
databases; it is up to each Engine to use or not use DBI, though most
will use it because the DBI is a high quality and mature platform to
build upon.
The choice between using the DBI and using Rosetta seems analogous
to the choice between the C and Java programming languages,
respectively, where each database product is analogous to a hardware
CPU architecture or wider hardware platform. The DBI is great for
people who like working as close to the metal as possible, with
direct access to each database product's native way of doing things,
for those who *want* to talk to their database in its native SQL
dialect, and for those who want the absolute highest performance.
Rosetta is higher level; it is for those who want the write-once
run-anywhere experience, less of a burden on their creativity, and
more time-saving development features, and who are willing to
sacrifice a modicum of performance for the privilege.
There exist on CPAN many dozens of other modules or frameworks whose
modus operandi is to wrap the DBI or be used together with it for
various reasons, such as to provide automated object persistence
functionality, or a cross-database portability solution, or to
provide part of a wider-scoped application tool kit, or to generate
SQL, or to clone databases, or to generate reports, or to provide a
web interface, or to provide a "simpler" or "easier to use"
interface.
So, outside the DBI question, a choice exists between using Rosetta
and one of these other CPAN modules. Going into detail on that
matter is outside the scope of this documentation, but a few salient
points are offered. For one thing, Rosetta lets you do a lot more
than the alternatives in an elegant fashion; with other modules, you
would often have to inject fragments of raw SQL into their objects
(such as "select" query conditionals) to accomplish what you want,
whereas with Rosetta you should never need to embed raw SQL at all.
For another point, Rosetta has a strong emphasis on portability
between many database products; only a handful of other modules
support more than 2-3 database products, and many only claim to
support one (usually MySQL). Also, more than half of the other
modules look like they had only 5-20 hours of effort at most put into
them, while Rosetta and its related modules have likely had over 1000
hours of full-time effort put into them. For another point, other
modules frequently lack support for commonly desired database
features, such as multiple-column keys. Also, most modules share a
structural deficiency: they are designed to support a very specific
set of database concepts, and adding more is a lot of work. By
contrast, Rosetta is internally designed in a heavily data-driven
fashion, allowing many features to be added or altered with little
cost in effort or complexity.
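The general technique of data-driven design can be sketched as follows. Supported concepts are described in a table, and generic code consults that table instead of hard-coding each concept. This illustrates the technique only and is not Rosetta's internal code; NODE_TYPES and validate are hypothetical.

```python
from typing import Any, Dict, Set

# Hypothetical node-type table: which attributes each kind of node may carry
# and which are mandatory. The validation code below never names a type.
NODE_TYPES: Dict[str, Dict[str, Set[str]]] = {
    "table": {"allowed": {"name"}, "required": {"name"}},
    "column": {"allowed": {"name", "table", "data_type"},
               "required": {"name", "table"}},
}


def validate(node_type: str, attrs: Dict[str, Any]) -> None:
    """One generic check serves every node type listed in NODE_TYPES."""
    spec = NODE_TYPES[node_type]
    unknown = set(attrs) - spec["allowed"]
    missing = spec["required"] - set(attrs)
    if unknown or missing:
        raise ValueError(
            f"{node_type}: unknown={sorted(unknown)} missing={sorted(missing)}")


# Supporting a new concept, here a multiple-column key, is one new entry in
# the table rather than new classes and new validation logic.
NODE_TYPES["unique_key"] = {"allowed": {"name", "table", "columns"},
                            "required": {"table", "columns"}}

validate("unique_key", {"table": "person", "columns": ["first_name", "last_name"]})
```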
Perhaps a number of other CPAN modules' authors will see value in
adding back-end support for Rosetta and/or SQL::Routine to their
offerings, either as a supplement to their DBI-using native database
SQL back-ends, or as a single replacement for the lot of them.
Particularly in the latter case, the authors will be more freed up to
focus on their added value, such as object persistence or web
interfaces, rather than worrying about portability issues. As a quid
pro quo, perhaps some of the other CPAN modules (or parts of them)
can be used by a Rosetta Engine to help it do its work.
Please see the Rosetta::Framework documentation file for more
information on the Rosetta framework at large. It shows this current
module in the context of actual or possible other components.