[VPM] new info for Nov meet - written module description

Tue Nov 16 01:01:46 CST 2004

Below is a rewritten DESCRIPTION for my "Rosetta" module, which will 
be in the next release that goes up after the meeting.  It should be 
much better than the one currently on CPAN, and may make the module a 
lot easier to understand.

Note that since I am coming down to the wire and still don't have 
working demo code yet, I may have to sacrifice printed hand-outs, if 
there ever were going to be any.  The following may take the place of 
one.

-- Darren Duncan

---------------

The Rosetta Perl 5 module defines a complete and rigorous API for 
database access.  It provides hassle-free portability between many 
dozens of database products, for database-using applications of any 
size and complexity, including those that leverage all sorts of 
advanced database product features.  The Rosetta Native Interface 
(RNI) allows you to create specifications for any type of database 
task or activity (e.g. queries, DML, DDL, connection management) that 
look like ordinary routines (procedures or functions) to your 
programs, and to execute them as such; all routine arguments are 
named.

Rosetta is trivially easy to install, since it is written in pure 
Perl and its whole dependency chain consists of just 2 other pure 
Perl modules.

One of the main goals of Rosetta is similar to that of the Java 
platform, namely "write once, run anywhere".  Code written against 
the RNI will run in an identical fashion with zero changes regardless 
of what underlying database product is in use.  Rosetta is intended 
to help free users and developers from database vendor lock-in, such 
as that caused by the investment in large quantities of 
vendor-specific code.  It also comes with a comprehensive validation 
suite that proves it is providing identical behaviour no matter what 
the underlying database vendor is.

The RNI is structured in a loosely similar fashion to the DBI 
module's API, and it should be possible to adapt applications written 
to use the DBI or one of its many wrapper modules without too much 
trouble, if not directly then by way of an emulation layer.  One 
aspect of this similarity is the hierarchy of interface objects; you 
start with a root, which spawns objects that represent database 
connections, each of which spawns objects representing queries or 
statements run against a database through said connections.  Another 
similarity, which is more specific to DBI itself, is that the API 
definition is uncoupled from any particular implementation, such that 
many specialized implementations can exist and be distributed 
separately.  Also, a multiplicity of implementations can be used in 
parallel by the same application through a common interface.  Where 
DBI gives the name 'driver' to each implementation, Rosetta gives the 
name 'Engine', which may be more descriptive as they sit "beneath" 
the interface; in some cases, an Engine can even be fully 
self-contained, rather than mediating with an external database. 
Another similarity is that the preparation and execution (with 
place-holder substitution) of database instructions are distinct 
activities, and you can reuse a prepared instruction for multiple 
executions to get performance gains.
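
The hierarchy and the prepare/execute split described above can be 
sketched in a few lines.  This is a conceptual illustration written 
in Python, not the actual RNI or DBI; every class and method name 
here (Interface, Connection, Prepared, MemoryEngine, compile) is 
hypothetical, and the "Engine" is a toy self-contained one that keeps 
its rows in memory.

```python
# Conceptual sketch only: all names are hypothetical, NOT the Rosetta API.

class MemoryEngine:
    """A self-contained 'Engine' that keeps its tables in a dict."""

    def __init__(self):
        self.tables = {}

    def compile(self, instruction):
        # One-time work happens here; the returned callable can then
        # be run many times with different named arguments.
        kind, table = instruction
        rows = self.tables.setdefault(table, [])
        if kind == "insert":
            return lambda args: rows.append(dict(args))
        if kind == "select_where":
            return lambda args: [
                r for r in rows
                if all(r.get(k) == v for k, v in args.items())
            ]
        raise ValueError(f"unsupported instruction: {kind}")


class Prepared:
    """Third level: an instruction prepared through one connection."""

    def __init__(self, runner):
        self._runner = runner

    def execute(self, **args):
        return self._runner(args)


class Connection:
    """Second level: spawned by the root, spawns prepared instructions."""

    def __init__(self, engine):
        self._engine = engine

    def prepare(self, instruction):
        return Prepared(self._engine.compile(instruction))


class Interface:
    """Root: knows the available Engines and spawns connections."""

    def __init__(self):
        self._engines = {}

    def register_engine(self, name, engine):
        self._engines[name] = engine

    def connect(self, engine_name):
        return Connection(self._engines[engine_name])


root = Interface()
root.register_engine("memory", MemoryEngine())
conn = root.connect("memory")

# Prepare once, execute several times with different named arguments.
add_person = conn.prepare(("insert", "person"))
for name, age in [("Ann", 31), ("Bob", 45)]:
    add_person.execute(name=name, age=age)

find_person = conn.prepare(("select_where", "person"))
found = find_person.execute(name="Bob")
print(found)
```

Because the application only talks to the root, connection and 
prepared-instruction objects, a different Engine could be registered 
under another name and used in parallel without changing the calling 
code.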

The Rosetta module does not talk to or implement any databases by 
itself; it is up to separately distributed Engine modules to do this. 
You can see a reference implementation of one in the 
Rosetta::Engine::Generic module.

The main difference between Rosetta and the DBI is that Rosetta takes 
its input primarily as SQL::Routine (SRT) objects, where DBI takes 
SQL strings.  See the documentation for SQL::Routine (distributed 
separately) for details on how to define those objects.  Also, when 
Rosetta dumps a scanned database schema, it does so as SRT objects, 
while DBI dumps as either SQL strings or simple Perl arrays, 
depending on the schema object type.  Each 'routine' that Rosetta 
takes as input is equivalent to one or more SQL statements, where 
later statements can use the results of earlier ones as their input. 
The named argument list of a 'routine' is analogous to the bind 
variable list of DBI; each one defines what values can be given to the 
statements at "execute" time.
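
To make the idea of a 'routine' concrete, here is a small sketch, 
assuming Python's standard sqlite3 module purely as a stand-in.  It 
wraps two statements in one function with named arguments, and the 
second statement uses the result of the first.  The function name 
place_order and the table layout are made up for illustration; note 
also that this sketch uses SQL strings, whereas Rosetta would take 
the equivalent definition as SQL::Routine objects.

```python
import sqlite3


def place_order(db, *, customer_name, item):
    """A hypothetical 'routine': two statements behind named arguments."""
    # Statement 1: look up the customer by name.
    row = db.execute(
        "SELECT id FROM customer WHERE name = :customer_name",
        {"customer_name": customer_name},
    ).fetchone()
    if row is None:
        raise LookupError(f"no such customer: {customer_name}")

    # Statement 2: uses the result of statement 1 as one of its inputs.
    cur = db.execute(
        "INSERT INTO purchase (customer_id, item) "
        "VALUES (:customer_id, :item)",
        {"customer_id": row[0], "item": item},
    )
    return cur.lastrowid


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
db.execute(
    "CREATE TABLE purchase "
    "(id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT)"
)
db.execute("INSERT INTO customer (name) VALUES (:name)", {"name": "Ann"})

order_id = place_order(db, customer_name="Ann", item="kettle")
rows = db.execute("SELECT customer_id, item FROM purchase").fetchall()
print(order_id, rows)
```

The caller sees a single routine with named arguments and never 
handles the intermediate customer id itself.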

Unlike SQL strings, SRT objects have very little redundancy, and the 
parts are linked by references rather than by name; the spelling of 
each SQL identifier (such as a table or column name) is stored 
exactly once; if you change the single copy, then all code that 
refers to the entity updates at once.  SRT objects can also store 
meta-data that SQL strings can't accommodate, and you define database 
actions with the objects in exactly the same way regardless of the 
database product in use; you do not write slightly different versions 
for each as you do with SQL strings.  Developers don't have to 
restrict their conceptual processes into the limits or dialect of a 
single product, or spend time worrying about how to express the same 
idea against different products.
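
The "stored exactly once, linked by reference" idea can be shown with 
a toy object model.  This is only a sketch under assumed names 
(Table, Column, Select, to_sql); it does not reflect the real 
SQL::Routine classes.  The query holds references to column objects 
rather than copies of their names, so renaming a column in one place 
changes every statement generated afterwards, and the same objects 
can be rendered with different identifier quoting.

```python
# Toy model only: class names are hypothetical, NOT SQL::Routine's.

class Table:
    def __init__(self, name):
        self.name = name          # the table's spelling lives only here
        self.columns = []

    def column(self, name):
        col = Column(self, name)
        self.columns.append(col)
        return col


class Column:
    def __init__(self, table, name):
        self.table = table        # link by reference, not by name
        self.name = name          # the column's spelling lives only here


class Select:
    def __init__(self, columns, where_column):
        self.columns = columns
        self.where_column = where_column

    def to_sql(self, quote='"'):
        def q(ident):
            return f"{quote}{ident}{quote}"

        table = self.columns[0].table
        cols = ", ".join(q(c.name) for c in self.columns)
        return (
            f"SELECT {cols} FROM {q(table.name)} "
            f"WHERE {q(self.where_column.name)} = ?"
        )


person = Table("person")
name = person.column("name")
age = person.column("age")
query = Select([name, age], where_column=name)

before = query.to_sql()
name.name = "full_name"                  # change the single copy
after = query.to_sql()
backtick_style = query.to_sql(quote="`")  # e.g. MySQL-style quoting

print(before)
print(after)
print(backtick_style)
```

With plain SQL strings, the same rename would mean finding and 
editing every string that mentions the column.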

Rosetta is especially suited for data-driven applications, since the 
composite scalar values in their data dictionaries can often be 
copied directly to RNI structures, saving applications the tedious 
work of generating SQL themselves.

Rosetta is conceptually a DBI wrapper, whose strongest addition is 
SQL generation, but it also works without the DBI, and with non-SQL 
databases; it is up to each Engine to use or not use DBI, though most 
will use it because the DBI is a high quality and mature platform to 
build upon.

The choice between using DBI and using Rosetta seems to be analogous 
to the choice between the C and Java programming languages, 
respectively, where each database product is analogous to a hardware 
CPU architecture or wider hardware platform.  The DBI is great for 
people who like working as close to the metal as possible, with 
direct access to each database product's native way of doing things, 
those who *want* to talk to their database in its native SQL dialect, 
and those who want the absolute highest performance.  Rosetta is more 
high level, for those who want the write-once run-anywhere 
experience, less of a burden on their creativity, and more 
time-saving development features, and who are willing to sacrifice a 
modicum of performance for the privilege.

There exist on CPAN many dozens of other modules or frameworks whose 
modus operandi is to wrap the DBI or be used together with it for 
various reasons, such as to provide automated object persistence 
functionality, or a cross-database portability solution, or to 
provide part of a wider scoped application tool kit, or to generate 
SQL, or to clone databases, or generate reports, or provide a web 
interface, or to provide a "simpler" or "easier to use" interface. 
So, outside the DBI question, a choice exists between using Rosetta 
and one of these other CPAN modules.  Going into detail on that 
matter is outside the scope of this documentation, but a few salient 
points are offered.  For one thing, Rosetta allows you to do a lot 
more than the alternatives in an elegant fashion; with other modules, 
you would often have to embed fragments of raw SQL in their 
objects (such as "select" query conditionals) to accomplish what you 
want; with Rosetta, you should never need to embed raw SQL at all. 
For another point, Rosetta has a strong emphasis on portability 
between many database products; only a handful of other modules 
support more than 2-3 database products, and many only claim to 
support one (usually MySQL).  Also, more than half of the other 
modules look like they had only 5-20 hours of effort at most put into 
them, while Rosetta and its related modules have likely had over 1000 
hours of full time effort put into them.  For another point, there is 
a frequent lack of support for commonly desired database features in 
other modules, such as multiple column keys.  Also, most modules have 
a common structural deficiency such that they are designed to support 
a very specific set of database concepts, and adding more is a lot of 
work; by contrast, Rosetta is internally designed in a heavily 
data-driven fashion, allowing the addition or alteration of many 
features with little cost in effort or complexity.

Perhaps a number of other CPAN modules' authors will see value in 
adding back-end support for Rosetta and/or SQL::Routine to their 
offerings, either as a supplement to their DBI-using native database 
SQL back-ends, or as a single replacement for the lot of them. 
Particularly in the latter case, the authors will be more freed up to 
focus on their added value, such as object persistence or web 
interfaces, rather than worrying about portability issues.  As quid 
pro quo, perhaps some of the other CPAN modules (or parts of them) 
can be used by a Rosetta Engine to help it do its work.

Please see the Rosetta::Framework documentation file for more 
information on the Rosetta framework at large.  It shows this current 
module in the context of actual or possible other components.