[VPM] ANNOUNCE: first post-rewrite Rosetta release (v0.720.0)

Thu Feb 2 20:08:10 PST 2006

2006-02-01   Darren Duncan <perl at DarrenDuncan.net>
--------------------------------------------------

I am pleased to announce the first CPAN release of the second major 
code base (started in 2005-10) of the Rosetta database access 
framework, v0.720.0, which is available now in synchronized native 
Perl 5 and Perl 6 versions.

This is a complete rewrite, including very different detailed designs, 
implementations, and documentation, though it still retains the same 
high level design and purpose.

------------

The Perl 5 version is composed of these 2 distributions (more to come later):

  * Rosetta-v0.720.0.tar.gz
  * Rosetta-Engine-Native-v0.1.0.tar.gz

These have Locale-KeyedText-v1.72.1.tar.gz (released at the same 
time) as an external dependency.

The Perl 6 versions of all 3 of the above items are bundled with 
Perl6-Pugs-6.2.11.tar.gz (released a half-day earlier) in its ext/ 
subdirectory.

The Perl 6 versions don't depend on anything outside the Perl6-Pugs 
distro that they live in.  But the Perl 5 versions also have external 
dependencies on Perl 5.8.1+ and these Perl 5 packages, which add 
features that Perl 6 and Pugs already have built-in: 'version', 
'only', 'Readonly', Class::Std, Class::Std::Utils, Scalar::Util, 
Test::More; the latter 2 are bundled with Perl 5.

------------

Following is both a reintroduction to the remade Rosetta as it is and 
will soon be, and a summary of the main changes from before the 
rewrite (first major code base of 2002 thru 2005-09).

For various reasons that will be laid bare below, it should be more 
apparent than ever that Rosetta is "not just another DBI wrapper" and 
really stands out as something different from any existing tool on 
CPAN.

Note that many of these details aren't yet in Rosetta's own 
documentation (they will be later), so for now they are unique to 
this email.

* Locale::KeyedText is officially not part of the Rosetta framework 
anymore, being a distinct external dependency instead of its 
localization component.

* Anything that was in the SQL::Routine name space has been renamed 
into the 'Rosetta' name space.

* Briefly comparing DBI to Rosetta, DBI provides users with database 
driver independence; Rosetta provides them with database language 
independence, which is a higher abstraction, but it should still work 
quickly.

* Rosetta is now officially a federated relational database of its 
own that just happens to be good with cross-database-manager 
portability issues, and be good as a toolkit on which to build ORMs 
and persistence tools, rather than being mainly about portable SQL 
generation.

* The native query and schema design language of Rosetta is now based 
mainly on Tutorial D (by Christopher J. Date and Hugh Darwen) and 
closely resembles relational algebra, rather than being based on SQL 
as it was before (note that some current documentation suggests 
otherwise, but that will be rewritten).

* Note, see http://www.oreilly.com/catalog/databaseid/ , the book by 
Date named "Database in Depth", which is one of the best references 
on database design I have ever seen.  Everyone who works with 
databases should read it.  It's not dry and has practical stuff you 
can apply right now.  I am.

* The native language of Rosetta is presently called "Intermediate 
Relational Language" ("IRL", pronounced "earl", or "girl" without the 
"g"); it is inspired by Pugs' "PIL", which serves a similar purpose 
for Perl 6 as what IRL does for Tutorial D and SQL and other 
languages.

* IRL is strongly typed, where every value and container is of a 
single type, and permits user data type definitions to be arbitrarily 
complex (such as temporal and spatial data) but non-recursive.  Aside 
from forbidding "references", it includes the features of so-called 
"object-relational" databases which are actually part of the true 
plain "relational" data model.  Values of each distinct data type can 
not be substituted as operator arguments for others, or stored in 
containers for others, but they can be explicitly cross-converted in 
some circumstances (eg num to str or str to num).

* Despite actually being strongly typed, IRL has facilities to 
simulate weak data types over strong ones; for example, you can 
define an SV type that has numerical and character string components. 
More broadly speaking, you can define multi-part "disjunctive" types 
whose members are each of a different other type, where only one 
member holds a significant value at a time and the others hold their 
type's concept of an "empty" value; such a type also has one extra 
member that says which of the others holds the significant value.

* IRL natively uses 2-valued-logic (2VL) like Tutorial D, and not 
3-valued-logic (3VL) like SQL, so every boolean valued expression 
always evaluates to true or false, not true or false or unknown (a 
SQL NULL).  But it does simulate 3-valued-logic using disjunctive 
data types, one of whose members is the system defined "Unknown" 
strong data type, which can only ever hold the same single value; by 
definition, a disjunctive data type value whose member A is the 
significant one will never match with another whose significant 
member is B, and hence we can distinguish between "Unknown" and zero 
or the empty string when a number or string can't actually be set to 
Unknown (null).
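
As a rough illustration of the disjunctive-type idea above, here is a 
minimal Python sketch.  It is not IRL or Rosetta code; the names 
NumOrUnknown, known and unknown are hypothetical, and it assumes that 
two Unknown values compare as equal because the Unknown type has only 
one value.

```python
from dataclasses import dataclass

NUM, UNKNOWN = "num", "unknown"  # hypothetical member names


@dataclass(frozen=True)
class NumOrUnknown:
    """Hypothetical disjunctive type with a number member and an Unknown member."""
    which: str    # extra member saying which other member is significant
    num: int = 0  # holds the "empty" number 0 when Unknown is significant

    def __post_init__(self):
        if self.which not in (NUM, UNKNOWN):
            raise ValueError("which must name one of the members")
        if self.which == UNKNOWN and self.num != 0:
            raise ValueError("a non-significant member must hold its empty value")


def known(n):
    return NumOrUnknown(NUM, n)


def unknown():
    return NumOrUnknown(UNKNOWN)


# Every comparison yields plain True or False, never a third truth value.
print(known(0) == unknown())   # False: zero is distinct from Unknown
print(unknown() == unknown())  # True
print(known(3) == known(3))    # True
```

Because the "which" member takes part in the comparison, a known zero 
never matches Unknown even though both carry 0 in the number member.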

* IRL has distinct data types for what are commonly referred to as 
"relations" (like a SQL table with a key, which may be over all of 
its columns) and "bags" (like a SQL table that lacks a key), where 
the former forbids duplicates and the latter allows them.  Given 
Rosetta's strong typing, a relation and a bag can not be substituted 
for each other (except that they can be cross-converted, as numbers 
and character strings can be cross-converted), but rather have their 
own operators which either never output or can output duplicates 
respectively.  A bag can be implemented over a relation where the 
relation has one extra attribute which stores a count of occurrences 
for the otherwise distinct combination of other attributes, and 
operators do the right thing with that count.
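
The bag-over-relation idea above can be sketched in Python as 
follows.  This is only an illustration under assumptions, not Rosetta 
code: the helper names row, bag_from_rows, bag_union and to_relation 
are hypothetical, and the standard library Counter stands in for the 
relation with an extra count attribute.

```python
from collections import Counter


def row(**attrs):
    """Build an unordered tuple of named attributes."""
    return frozenset(attrs.items())


def bag_from_rows(rows):
    """Store a bag as distinct tuples, each mapped to its occurrence count."""
    return Counter(rows)


def bag_union(a, b):
    """A bag operator that can output duplicates: the counts add up."""
    return a + b


def to_relation(bag):
    """Cross-convert a bag to a relation by dropping the counts."""
    return frozenset(bag)


visits = bag_from_rows([row(city="Paris"), row(city="Paris"), row(city="Oslo")])
print(visits[row(city="Paris")])                    # 2
print(bag_union(visits, visits)[row(city="Oslo")])  # 2
print(len(to_relation(visits)))                     # 2
```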

* There is no inherent order of the attributes/columns of 
relations/bags/tables, and there is no inherent order to their 
tuples/rows, unlike SQL where at least the order of columns is 
significant.  IRL does all references by names rather than by 
position; all operator parameters are named, as are relation 
attributes.

* Besides relations and bags, IRL has a distinct array data type, 
which is what you get when using an order-by; usually it only makes 
sense to use this as the last step in a query when fetching data, if 
the order is important.
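
A small Python sketch of that last point, with hypothetical helper 
names (row, order_by) that are not part of Rosetta: the relation is 
an unordered set, and ordering it as the final step yields a list.

```python
def row(**attrs):
    """Build an unordered tuple of named attributes."""
    return frozenset(attrs.items())


def order_by(relation, attr):
    """Turn an unordered relation into an ordered list of tuples (as dicts)."""
    return sorted((dict(t) for t in relation), key=lambda t: t[attr])


people = {row(name="Bob", age=41), row(name="Ann", age=30)}
ordered = order_by(people, "age")
print([t["name"] for t in ordered])  # ['Ann', 'Bob']
```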

* All typical joins between relations/bags/tables are natural joins, 
where attributes/columns of each joined item implicitly correspond 
and match when they have the same names and data types (and if none 
match, you have a cartesian product).  You never specify join 
conditions explicitly by using "foo = bar" or any such thing; rather, 
if you want to match on dissimilar names, you first rename (like SQL's 
"AS") one or both source columns.  This also means that you can join 
an arbitrary number of relations/tables in a single operation, and 
they will just work, with the combined output relation/table having 
distinct attribute/column names already.
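
To make the natural join and rename behaviour above concrete, here is 
a minimal Python sketch.  It is not Rosetta's API; row, rename, 
natural_join and join_all are hypothetical names, and for brevity the 
sketch matches attributes by name only and skips the data type check.

```python
from functools import reduce


def row(**attrs):
    """Build an unordered tuple of named attributes."""
    return frozenset(attrs.items())


def rename(relation, old, new):
    """Rename one attribute in every tuple (roughly SQL's "AS")."""
    return {frozenset((new if k == old else k, v) for k, v in t) for t in relation}


def natural_join(r, s):
    """Pair up tuples that agree on every attribute name they share."""
    out = set()
    for t in r:
        for u in s:
            a, b = dict(t), dict(u)
            # With no shared names this is vacuously true: a cartesian product.
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.add(frozenset({**a, **b}.items()))
    return out


def join_all(*relations):
    """Join any number of relations in a single operation."""
    return reduce(natural_join, relations)


emp = {row(name="Ann", dept_id=1), row(name="Bob", dept_id=2)}
dept = {row(id=1, dept="Sales"), row(id=2, dept="Ops")}
site = {row(dept="Sales", city="Paris"), row(dept="Ops", city="Oslo")}

joined = join_all(emp, rename(dept, "id", "dept_id"), site)
print(sorted(dict(t)["name"] + "/" + dict(t)["city"] for t in joined))
# ['Ann/Paris', 'Bob/Oslo']
print(len(natural_join(emp, dept)))  # 4, since no names match without the rename
```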

* Instead of saying "select <attr-list> from <relation> where 
<condition>", you nest arbitrary relational algebra expressions like 
"project( restrict( <relation>, <condition> ), <attr-list> )" or 
"restrict( project( <relation>, <attr-list> ), <condition> )"; those 
2 give the exact same result whenever <condition> only refers to 
attributes that are in <attr-list>.
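
The nesting of project and restrict can be sketched in Python like 
this.  The function names mirror the operators mentioned above, but 
the code is only an assumed illustration, not IRL syntax or Rosetta 
code; row is a hypothetical helper.

```python
def row(**attrs):
    """Build an unordered tuple of named attributes."""
    return frozenset(attrs.items())


def restrict(relation, condition):
    """Keep only the tuples for which the condition holds."""
    return {t for t in relation if condition(dict(t))}


def project(relation, attrs):
    """Keep only the named attributes; duplicate tuples collapse in the set."""
    return {frozenset((k, v) for k, v in t if k in attrs) for t in relation}


people = {
    row(name="Ann", city="Paris", age=30),
    row(name="Bob", city="Oslo", age=41),
    row(name="Cy", city="Paris", age=41),
}


def over_35(t):
    return t["age"] > 35


a = project(restrict(people, over_35), {"name", "age"})
b = restrict(project(people, {"name", "age"}), over_35)
print(a == b)  # True
```

Here the condition uses only "age", which survives the projection, so 
both nestings agree.  A condition on "city" could only be applied 
before projecting that attribute away.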

* It should be easier to write non-trivial queries in the finer-grained 
IRL than in SQL, especially when adding things like groups and havings 
and such, since you can more reliably know what pieces you have to 
work with, and exactly what will happen when you say certain things, 
and you don't have to needlessly duplicate expressions.  Writing 
queries in IRL should be more reliable than SQL since you don't have 
to worry about getting different results from 2 logically identical 
queries and you don't have to deal with ambiguous syntax.

* IRL should also be a lot easier to optimize for speed given the 
lack of ambiguity that plagues attempts to optimize SQL.

* Rosetta is designed to be very componentized, where you can 
substitute back-ends and front-ends at will, so it can work over both 
SQL based and non-SQL based database engines, and its user interface 
can resemble anything you want.  It is also reasonably easy to map 
SQL to IRL and back, so you can still query Rosetta databases using 
various SQL dialects or other languages if you don't want to see the 
IRL, and this can help with migrating older applications.

* It is likely to be the ideal case for most Rosetta users to have an 
alternate front end, such as some adapted from current DBI wrappers, 
object persistence or relational mapping tools, and so on, rather 
than using IRL directly.  Using Rosetta rather than DBI should make 
the tasks of people making such wrappers and tools easier, since they 
have a more reliable language to work against and they don't have to 
maintain a multiplicity of back ends for each storage engine; Rosetta 
does the latter for them.

* A typical Rosetta back-end that operates over an existing database 
engine will take care of optimizing the queries for the native 
database so they perform best.  When using Rosetta, you just say 
*what* you want to happen, not so much how, and Rosetta will take 
care of getting it done quickly and correctly.

* A self contained back-end named Rosetta::Engine::Native implements 
a relational database in Perl, so you can have that functionality 
without straying outside Perl if you want.  Of course, 
Rosetta::Engine::Native is only meant to be a correct example, not 
fast, so it should only be used for testing.  Other backends can be 
used for production.

* Genezzo is an already existing fast third party database, 
implemented in Perl, which will be adapted to use Rosetta as its 
interface, so you do have a Perl option besides the for-testing-only 
Native.

* The license of Rosetta has changed, such that my GPL exception 
granted to allow linked code to retain its own license has changed; 
it is no longer based on technicalities like how the linking is done, 
but rather on what kind of license the linked code has.  This should 
make things a lot easier for developers of all stripes.

* See the Changes file included with 'Rosetta' for more details on some aspects.

------------

Note that the current Rosetta framework on CPAN is mostly 
documentation (incomplete and partly out of date), and has little in 
the way of executable code right now.

I recommend looking, in particular, at the pod in these files: 
Rosetta.pm, Model.pm, Language.pod, Overview.pod, TODO.pod.

Over the next month or so, hopefully coinciding with the Pugs 6.28.0 
release (that is refactored over the new PIL2 and Perl 6 object 
model), I should have more code such that you can actually start 
playing with Rosetta in your code.

I welcome any kind of assistance that you can provide with Rosetta, 
and I hope that it will have a huge positive impact on the community. 
Really, assistance would be appreciated.

Thank you and have a good day. -- Darren Duncan