[VPM] Perl Mongers Meeting November 21

Mon Nov 13 18:08:47 PST 2006

Victoria.pm will meet at its regular date, time, and place at 7:00 pm 
on Tuesday, November 21, at UVic in ECS (Engineering Computer Science 
building) room 660 (see http://www.uvic.ca/maps/index.html).  (There 
will be no December meeting.  In January we will start visiting the 
beginning-level topics that I received requests for.)

Darren Duncan will present a new homegrown RDBMS that is written in 
Perl, including and overview of its features and design, featuring a 
walkthrough of what implementation code exists so far, and also 
hopefully showing a live demo of it in action, if it is far enough 
along for that by the meeting date.

This RDBMS is intended to be a complete implementation of a truly 
relational DBMS, as introduced by Edgar F. Codd and maintained by Hugh 
Darwen and Chris Date.  The project is fairly unique and has a 
significant number of differences from SQL DBMSs, which are not truly 
relational due to either missing some required features or due to 
providing certain mis-features.

Note: Some useful background material can be seen at 
http://en.wikipedia.org/wiki/Relational_data_model and 
http://en.wikipedia.org/wiki/Relational_algebra .

Darren's RDBMS has a Perl 5 implementation named "QDRDBMS", which is 
what will be walked through, and a Perl 6 implementation under some 
other name will be made shortly thereafter or in parallel; QDRDBMS is 
officially a prototype that is deprecated in favor of the longer-term 
Perl 6 product.

Some key differences between QDRDBMS and a SQL DBMS are:

  - QDRDBMS uses the terminology "relation value" (or "relation"), 
"relation variable" (or "relvar"), "tuple" and "attribute" for concepts 
analagous to SQL's terminology "rowset", "table", "row" and 
"column".  A "base relvar" and a "virtual relvar" corresponds to a 
"table" and a "view"; "relvar" refers to both.    A relational database 
is a collection of relvars, analagous to a SQL database being a 
collection of tables.  An "update" operation includes not only 
assignment but SQL's concept of "insert", "update", "delete", etc.
  - QDRDBMS uses 2-valued logic rather than 3+-valued logic like SQL; 
QDRDBMS does not have a native "null" concept.  There are multiple 
better alternatives to express "unknown" or "not applicable" etc with 
it.  In SQL terms, all relation attributes are implicitly not nullable.
  - QDRDBMS always uses exceptions to indicate failure, whether due to 
attempting a constraint violation or by trying to divide by zero; these 
can be explicitly trapped and responded to as the user chooses. SQL 
sometimes uses exceptions to indicate failure, and other times it just 
returns null values instead.
  - QDRDBMS is based on sets, rather than bags as SQL is; a tuple is an 
unordered set of attributes, and a relation is an unordered set of tuples.
  - All QDRDBMS attributes are named and must have distinct names; SQL 
columns are ordered, do not have to be named or can have duplicated names.
  - In SQL terms, a relation has an implicit unique key constraint over 
all of its attributes, if it doesn't have an explicit key defined.  A 
truly relational database is always in at least first normal form by 
definition.
  - A QDRDBMS relation or tuple can validly have zero attributes while 
a SQL table or row must always have at least one column.
  - A QDRDBMS relation has no hidden attributes like SQL's "row id"; 
they only contain the attributes that the user defined for them.  But 
since relations contain no duplicates, you can reliably address their 
elements by just their attribute values.
  - All QDRDBMS operations involving relations are set-based; there are 
no tuple-at-a-time operations like SQL's current-row cursors. (That's 
not to say that the database-application gateway can't stream data.)
  - QDRDBMS queries are composed of simpler relational operators 
composed into arbitrary user-defined expressions, rather than the 
monolithic and complicated SQL "select" operators.
  - QDRDBMS users have the same kind of flexability with relational 
expressions as they do with string or numeric or logical etc 
expressions; SQL users typically have different syntax structures for 
table or rowset operations than for other data types.
  - Each QDRDBMS relational operator returns only distinct results (as 
a relation value); the SQL relational operators are inconsistent, with 
some returning distinct results by default (eg, "union"), while others 
return duplicated results by default (eg, "join").  Note that list 
summarizing operations like "sum" and "average" still do the correct 
thing, despite the aforementioned being true for list-returning operators.
  - QDRDBMS is strongly typed, with all values and variables being of a 
specific type, and operators taking and returning values of specific 
types.  No implicit type casting is done, and all type conversions must 
be done explicitly.  How strong or weak data types are varies by the 
SQL DBMS product, and many of them do implicit casting.
  - QDRDBMS has distinct numerical division operators for integers and 
fractional numbers, which return integers or fractional numbers 
respectively.  By contrast, some SQL DBMSs have just one behaviour, and 
others will change behaviour semi-arbitrarily depending on what a 
loosely typed input looks like; eg, using the number "3.0" may not do 
what users expect.
  - QDRDBMS data types can be scalar or collection types, arbitrarily 
complex; relation or tuple attributes can be composed of either, and 
attributes can even be relations or tuples.  SQL DBMSs often don't let 
you use collection types or tables for table columns, or their 
provision is inconsistent.
  - QDRDBMS uses the same relational operators for both database 
relations and component relations of other data types.  SQL requires 
you to use different operators for each though they are alike; eg, 
"union" vs "multiset union".
  - QDRDBMS empowers users to define their own data types and 
operators, while only some SQL DBMSs do (as "domains" or "types" or 
"stored procedures" etc).
  - QDRDBMS operators can be recursive, while only some SQL DBMSs allow this.
  - QDRDBMS operators take named arguments rather than positional one, 
so there is not only flexability but also consistency between those and 
the way relation attributes are used, which is by name.
  - Every QDRDBMS data type must have at least equality, inequality 
testing and value assignment operators, even if it has no other 
operators.  SQL does not require data types to have equality test 
operators or be assignable.
  - Fundamentally, the only QDRDBMS update operator is the assignment 
operator, and any other update operators (eg, "INSERT", "UPDATE", 
"DELETE") are just short-hand for particular assignment expressions. 
SQL doesn't provide assignment syntax for many of its update operators.
  - All QDRDBMS data types are immutable; you only update variables as 
a whole, sometimes by assigning a new value derived from the old value 
but for the intended mutation.  Short-hand operators can make it appear 
different though.
  - All QDRDBMS database constraints are immediate, and are applied 
between statement boundaries at all levels of granularity.  A SQL DBMS 
constraint can be defined as deferrable, such as only being applied at 
a transaction commit time.  So QDRDBMS guarantees that no query or 
expression will ever see a database that is in an inconsistent / 
constraint-violating state.
  - QDRDBMS empowers you to perform multiple variable assignments 
simultaneously (which is syntactic sugar for a single assignment to the 
database as a whole), so a multi-part change can be made without 
tripping the immediate constraints.  So it is possible to, eg, record a 
credit to one bank account and a corresponding debit to another, 
without the database being in an inconsistent state "between" those two 
actions.
  - Where at all possible, there is no distinction between base and 
virtual relvars from the database user's point of view; they can query 
and assign to / update either.  The database can be redefined so that 
some base and virtual relvars become virtual and base, and the user can 
continue like nothing had changed.
  - QDRDBMS provides system catalog relvars sufficient to completely 
describe the entire database.  This is analagous to the "information 
schema" concept of SQL, which some SQL DBMSs provide and others 
don't.  The QDRDBMS catalog is AST-based, though, in that the various 
details are atomic, rather than consisting of literal SQL strings for 
many parts (eg, procedure definitions).
  - The catalog relvars of QDRDBMS are user-updateable; where by 
contrast they are read-only in SQL.  In fact, updating the QRDBMS 
system catalog using DML operations is the fundamental means to perform 
data definition or schema changes to a database.  While QDRDBMS 
provides short-hands to this, analagous to SQL's "create table" or 
"alter table" etc, they are short-hand.  SQL's "create" etc statements 
are the only normal way to do schema changes, and they are a lot less flexible.
  - QDRDBMS supports arbitrary depth child transactions, where any 
statements within a transaction level are collectively atomic and can 
succeed or fail.  Any statement block marked as atomic, and all named 
routines and try-catch blocks are atomic; in the last case, a thrown 
exception indicates a failure of the block.  Individual statements are 
implicitly the trivial case of a transaction.
  - QDRDBMS is implicitly in auto-commit mode by default, where each 
statement commits immediately on success.  Defining an explicit 
transaction is effectively just making a larger atomic statement. This 
is the most consistent way of doing things when you support child 
transactions.  Some SQL DBMSs are this way too by default, while others 
require explicit "commit" statements for anything to persist even when 
no explicit transaction is started.
  - In QDRDBMS, all operations are subject to transactions, including 
updates of the schema itself, and can be rolled back.  Some SQL DBMSs 
(such as MySQL) implicitly commit certain operations even if in an 
explicit transaction.
  - The native query language of QDRDBMS is an AST, though a 
string-query allowing wrapper is provided too, so users don't have to 
generate query strings like with SQL, and they don't have to worry 
about escaping, so there are no injection vulnerabilities with the AST 
like with SQL.
  - The QDRDBMS API is designed to be easily wrappable with alternate 
or simplified interfaces that the users choose.  It is a lot harder to 
wrap a SQL DBMS.
  - QDRDBMS is fully ACID compliant, while some SQL DBMSs are not.

Some key advantages of QDRDBMS over a SQL DBMS are:

  - Better language consistency.
  - Language ambiguity is removed.
  - Language is better huffman coded.
  - Language is a lot more flexible and capable.
  - Different queries that are logically identical will return the same result.
  - Queries are easier for the DBMS to optimize.
  - The product is orders of magnitude easier to implement, providing 
more functionality and reliability with a smaller footprint and less 
complexity.
  - There is no object/relational impedence mismatch; object-oriented 
concepts can be effectively stored in a relational database with little 
to no fuss.
  - You can emulate any SQL dialect or SQL features over the QDRDBMS 
API, so it is easy to port schemas and applications over, and you can 
use the QDRDBMS API as an intermediary for translating from one query 
language to another if you desire; either way, QDRDBMS helps you avoid 
database vendor lock-in.

If possible, there will be a demo of a simple application that uses 
QDRDBMS at the meeting, probably a simple command-line genealogy program.

You can see the QDRDBMS code as it is being developed by looking in 
http://darrenduncan.net/QDRDBMS/ ; very few pieces are there as yet, 
but more will be added over time, or existing ones changed.  Once 
QDRDBMS can actually do something useful, and it can start being used 
in production, the first version will be uploaded to CPAN as 
QDRDBMS-0.001, and put it in a shared version control system too. 
Darren also start helping other Perl frameworks on CPAN adapt to 
support QDRDBMS for their users, probably starting with DBIx::Class and 
going from there.

Questions can be asked at any time during the talk, and the talk can be 
customized to things that attendees want to focus on.

(Courtesy copy to VLUG and VOSSOC members by permission of the list 
managers.  Victoria.pm's home page is <http://victoria.pm.org/>.)
-- 
Peter Scott
Pacific Systems Design Technologies
http://www.perldebugged.com/
http://www.perlmedic.com/