[VPM] Perl Mongers Meeting tomorro
Peter Scott
Peter at PSDT.com
Mon Nov 20 06:09:00 PST 2006
Victoria.pm will meet at its regular date, time, and place tomorrow at
7:00 pm on Tuesday, November 21, at UVic in ECS (Engineering Computer
Science building) room 660 (see
http://www.uvic.ca/maps/index.html). (There will be no December
meeting. In January we will start visiting the beginning-level topics
that I received requests for.)
Darren Duncan will present a new homegrown RDBMS that is written in
Perl, including and overview of its features and design, featuring a
walkthrough of what implementation code exists so far, and also
hopefully showing a live demo of it in action, if it is far enough
along for that by the meeting date.
This RDBMS is intended to be a complete implementation of a truly
relational DBMS, as introduced by Edgar F. Codd and maintained by Hugh
Darwen and Chris Date. The project is fairly unique and has a
significant number of differences from SQL DBMSs, which are not truly
relational due to either missing some required features or due to
providing certain mis-features.
Note: Some useful background material can be seen at
http://en.wikipedia.org/wiki/Relational_data_model and
http://en.wikipedia.org/wiki/Relational_algebra .
Darren's RDBMS has a Perl 5 implementation named "QDRDBMS", which is
what will be walked through, and a Perl 6 implementation under some
other name will be made shortly thereafter or in parallel; QDRDBMS is
officially a prototype that is deprecated in favor of the longer-term
Perl 6 product.
Some key differences between QDRDBMS and a SQL DBMS are:
- QDRDBMS uses the terminology "relation value" (or "relation"),
"relation variable" (or "relvar"), "tuple" and "attribute" for concepts
analagous to SQL's terminology "rowset", "table", "row" and
"column". A "base relvar" and a "virtual relvar" corresponds to a
"table" and a "view"; "relvar" refers to both. A relational database
is a collection of relvars, analagous to a SQL database being a
collection of tables. An "update" operation includes not only
assignment but SQL's concept of "insert", "update", "delete", etc.
- QDRDBMS uses 2-valued logic rather than 3+-valued logic like SQL;
QDRDBMS does not have a native "null" concept. There are multiple
better alternatives to express "unknown" or "not applicable" etc with
it. In SQL terms, all relation attributes are implicitly not nullable.
- QDRDBMS always uses exceptions to indicate failure, whether due to
attempting a constraint violation or by trying to divide by zero; these
can be explicitly trapped and responded to as the user chooses. SQL
sometimes uses exceptions to indicate failure, and other times it just
returns null values instead.
- QDRDBMS is based on sets, rather than bags as SQL is; a tuple is an
unordered set of attributes, and a relation is an unordered set of tuples.
- All QDRDBMS attributes are named and must have distinct names; SQL
columns are ordered, do not have to be named or can have duplicated names.
- In SQL terms, a relation has an implicit unique key constraint over
all of its attributes, if it doesn't have an explicit key defined. A
truly relational database is always in at least first normal form by
definition.
- A QDRDBMS relation or tuple can validly have zero attributes while
a SQL table or row must always have at least one column.
- A QDRDBMS relation has no hidden attributes like SQL's "row id";
they only contain the attributes that the user defined for them. But
since relations contain no duplicates, you can reliably address their
elements by just their attribute values.
- All QDRDBMS operations involving relations are set-based; there are
no tuple-at-a-time operations like SQL's current-row cursors. (That's
not to say that the database-application gateway can't stream data.)
- QDRDBMS queries are composed of simpler relational operators
composed into arbitrary user-defined expressions, rather than the
monolithic and complicated SQL "select" operators.
- QDRDBMS users have the same kind of flexability with relational
expressions as they do with string or numeric or logical etc
expressions; SQL users typically have different syntax structures for
table or rowset operations than for other data types.
- Each QDRDBMS relational operator returns only distinct results (as
a relation value); the SQL relational operators are inconsistent, with
some returning distinct results by default (eg, "union"), while others
return duplicated results by default (eg, "join"). Note that list
summarizing operations like "sum" and "average" still do the correct
thing, despite the aforementioned being true for list-returning operators.
- QDRDBMS is strongly typed, with all values and variables being of a
specific type, and operators taking and returning values of specific
types. No implicit type casting is done, and all type conversions must
be done explicitly. How strong or weak data types are varies by the
SQL DBMS product, and many of them do implicit casting.
- QDRDBMS has distinct numerical division operators for integers and
fractional numbers, which return integers or fractional numbers
respectively. By contrast, some SQL DBMSs have just one behaviour, and
others will change behaviour semi-arbitrarily depending on what a
loosely typed input looks like; eg, using the number "3.0" may not do
what users expect.
- QDRDBMS data types can be scalar or collection types, arbitrarily
complex; relation or tuple attributes can be composed of either, and
attributes can even be relations or tuples. SQL DBMSs often don't let
you use collection types or tables for table columns, or their
provision is inconsistent.
- QDRDBMS uses the same relational operators for both database
relations and component relations of other data types. SQL requires
you to use different operators for each though they are alike; eg,
"union" vs "multiset union".
- QDRDBMS empowers users to define their own data types and
operators, while only some SQL DBMSs do (as "domains" or "types" or
"stored procedures" etc).
- QDRDBMS operators can be recursive, while only some SQL DBMSs allow this.
- QDRDBMS operators take named arguments rather than positional one,
so there is not only flexability but also consistency between those and
the way relation attributes are used, which is by name.
- Every QDRDBMS data type must have at least equality, inequality
testing and value assignment operators, even if it has no other
operators. SQL does not require data types to have equality test
operators or be assignable.
- Fundamentally, the only QDRDBMS update operator is the assignment
operator, and any other update operators (eg, "INSERT", "UPDATE",
"DELETE") are just short-hand for particular assignment expressions.
SQL doesn't provide assignment syntax for many of its update operators.
- All QDRDBMS data types are immutable; you only update variables as
a whole, sometimes by assigning a new value derived from the old value
but for the intended mutation. Short-hand operators can make it appear
different though.
- All QDRDBMS database constraints are immediate, and are applied
between statement boundaries at all levels of granularity. A SQL DBMS
constraint can be defined as deferrable, such as only being applied at
a transaction commit time. So QDRDBMS guarantees that no query or
expression will ever see a database that is in an inconsistent /
constraint-violating state.
- QDRDBMS empowers you to perform multiple variable assignments
simultaneously (which is syntactic sugar for a single assignment to the
database as a whole), so a multi-part change can be made without
tripping the immediate constraints. So it is possible to, eg, record a
credit to one bank account and a corresponding debit to another,
without the database being in an inconsistent state "between" those two
actions.
- Where at all possible, there is no distinction between base and
virtual relvars from the database user's point of view; they can query
and assign to / update either. The database can be redefined so that
some base and virtual relvars become virtual and base, and the user can
continue like nothing had changed.
- QDRDBMS provides system catalog relvars sufficient to completely
describe the entire database. This is analagous to the "information
schema" concept of SQL, which some SQL DBMSs provide and others
don't. The QDRDBMS catalog is AST-based, though, in that the various
details are atomic, rather than consisting of literal SQL strings for
many parts (eg, procedure definitions).
- The catalog relvars of QDRDBMS are user-updateable; where by
contrast they are read-only in SQL. In fact, updating the QRDBMS
system catalog using DML operations is the fundamental means to perform
data definition or schema changes to a database. While QDRDBMS
provides short-hands to this, analagous to SQL's "create table" or
"alter table" etc, they are short-hand. SQL's "create" etc statements
are the only normal way to do schema changes, and they are a lot less flexible.
- QDRDBMS supports arbitrary depth child transactions, where any
statements within a transaction level are collectively atomic and can
succeed or fail. Any statement block marked as atomic, and all named
routines and try-catch blocks are atomic; in the last case, a thrown
exception indicates a failure of the block. Individual statements are
implicitly the trivial case of a transaction.
- QDRDBMS is implicitly in auto-commit mode by default, where each
statement commits immediately on success. Defining an explicit
transaction is effectively just making a larger atomic statement. This
is the most consistent way of doing things when you support child
transactions. Some SQL DBMSs are this way too by default, while others
require explicit "commit" statements for anything to persist even when
no explicit transaction is started.
- In QDRDBMS, all operations are subject to transactions, including
updates of the schema itself, and can be rolled back. Some SQL DBMSs
(such as MySQL) implicitly commit certain operations even if in an
explicit transaction.
- The native query language of QDRDBMS is an AST, though a
string-query allowing wrapper is provided too, so users don't have to
generate query strings like with SQL, and they don't have to worry
about escaping, so there are no injection vulnerabilities with the AST
like with SQL.
- The QDRDBMS API is designed to be easily wrappable with alternate
or simplified interfaces that the users choose. It is a lot harder to
wrap a SQL DBMS.
- QDRDBMS is fully ACID compliant, while some SQL DBMSs are not.
Some key advantages of QDRDBMS over a SQL DBMS are:
- Better language consistency.
- Language ambiguity is removed.
- Language is better huffman coded.
- Language is a lot more flexible and capable.
- Different queries that are logically identical will return the same result.
- Queries are easier for the DBMS to optimize.
- The product is orders of magnitude easier to implement, providing
more functionality and reliability with a smaller footprint and less
complexity.
- There is no object/relational impedence mismatch; object-oriented
concepts can be effectively stored in a relational database with little
to no fuss.
- You can emulate any SQL dialect or SQL features over the QDRDBMS
API, so it is easy to port schemas and applications over, and you can
use the QDRDBMS API as an intermediary for translating from one query
language to another if you desire; either way, QDRDBMS helps you avoid
database vendor lock-in.
If possible, there will be a demo of a simple application that uses
QDRDBMS at the meeting, probably a simple command-line genealogy program.
You can see the QDRDBMS code as it is being developed by looking in
http://darrenduncan.net/QDRDBMS/ ; very few pieces are there as yet,
but more will be added over time, or existing ones changed. Once
QDRDBMS can actually do something useful, and it can start being used
in production, the first version will be uploaded to CPAN as
QDRDBMS-0.001, and put it in a shared version control system too.
Darren also start helping other Perl frameworks on CPAN adapt to
support QDRDBMS for their users, probably starting with DBIx::Class and
going from there.
Questions can be asked at any time during the talk, and the talk can be
customized to things that attendees want to focus on.
(Courtesy copy to VLUG and VOSSOC members by permission of the list
managers. Victoria.pm's home page is <http://victoria.pm.org/>.)
--
Peter Scott
Pacific Systems Design Technologies
http://www.perldebugged.com/
http://www.perlmedic.com/
More information about the Victoria-pm
mailing list