[VPM] RFC: 2-part Nov tutorial on how to make an RDBMS from scratch

Darren Duncan darren at DarrenDuncan.net
Sun Oct 8 21:10:08 PDT 2006


I may have hinted at this before, but now to be slightly more formal ...

I propose to give a 2-part tutorial in November of 2006, targeted at 
a general purpose programmer, on how to make an RDBMS "from scratch".

Even if you don't intend to make the next Postgres or SQLite or 
whatever, knowing how an RDBMS works can help you do some kinds of 
data manipulation within your own programs that you may have pushed 
off to a SQL database before, for unnecessary complexity, or this 
knowledge can help you to use an existing RDBMS more effectively if 
you better understand what your queries mean.

Part 1 of the tutorial would cover fundamental RDBMS design 
principles in a programming language agnostic fashion, and I 
anticipate it would best be given at the Nov 7th RECCOMPSCI meeting 
(probably better than an Oct 31st VOSSOC).

This part would start by outlining what a relational database 
management system is (a usually-persistent collection of 
"relation"-typed variables plus operators to work with them), and 
show an example library/module API plus query language design and 
algorithms for effectively and easily implementing such.

Note that the vast bulk of conventional SQL DBMS systems is concerned 
with issues that are orthogonal to the fundamentals of an RDBMS 
itself, such as handling many users with concurrent updates in a 
quickly performing fashion for huge amounts of data, or supporting a 
hundred different legacy ways to do the same thing, or in trying to 
deal with the vaguarities of SQL.

By contrast, you can make a basic easy to use single-concurrent-user 
system of your own that has all the important features of an RDBMS, 
and handles smaller amounts of data, and has correct semantics, 
without spending more than a few days or weeks of effort.  This 
system would natively handle arbitrarily complex data types, and in a 
large sense has no OO/RDBMS impedence mismatch.

Fundamentally, you can do this with any turing complete programming 
language, though some will lend themselves to the task of doing this 
quickly and easily more than others.  Language features that will 
help you include built-in support for collection types, particularly 
associative arrays and sets, N-depth multi-dimensional data 
structures, dynamically definable data types, closures / anonymous 
subroutines, the ability to evaluate/compile a character string as 
code at run time, nested software transactional memory, etc.

Part 2 of the tutorial would walk through an actual working 
implementation of those principles that I would have made (and that 
would be active in a production system) by then (it is currently in 
progress), showing how it works; the implementation would be around 
100-300KB of Perl 5 code, with next to no dependencies, and be 
released on CPAN under the name QDRDBMS (Quick and Dirty Relational 
Database Management System), and I anticipate it would best be given 
at the Nov 21st Victoria.pm meeting.

Regardless of the actual date this is done, the design part should be 
done before the example implementation part (unlike how the recent 
RDF 2-parter happened) so that everything flows better.

For either part, I can compact or expand my talk to whatever time is 
available, so the meetings can still be split with other speakers 
like some past meetings, though I could probably fill the time as 
well.

Note that I most likely won't have a slide presentation made up; the 
live tutorial will be based on talking and writing diagrams on the 
white/blackboard.

That said, if this talk proves to be helpful to people, I will 
probably evolve it (part 1, mainly) into a more formal 45-minute talk 
with slides, and propose it for conferences like the 2007 OSCON or 
YAPC NA or whatever (possibly with a working Perl 6 implementation by 
then), and I can practice the more shaken-out versions in mid-2007 
for you first, as well.

Any feedback on this proposal is appreciated.

Thank you. -- Darren Duncan

P.S.  I now have no proposals for the October Victoria.pm meeting.


More information about the Victoria-pm mailing list