[tpm] Bigtable (Hypertable)

Fri May 29 09:17:46 PDT 2009

Last night during our discussion, Abuzar brought up Google's Bigtable
database and the open-source Hypertable implementation.  I just wanted to
follow up on that discussion, as I think anyone who works with large
data sets may benefit from it.

With regard to the tablet implementation: on the surface, its usage looks
similar to having multiple tables within a regular RDBMS.  But instead of
the tables within the RDBMS being somewhat disjoint from each other
(requiring you to "join" rows from multiple tables to get a unique
data set), Hypertable (Bigtable) just has each of these tables as a
sub-table, part of a bigger main table.
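
To make the "sub-tables inside one big table" idea concrete, here is a
minimal Python sketch.  It is not Hypertable's actual API.  It assumes the
sub-tables correspond to what the Bigtable paper calls column families,
with columns named "family:qualifier".  The function names (put,
get_family) and the in-memory dict are made up for illustration; the row
key and anchor columns follow the paper's webtable example.

```python
# Toy model (assumption): one big table keyed by row, where each row's
# columns are grouped into families via a "family:qualifier" name.
table = {}  # row key -> {column name -> value}


def put(row_key, family, qualifier, value):
    """Store a value under the column 'family:qualifier' of one row."""
    table.setdefault(row_key, {})[f"{family}:{qualifier}"] = value


def get_family(row_key, family):
    """Return every column of one family for a row; no join is needed."""
    prefix = family + ":"
    row = table.get(row_key, {})
    return {col: val for col, val in row.items() if col.startswith(prefix)}


# One row holds both the page contents and the anchors pointing at it.
put("com.cnn.www", "contents", "", "<html>...")
put("com.cnn.www", "anchor", "cnnsi.com", "CNN")
put("com.cnn.www", "anchor", "my.look.ca", "CNN.com")

anchors = get_family("com.cnn.www", "anchor")
```

In an RDBMS the anchors would probably live in their own table and be
joined back to the page by key; here they are simply more columns on the
same row.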

Here's the link to the Google research publication on Bigtable:
http://labs.google.com/papers/bigtable.html
And Wikipedia's article on Bigtable: http://en.wikipedia.org/wiki/BigTable

From what I can see, the benefits include the ability to store extremely
large sets of data in a single table, across a distributed file system
(such as GFS), along with having version control "built in" via another
table dimension.  All of this seems to be accomplished without the need
for a database server of any kind (though I'm not sure about this).  Any
performance increase, it seems, comes from requiring the client application
to cache references to sub-tables within a bigger table.
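
As a rough illustration of the "version control via another dimension"
point, here is a small Python sketch.  It assumes the extra dimension is a
timestamp on each cell, as the Bigtable paper describes (cells indexed by
row, column and timestamp, newest version first).  The names put_version
and get_version are hypothetical, and a plain dict stands in for the
distributed storage.

```python
# Toy model (assumption): each (row, column) cell keeps several
# timestamped versions, sorted so the newest version comes first.
cells = {}  # (row, column) -> list of (timestamp, value)


def put_version(row, column, timestamp, value):
    """Add a new version of a cell without overwriting older ones."""
    versions = cells.setdefault((row, column), [])
    versions.append((timestamp, value))
    versions.sort(key=lambda tv: tv[0], reverse=True)


def get_version(row, column, as_of=None):
    """Return the newest value, or the newest one at or before as_of."""
    for timestamp, value in cells.get((row, column), []):
        if as_of is None or timestamp <= as_of:
            return value
    return None


put_version("com.cnn.www", "contents:", 3, "<html>v3")
put_version("com.cnn.www", "contents:", 6, "<html>v6")
put_version("com.cnn.www", "contents:", 5, "<html>v5")

latest = get_version("com.cnn.www", "contents:")
older = get_version("com.cnn.www", "contents:", as_of=4)
```

Reading without a timestamp gives the latest version; passing as_of walks
back to whatever the cell held at that point in time.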

If anyone has a better understanding of how this works, please share!

-- 
J. Bobby Lopez
Web: http://jbldata.com/
Twitter: http://www.twitter.com/jbobbylopez