[Omaha.pm] Embeddable database options.

Mario Steele mario at ruby-im.net
Mon Aug 24 14:11:31 PDT 2009


Hello Dan,

On Mon, Aug 24, 2009 at 10:17 AM, Dan Linder <dan at linder.org> wrote:

> Guys,
>
> I'm looking at rewriting some of the store/retrieve code in a project
> I'm working on.  The current method uses the Data::Dumper and eval()
> code to store data to a hierarchical directory structure on disk.
> Over the weekend I all but eliminated the hard-disk overhead by moving
> the data to a temporary RAM disk -- sadly, the speed-ups were too
> small to notice.  This tells me that the overall Linux file-system
> caching is working quite well.  (Yay!) Unfortunately, this leads me
> (again) determine that the Dumper/eval() code is probably the
> bottle-neck.  (Definately not what they were designed for, but work
> remarkably well none the less...)


Eval is more then likely your biggest bottleneck.  Dumper not so much, but
heavy usage of eval in any language, can create a bottleneck in nothing
flat.


> So, I started investigating alternatives:
>  * A true database with client/server model (i.e. MySQL, PostgreSQL, etc)


Use MySQL / PostgreSQL is you are going to have many hits to the Perl script
that is going to be executing.  It does well with threading, and also solves
the problem mentioned below about SQLite.

 * An embedded database such as SQLite (others?)


SQLite is a great Database system, for File Based data storage.
Unfortunately, it stores in binary, so you can't exactly use grep, vi, etc,
etc, to read the contents of the database file.  But unlike it's big
brother, you can only have one transactional lock (EG Database Open) at a
time on a database file.  This is to prevent corruption of the data.  (And
yes, this locks even if your just doing a read query.)


> * Continue using the filesystem+directory structure using
> freeze()/thaw() from the FreezeThaw CPAN module (speed improvement?)


I dunno if freeze()/thaw() will do any good, as it still comes down to
Dumper/eval() to properly store the information.


> * Use a DBD module to store/retrieve these files (i.e. DBD::File,
> DBD::CSV, etc) (benefit here is that a simple change in the DB setup
> code will mean a change from DBD::File to DBD::SQLite or
> DBD::PostgreSQL should be fairly short work)


DBD overall, is a great front end for you to use, for database storage, as
it gives you a common api across many different DB Backends.  If you want
consistency, and the ability to test different database storage engines,
then I would strongly recommend you use DBD.


> Internally I have some constraints:
>  * We'd like to keep the number of non-core Perl modules down
> (currently we're 90% core), and a couple customers are extremely
> sensitive to anything that is not supplied by their OS provider
> (Solaris and HPUX for example).


This is true in many facets, but you'll find standard that MySQL and SQLite
are often the biggest thing that is distributed on most Operating Systems
(Aside from Windows, but we won't go there).

* We would also like to keep the files on disk and in a
> human-readable form so the end users and support staff can peruse this
> data with simple tools (grep, vi, etc).


Again, as stated above, SQLite, and MySQL won't let you use grep, vi, etc,
to view the data, but simple tools can be created to create the same effect,
and highly optimize it to specific tasks, instead of looking through
hundreds of lines of data, to find a specific field.


>  * The remaining 10% that is non-core Perl modules are local copies of
> "pure perl" CPAN modules we've merged into the source code branch
> directly.  (We do this because the code runs on Solaris/SPARC,
> Solaris/x86_64, Linux/x86, Linux/ia64, HPUX/PA-RISC, HPUX/ia64, etc)
>
> My personal pick at the moment is SQLite (it is provided natively in
> Solaris 10, and easy to install on Linux platforms), but I question if
> the speed up it provides will be over-shadowed by the constant
> spawning of the sqlite binary each time an element of data is queried.
>  (Anyone know if there is a way to leave a persistent copy of SQLite
> running in memory that future copies hook into?  Getting a bit far
> afield from the initial SQLite implementation goals...)


Now, I come to this, after explaining the above to you, and I will be
directly to the point.  SQLite Binary (or BLOB) data types, while may seem
to be huge for data allocation and stuff, is actually quite minimal in
overall speed.  This especially can be optimized when you need to look at
specific data fields, and could care less about the rest.  As well with
anything else, SQLite does have overheads, but not nearly as much as you
might think.  It only allocates the data needed to return the results of a
SQL query, or insert data into the database.

The SQLite team has put much effort into optimizing the SQLite engine, so
that it can store, as well as retrieve data in the most efficient manner
possible, and keep the engine fast, and properly working.  Many Linux
distributions (Ubuntu among most), use SQLite for a large amount of storage
within their own system, such as APT/Aptitude/Synaptic.  Using SQLite can
have it's advantages, but also it's downfalls to.  If your wanting to avoid
database locking issues, then I suggest MySQL.  If your looking for Light
weight solution, that is quick, and not so much a worry about Locking
issues, then I would suggest SQLite.


> Thanks for any insight,
>
> DanL
>
> --
> ******************* ***************** ************* ***********
> ******* ***** *** **
> "Quis custodiet ipsos custodes?" (Who can watch the watchmen?) -- from
> the Satires of Juvenal
> "I do not fear computers, I fear the lack of them." -- Isaac Asimov
> (Author)
> ** *** ***** ******* *********** ************* *****************
> *******************
> _______________________________________________
> Omaha-pm mailing list
> Omaha-pm at pm.org
> http://mail.pm.org/mailman/listinfo/omaha-pm
>

Hope this helps, and it is just my own two cents on the deal.

-- 
Mario Steele
http://www.trilake.net
http://www.ruby-im.net
http://rubyforge.org/projects/wxruby/
http://rubyforge.org/projects/wxride/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.pm.org/pipermail/omaha-pm/attachments/20090824/673c10bc/attachment.html>


More information about the Omaha-pm mailing list