[Omaha.pm] Embeddable database options.

Todd Christopher Hamilton netarttodd at gmail.com
Wed Aug 26 17:29:35 PDT 2009


I use SQLite for embedded use, but you could also look at Firebird.  I know
of a cool healthcare project used by the med center that uses Firebird
as a local cache for a fat client.


2009/8/24 Mario Steele <mario at ruby-im.net>:
> Hello Dan,
>
> On Mon, Aug 24, 2009 at 10:17 AM, Dan Linder <dan at linder.org> wrote:
>>
>> Guys,
>>
>> I'm looking at rewriting some of the store/retrieve code in a project
>> I'm working on.  The current method uses the Data::Dumper and eval()
>> code to store data to a hierarchical directory structure on disk.
>> Over the weekend I all but eliminated the hard-disk overhead by moving
>> the data to a temporary RAM disk -- sadly, the speed-ups were too
>> small to notice.  This tells me that the overall Linux file-system
>> caching is working quite well.  (Yay!) Unfortunately, this leads me
>> (again) to determine that the Dumper/eval() code is probably the
>> bottleneck.  (Definitely not what they were designed for, but they work
>> remarkably well nonetheless...)
>
> Eval is more than likely your biggest bottleneck.  Dumper not so much, but
> heavy use of eval in any language can create a bottleneck in nothing
> flat.
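>
> For what it's worth, this is the kind of round trip I'm assuming you have
> now (the sub and variable names are made up):
>
>   use Data::Dumper;
>
>   sub store_record {
>       my ($path, $data) = @_;
>       open my $fh, '>', $path or die "open $path: $!";
>       local $Data::Dumper::Purity = 1;   # keep nested structures eval-able
>       print {$fh} Data::Dumper->Dump([$data], ['record']);
>       close $fh or die "close $path: $!";
>   }
>
>   sub load_record {
>       my ($path) = @_;
>       open my $fh, '<', $path or die "open $path: $!";
>       my $text = do { local $/; <$fh> };
>       my $record;
>       eval $text;          # the string eval is the slow (and risky) part
>       die $@ if $@;
>       return $record;
>   }
>
> Every load re-compiles the dumped Perl from scratch, which is why the eval
> side tends to dominate.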
>
>>
>> So, I started investigating alternatives:
>>  * A true database with client/server model (i.e. MySQL, PostgreSQL, etc)
>
> Use MySQL / PostgreSQL if you are going to have many hits to the Perl script
> that is going to be executing.  They handle concurrent access well, and also
> avoid the locking problem mentioned below for SQLite.
>
>>  * An embedded database such as SQLite (others?)
>
> SQLite is a great database system for file-based data storage.
> Unfortunately, it stores its data in binary, so you can't exactly use grep,
> vi, etc., to read the contents of the database file.  And unlike its big
> brothers, you can only have one transactional lock (e.g. a database open) at
> a time on a database file.  This is to prevent corruption of the data.  (And
> yes, it locks even if you're just doing a read query.)
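>
> If the single-writer lock does start to bite, DBD::SQLite at least lets you
> wait on a locked database instead of failing outright, and batching writes
> into one transaction keeps the lock held for less total time.  A sketch
> (table name and data are made up):
>
>   use DBI;
>
>   my $dbh = DBI->connect('dbi:SQLite:dbname=data.db', '', '',
>                          { RaiseError => 1, AutoCommit => 0 });
>   # retry for up to 5s on a locked database; older DBD::SQLite releases
>   # spell this $dbh->func(5000, 'busy_timeout')
>   $dbh->sqlite_busy_timeout(5000);
>
>   my @records = map { { rec_key => "k$_", value => "v$_" } } 1 .. 1000;
>   my $ins = $dbh->prepare('INSERT INTO records (rec_key, value) VALUES (?, ?)');
>   $ins->execute($_->{rec_key}, $_->{value}) for @records;
>   $dbh->commit;   # one lock and one fsync for the whole batch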
>
>>
>> * Continue using the filesystem+directory structure using
>> freeze()/thaw() from the FreezeThaw CPAN module (speed improvement?)
>
> I don't know if freeze()/thaw() will do any good, as it still comes down to
> serializing the data to text and parsing it back in, much like Dumper/eval().
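>
> If you do want to benchmark it anyway, the interface is small enough that
> trying it is cheap.  A rough sketch:
>
>   use FreezeThaw qw(freeze thaw);
>
>   my $data   = { name => 'example', values => [1, 2, 3] };   # example structure
>   my $string = freeze($data);   # plain-text serialization, no code generation
>   my ($copy) = thaw($string);   # thaw returns the list of frozen objects
>
> It skips the string eval on the way back in, so whether it actually helps
> comes down to measuring it against your current Dumper/eval code.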
>
>>
>> * Use a DBD module to store/retrieve these files (e.g. DBD::File,
>> DBD::CSV, etc.)  (the benefit here is that only the DB setup
>> code changes, so moving from DBD::File to DBD::SQLite or
>> DBD::Pg should be fairly short work)
>
> Overall, DBI with its DBD drivers is a great front end to use for database
> storage, as it gives you a common API across many different DB backends.  If
> you want consistency, and the ability to test different database storage
> engines, then I would strongly recommend you use DBI.
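>
> The nice part is that the backend choice collapses into the connect string,
> so the "simple change in the DB setup code" really is about one line.  A
> sketch (schema, paths and host are just examples, and the CREATE syntax
> shown is the SQLite flavour):
>
>   use DBI;
>
>   # the only line that changes when you swap storage engines:
>   my $dsn = 'dbi:SQLite:dbname=/var/tmp/app-data.db';
>   # my $dsn = 'dbi:CSV:f_dir=/var/tmp/app-data';      # DBD::CSV
>   # my $dsn = 'dbi:Pg:dbname=appdata;host=dbhost';    # DBD::Pg
>
>   my $dbh = DBI->connect($dsn, '', '', { RaiseError => 1, AutoCommit => 1 });
>
>   $dbh->do('CREATE TABLE IF NOT EXISTS records (rec_key TEXT, value TEXT)');
>   my $sth = $dbh->prepare('SELECT value FROM records WHERE rec_key = ?');
>   $sth->execute('some_key');
>   my ($value) = $sth->fetchrow_array;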
>
>>
>> Internally I have some constraints:
>>  * We'd like to keep the number of non-core Perl modules down
>> (currently we're 90% core), and a couple customers are extremely
>> sensitive to anything that is not supplied by their OS provider
>> (Solaris and HPUX for example).
>
> This is true in many respects, but you'll find that MySQL and SQLite are
> among the most widely shipped databases on most operating systems
> (aside from Windows, but we won't go there).
>
>> * We would also like to keep the files on disk and in a
>> human-readable form so the end users and support staff can peruse this
>> data with simple tools (grep, vi, etc).
>
> Again, as stated above, SQLite and MySQL won't let you use grep, vi, etc.,
> to view the data, but simple tools can be written that give the same effect
> and can be highly optimized for specific tasks, instead of looking through
> hundreds of lines of data to find a specific field.
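>
> Something as small as this gives you most of what grep did, just pointed at
> the database instead of the files (the table and column names are only an
> example):
>
>   #!/usr/bin/perl
>   # db-grep.pl <database file> <pattern>  -- hypothetical helper script
>   use strict;
>   use warnings;
>   use DBI;
>
>   my ($db, $pattern) = @ARGV;
>   my $dbh = DBI->connect("dbi:SQLite:dbname=$db", '', '', { RaiseError => 1 });
>   my $sth = $dbh->prepare('SELECT rec_key, value FROM records WHERE value LIKE ?');
>   $sth->execute('%' . $pattern . '%');
>   while (my ($key, $value) = $sth->fetchrow_array) {
>       print "$key\t$value\n";
>   }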
>
>>
>>  * The remaining 10% that is non-core Perl modules are local copies of
>> "pure perl" CPAN modules we've merged into the source code branch
>> directly.  (We do this because the code runs on Solaris/SPARC,
>> Solaris/x86_64, Linux/x86, Linux/ia64, HPUX/PA-RISC, HPUX/ia64, etc)
>>
>> My personal pick at the moment is SQLite (it is provided natively in
>> Solaris 10, and easy to install on Linux platforms), but I question if
>> the speed-up it provides will be overshadowed by the constant
>> spawning of the sqlite binary each time an element of data is queried.
>>  (Anyone know if there is a way to leave a persistent copy of SQLite
>> running in memory that future copies hook into?  Getting a bit far
>> afield from the initial SQLite implementation goals...)
>
> Now, having explained all of the above, I will get directly to the point.
> SQLite's binary (or BLOB) data types, while they may seem heavy in terms of
> data allocation, actually cost very little in overall speed.  This can be
> optimized further when you only need to look at specific data fields and
> couldn't care less about the rest.  As with anything else, SQLite does have
> overhead, but not nearly as much as you might think.  It only allocates the
> data needed to return the results of a SQL query, or to insert data into the
> database.
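>
> One thing worth adding on the worry about spawning: with DBD::SQLite the
> engine is linked into the Perl process itself, so nothing external is
> spawned per query -- you open one handle, keep it for the life of the
> process, and reuse prepared statements.  Asking only for the columns you
> need is then where the "ignore the rest" point shows up.  A sketch (the
> schema is made up):
>
>   use DBI;
>
>   my $dbh = DBI->connect('dbi:SQLite:dbname=data.db', '', '',
>                          { RaiseError => 1 });
>
>   # prepare once, execute many times; only the status column is fetched
>   my $sth = $dbh->prepare('SELECT status FROM records WHERE rec_key = ?');
>   my @keys_to_check = ('host1', 'host2');   # example keys
>   for my $key (@keys_to_check) {
>       $sth->execute($key);
>       my ($status) = $sth->fetchrow_array;
>       print "$key: ", (defined $status ? $status : 'missing'), "\n";
>   }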
>
> The SQLite team has put much effort into optimizing the SQLite engine so
> that it can store and retrieve data in the most efficient manner possible
> while keeping the engine fast and working properly.  Many Linux
> distributions (Ubuntu among them) use SQLite for a good deal of storage
> within their own systems, such as APT/Aptitude/Synaptic.  Using SQLite has
> its advantages, but also its downfalls.  If you want to avoid database
> locking issues, then I suggest MySQL.  If you are looking for a lightweight
> solution that is quick, and locking is not much of a worry, then I would
> suggest SQLite.
>
>>
>> Thanks for any insight,
>>
>> DanL
>>
>> --
>> ******************* ***************** ************* ***********
>> ******* ***** *** **
>> "Quis custodiet ipsos custodes?" (Who can watch the watchmen?) -- from
>> the Satires of Juvenal
>> "I do not fear computers, I fear the lack of them." -- Isaac Asimov
>> (Author)
>> ** *** ***** ******* *********** ************* *****************
>> *******************
>> _______________________________________________
>> Omaha-pm mailing list
>> Omaha-pm at pm.org
>> http://mail.pm.org/mailman/listinfo/omaha-pm
>
> Hope this helps, and it is just my own two cents on the deal.
>
> --
> Mario Steele
> http://www.trilake.net
> http://www.ruby-im.net
> http://rubyforge.org/projects/wxruby/
> http://rubyforge.org/projects/wxride/
>
> _______________________________________________
> Omaha-pm mailing list
> Omaha-pm at pm.org
> http://mail.pm.org/mailman/listinfo/omaha-pm
>



-- 
Todd Christopher Hamilton
(402) 660-2787

