[Chicago-talk] recoverable (human-readable) persistent data stores?

Jonathan Rockway jon at jrock.us
Sun Jul 9 01:08:20 PDT 2006


Does anyone have any suggestions for something database-like that stores
itself in a human readable form?

I'm looking to replace YAML files (in my backup software, Chroniton*)
that contain tens of thousands of entries like this:

>   /home/jon/tmp/.pm:
>     - !!perl/hash:Chroniton::File
>       location: /tmp/backups/backup_1152419747.22405
>       metadata:
>         atime: 1152420555
>         attributes:
>           user.testattribute: foo
>           user.creation_time: 1152339615
>         ctime: 1152339615
>         gid: jon
>         md5: c04b397efc6df812d0668d48b631e93b
>         mtime: 1152339615
>         permissions: -rw-r--r--
>         size: 105
>         uid: jon
>       name: /home/jon/tmp/.pm
>       type: file
> 
>   /home/jon/tmp/foo:
>     - etc.

with something that I can load into memory incrementally, and then store
back to disk incrementally (i.e., I only need one record in core at a
time, but while it's in core it gets read and written).
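
For concreteness, the access pattern described above might look roughly
like the sketch below. It is only an illustration under stated
assumptions: the one-JSON-object-per-line format, the file name, and the
function name update_records are all hypothetical and not part of
Chroniton.

```python
import json
import os
import tempfile


def update_records(path, update):
    """Stream records from `path`, apply `update` to each, write them back.

    Assumes a hypothetical format of one JSON object per line, so only a
    single record is held in memory at a time. The new file is written
    next to the old one and swapped in with os.replace once it is complete.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out, \
                open(path, "r", encoding="utf-8") as src:
            for line in src:
                if not line.strip():
                    continue  # tolerate blank lines between records
                record = update(json.loads(line))
                out.write(json.dumps(record, sort_keys=True) + "\n")
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original file untouched if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because each record sits on its own line of plain text, a damaged line
would cost one record rather than the whole store, and the file stays
readable with nothing more than a text editor.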

I'd like to avoid a SQLite or Berkeley DB file, because if the
file gets corrupted somehow, all the data tends to get lost.  (Ever move
a BDB svn repository between machines?  It just doesn't work.)  I've
also been burned a number of times by SQLite shared library updates
losing my data.  Since the point of backup software is to be able to
restore your machine when you hose it, I can't be dependent on having
version 1.3.3.7_42 of some shared library around.

The other obvious option, using an individual file for each record, is
both cumbersome and inefficient -- on my filesystem each file takes 4k
(and I've configured systems where each file is 32M at a minimum!).

For the 54631 files in my ~/tmp directory (not really temporary files,
btw) this would use 213M of disk at the very minimum.  That's 10%
overhead, and isn't acceptable :)  (The compressed YAML only takes up 1.9M!)
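
The 213M figure can be checked with a quick calculation. This is a
sketch that assumes every record occupies exactly one 4k (4096-byte)
filesystem block; the function name is made up for illustration.

```python
def min_disk_use_mib(n_files, block_bytes=4096):
    """Minimum disk use, in MiB, if each file occupies one block."""
    return n_files * block_bytes / 2**20


# 54631 one-block files at 4096 bytes each
print(round(min_disk_use_mib(54631), 1))  # 213.4
```

Against the 1.9M compressed YAML file, that is more than a hundred times
the space.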

BTW, reading in the whole file and delete-ing hash keys frees up memory
according to Devel::Size, but the perl process's memory footprint never
shrinks.

With these restrictions in place, I'm kind of out of ideas, so any
insight would be greatly appreciated.  Thanks!

Regards,
Jonathan Rockway

* GPLd and available from CPAN or http://www.jrock.us/trac/chroniton
