[Chicago-talk] recoverable (human-readable) persistent data stores?

JT Smith jt at plainblack.com
Sun Jul 9 07:20:20 PDT 2006


I hate XML, but this is one thing it's really good at. You can use a SAX parser 
to stream records into memory incrementally. Other than that, you could use DBD::CSV.
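
Something along these lines would do it (untested sketch -- the <file> element 
name and backup.xml are just placeholders, not anything Chroniton actually uses):

#!/usr/bin/perl
use strict;
use warnings;

# SAX handler that keeps exactly one record in memory at a time.
package RecordHandler;
use base qw(XML::SAX::Base);

sub start_element {
    my ($self, $el) = @_;
    # Start a fresh record when we see the (hypothetical) <file> element.
    $self->{current} = {} if $el->{LocalName} eq 'file';
}

sub characters {
    my ($self, $chars) = @_;
    $self->{current}{text} .= $chars->{Data} if $self->{current};
}

sub end_element {
    my ($self, $el) = @_;
    return unless $el->{LocalName} eq 'file' && $self->{current};
    # Process (or write back out) this single record here, then drop it.
    delete $self->{current};
}

package main;
use XML::SAX::ParserFactory;

my $parser = XML::SAX::ParserFactory->parser(Handler => RecordHandler->new);
$parser->parse_uri('backup.xml');

And DBD::CSV gives you real SQL over a plain text file you can read and repair 
in an editor. Rough sketch, with a made-up 'files' table and column names:

use DBI;

my $dbh = DBI->connect('dbi:CSV:', undef, undef,
    { f_dir => '/tmp/backups', RaiseError => 1 });
my $sth = $dbh->prepare('SELECT name, md5, size FROM files WHERE name = ?');
$sth->execute('/home/jon/tmp/.pm');
while (my $row = $sth->fetchrow_hashref) {
    print "$row->{name} $row->{md5} $row->{size}\n";
}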



On Sun, 09 Jul 2006 03:08:20 -0500
  Jonathan Rockway <jon at jrock.us> wrote:
> Does anyone have any suggestions for something database-like that stores
> itself in a human readable form?
> 
> I'm looking to replace YAML files (in my backup software, Chroniton*)
> that contain tens of thousands of entries like this:
> 
>>   /home/jon/tmp/.pm:
>>     - !!perl/hash:Chroniton::File
>>       location: /tmp/backups/backup_1152419747.22405
>>       metadata:
>>         atime: 1152420555
>>         attributes:
>>           user.testattribute: foo
>>           user.creation_time: 1152339615
>>         ctime: 1152339615
>>         gid: jon
>>         md5: c04b397efc6df812d0668d48b631e93b
>>         mtime: 1152339615
>>         permissions: -rw-r--r--
>>         size: 105
>>         uid: jon
>>       name: /home/jon/tmp/.pm
>>       type: file
>> 
>>   /home/jon/tmp/foo:
>>     - etc.
> 
> with something that I can load into memory incrementally, and then store
> back to disk incrementally (i.e., I only need one record in core at a
> time, but while it's in core it gets read and written).
> 
> I'd like to avoid a sqlite or berkeley database file, because if the
> file gets corrupted somehow, all the data tends to get lost.  (Ever move
> a bdb svn repository between machines?  It just doesn't work.)  I've
> also been burned a number of times with sqlite shared library updates
> losing my data.  Since the point of backup software is to be able to
> restore your machine when you hose it, I can't be dependent on having
> version 1.3.3.7_42 of some shared library around.
> 
> The other obvious option, using an individual file for each record, is
> both cumbersome and inefficient -- on my filesystem each file takes 4k
> (and I've configured systems where each file is 32M at a minimum!).
> 
> For the 54631 files in my ~/tmp directory (not really temporary files,
> btw) this would use 213M of disk at the very minimum.  That's 10%
> overhead, and isn't acceptable :)  (The compressed YAML only takes up 1.9M!)
> 
> BTW, reading in the whole file and delete-ing hash keys frees up memory
> according to Devel::Size, but the perl process' memory footprint never
> shrinks.
> 
> With these restrictions in place, I'm kind of out of ideas, so any
> insight would be greatly appreciated.  Thanks!
> 
> Regards,
> Jonathan Rockway
> 
> * GPLd and available from CPAN or http://www.jrock.us/trac/chroniton
> 


JT ~ Plain Black
ph: 703-286-2525 ext. 810
fax: 312-264-5382
http://www.plainblack.com

I reject your reality, and substitute my own. ~ Adam Savage

