[Melbourne-pm] Designing modules to handle large data files

Daniel Pittman daniel at rimspace.net
Mon Aug 23 23:05:38 PDT 2010


Sam Watkins <sam at nipl.net> writes:
> On Mon, Aug 23, 2010 at 12:41:02PM +1000, Adrian Masters wrote:
>> David,
>> 
>> [snip]
>> > Say for example you have 6,000,000 objects each with 10 fields.  I would store
>> > the objects on disk in the manner of Debian packages files:

[...]

>> > Text files, key-value pairs, records terminated with a blank line.
>> [snip]
>> 
>> If you went down this road and were considering exchanging data with others, I'd suggest using either JSON or YAML
>
> The format I'm suggesting is like YAML-lite, without the kitchen sink, as
> used in email and http headers.

Ah.  So, it is entirely insensitive to linear whitespace inline, is not
LWS-preserving, has a limit of 998 and 78 characters total and per-line,
possibly including or excluding LWS, in an implementation-defined fashion,
has case-insensitive and ASCII-only keys, and contains only ASCII characters
unless encoded in either URL or RFC 2047 MIME word format, then.

Right?
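
To make two of those points concrete, here is a minimal sketch in Python.
The standard `email` package is used purely for illustration (nobody in this
thread proposed it), and the header name and value are made up.  It shows
that email-style header names are matched case-insensitively, and that a
non-ASCII value has to travel as an RFC 2047 encoded word which the reader
must decode.

```python
from email import message_from_string
from email.header import decode_header, make_header

# A made-up message: the Subject value is "café" as an RFC 2047 encoded word.
raw = (
    "Subject: =?utf-8?q?caf=C3=A9?=\n"
    "\n"
)
msg = message_from_string(raw)

# Header names are case-insensitive: both lookups find the same field.
subject_raw = msg["SUBJECT"]
same_field = msg["subject"] == subject_raw

# The stored value is still pure ASCII; decoding it is a separate step.
subject = str(make_header(decode_header(subject_raw)))
```

A format that claims to be "like email headers" inherits both behaviours
unless it says otherwise.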

> The only addition over those is the blank-line as record separator.  It's
> the same as debian package files.

Once you add that it becomes clearer.  So, do you support the 'single period'
syntax for whitespace inside a line-folded record, and the optional non-folded
headers that Debian package control files do, or not?
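
For reference, a minimal sketch of what a reader for that style of file has
to decide, written in Python for illustration.  The function name
`parse_stanzas` and the sample data are hypothetical, and the choices made
here (lower-casing keys, joining continuation lines with a newline, treating
a lone " ." as an empty line) are assumptions modelled on Debian control
files, not something specified in this thread.

```python
def parse_stanzas(text):
    """Yield one dict per blank-line-separated record."""
    record, key = {}, None
    for line in text.splitlines():
        if not line.strip():
            # Blank line: end of the current record.
            if record:
                yield record
            record, key = {}, None
        elif line[0] in " \t":
            if key is None:
                raise ValueError("continuation line before any field")
            # Continuation line; a lone "." stands for an empty line.
            body = line[1:]
            record[key] += "\n" + ("" if body == "." else body)
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"not a field: {line!r}")
            # Assumption: keys are case-insensitive, so normalise them.
            key = name.strip().lower()
            record[key] = value.strip()
    if record:
        yield record


sample = """\
Package: hello
Description: short summary
 first paragraph
 .
 second paragraph

Package: world
"""
records = list(parse_stanzas(sample))
```

Every branch above is a point where two "simple" implementations can
disagree, which is why the details need writing down.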

[...]

> Other formats like XML and even YAML and JSON are unnecessarily
> over-complicated in my opinion.  Simplicity, Clarity, Generality!!

Sadly, without defining what you mean, that very vague description doesn't
actually *specify* anything; it just gives a vague (and English/ASCII oriented)
hint in the general direction of what you were thinking.

Much as I hate, loathe and detest much of the hype around it, the one thing
that XML got right (which, naturally, it inherited from SGML) is that it
actually specifies the details of how you process arbitrary data in that
format.

Most of the "simple" things either don't scale to cover the world, or specify
so little that you end up with crazy, crazy things.  (STOMP, I am lookin'
right at you, here.)

        Daniel

-- 
✣ Daniel Pittman            ✉ daniel at rimspace.net            ☎ +61 401 155 707
               ♽ made with 100 percent post-consumer electrons

