[Melbourne-pm] Designing modules to handle large data files
sam at nipl.net
Mon Aug 23 21:54:16 PDT 2010
On Mon, Aug 23, 2010 at 11:49:17AM +1000, Toby Corkindale wrote:
> On 23/08/10 11:14, Sam Watkins wrote:
>> I think if you have datasets that are smaller than your RAM, and you don't
>> create too many unnecessary perl strings and objects, you should be able to
>> process everything in perl if you prefer to do it like that. It may even
>> outperform a general relational database.
> Outperform, yes, but it won't scale well at all.
True. I guess it depends on whether your database is growing faster than
Moore's law. I could keep some basic data (name, email, DOB, password) on 100
million users all in RAM on my 2GB laptop. Is the dataset bigger than that?
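As a rough sanity check on that figure, here is a minimal sketch of the arithmetic. It assumes the whole 2GB is available for data, which it would not be in practice, and the variable names are just for illustration.

```shell
# Figures taken from the message above: 2GB of RAM, 100 million users.
# Assumption: every byte of RAM is usable for records (optimistic).
ram_bytes=$(( 2 * 1024 * 1024 * 1024 ))
users=100000000

# Integer division gives the per-user byte budget.
bytes_per_user=$(( ram_bytes / users ))
echo "$bytes_per_user bytes per user"
```

That budget is small, so the four fields would have to be packed or compressed quite tightly to fit.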
> Or you could use a lightweight db or NoSQL system, which has already
> implemented those features for you.
> Perhaps MongoDB or CouchDB would suit you?
Speaking of 'NoSQL', has anyone used the 'nosql' package in Debian?
It provides a TSV-based RDB system built on pipes and processors (unix-style
tools). I really like this approach and prefer it to SQL databases.
You can do nice unixy things with this sort of textual database, such as:

    diff <(sort db1/table1) <(sort db2/table2)
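
To make that concrete, here is a small self-contained sketch of the same idea. The two tables and their contents are made-up sample data, not anything from the 'nosql' package, and the `<( )` form needs bash (or zsh) rather than plain sh.

```shell
# Requires bash: <( ) process substitution is not POSIX sh.
# Hypothetical sample data: two copies of a users table as TSV (name, email).
workdir=$(mktemp -d)
mkdir -p "$workdir/db1" "$workdir/db2"
printf 'bob\tbob@example.com\nalice\talice@example.com\n' > "$workdir/db1/table1"
printf 'alice\talice@example.com\ncarol\tcarol@example.com\n' > "$workdir/db2/table2"

# Sort both tables so row order does not matter, then compare them.
# diff exits with status 1 when the inputs differ, hence the "|| true".
changes=$(diff <(sort "$workdir/db1/table1") <(sort "$workdir/db2/table2") || true)
printf '%s\n' "$changes"

# Rows only in db1 start with "<"; rows only in db2 start with ">".
removed=$(printf '%s\n' "$changes" | grep -c '^<' || true)
added=$(printf '%s\n' "$changes" | grep -c '^>' || true)
echo "removed=$removed added=$added"

rm -rf "$workdir"
```

Because each row is one line of text, the diff output doubles as a list of rows dropped from and added to the table.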