[Melbourne-pm] Designing modules to handle large data files

Sam Watkins sam at nipl.net
Mon Aug 23 21:54:16 PDT 2010


On Mon, Aug 23, 2010 at 11:49:17AM +1000, Toby Corkindale wrote:
> On 23/08/10 11:14, Sam Watkins wrote:
>> I think if you have datasets that are smaller than your RAM, and you don't
>> create too many unnecessary perl strings and objects, you should be able to
>> process everything in perl if you prefer to do it like that.  It may even
>> outperform a general relational database.
>
> Outperform, yes, but it won't scale well at all.

True, I guess it depends on whether your database is growing faster than
Moore's law.  I could keep some basic data (name, email, DOB, password) on
100 million users all in RAM on my 2GB laptop.  Is the dataset bigger than
that?
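
For what it's worth, here is a rough sketch of the kind of thing I mean.  It
isn't code from any real module, and the field widths and values are
invented; the idea is just to pack fixed-width records into one flat scalar,
so you pay roughly the raw data size rather than the overhead of a Perl hash
per user:

  use strict;
  use warnings;

  # invented fixed widths: name 32, email 40, DOB 8 (YYYYMMDD), password hash 16
  my $REC_LEN = 32 + 40 + 8 + 16;              # 96 bytes per user
  my $n_users = 1_000;                         # small number for the example
  my $store   = ' ' x ($n_users * $REC_LEN);   # one flat scalar holds every record

  sub put_user {
      my ($i, $name, $email, $dob, $pw) = @_;
      substr($store, $i * $REC_LEN, $REC_LEN,
             pack('A32 A40 A8 A16', $name, $email, $dob, $pw));
  }

  sub get_user {
      my ($i) = @_;
      return unpack('A32 A40 A8 A16', substr($store, $i * $REC_LEN, $REC_LEN));
  }

  put_user(42, 'Sam Watkins', 'sam@example.org', '19700101', 'hash-goes-here');
  my ($name, $email, $dob) = get_user(42);
  print "$name <$email> born $dob\n";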

> Or you could use a lightweight db or NoSQL system, which has already  
> implemented those features for you.
> Perhaps MongoDB or CouchDB would suit you?

Speaking of 'NoSQL', has anyone used the 'nosql' package in Debian?
It provides a TSV-based relational database built from pipes and processors
(unix-style tools).  I really like this approach and prefer it to SQL
databases.

You can do nice unixy things with this sort of textual database, such as:

  diff <(sort db1/table1) <(sort db2/table2)
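
To illustrate the pipes-and-processors style (only a sketch of my own, not
one of the nosql package's actual operators, and the script and column names
are made up), one such processor can be a few lines of Perl that reads a TSV
table with a header row on stdin and writes the matching rows on stdout:

  #!/usr/bin/perl
  # select.pl (hypothetical name): keep rows where a named column equals a value,
  # e.g.  sort db1/table1 | ./select.pl country au | head
  use strict;
  use warnings;

  my ($col, $want) = @ARGV;
  die "usage: select.pl COLUMN VALUE\n" unless defined $want;

  my $header = <STDIN>;
  exit 0 unless defined $header;        # empty input, nothing to do
  print $header;                        # pass the header row through unchanged

  chomp(my $h = $header);
  my @names = split /\t/, $h, -1;
  my %idx;
  @idx{@names} = 0 .. $#names;
  die "select.pl: no such column: $col\n" unless exists $idx{$col};

  while (my $line = <STDIN>) {
      chomp(my @fields = split(/\t/, $line, -1));
      my $v = $fields[ $idx{$col} ];
      print $line if defined $v && $v eq $want;
  }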

Sam

