[Melbourne-pm] Designing modules to handle large data files
Daniel Pittman
daniel at rimspace.net
Mon Aug 23 01:48:31 PDT 2010
Toby Corkindale <toby.corkindale at strategicdata.com.au> writes:
> On 23/08/10 11:14, Sam Watkins wrote:
>
>> I think if you have datasets that are smaller than your RAM, and you don't
>> create too many unnecessary perl strings and objects, you should be able to
>> process everything in perl if you prefer to do it like that. It may even
>> outperform a general relational database.
>
> Outperform, yes, but it won't scale well at all.
*nod* Everything is easy, and every algorithm is sufficient, for data smaller
than core memory. Given that 24 to 96 GB of memory is within reach of a
dedicated home user today, a lot of the old scaling problems simply go away.
(Don't forget persistence, and hardware contention, though :)
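As a concrete sketch of that in-core approach (file layout, field names, and
the tab-separated format here are invented for illustration): read the whole
file once, index it with a hash, then do O(1) lookups.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an in-memory index over a line-oriented data file.
# Assumes tab-separated records of: id, name, value -- an invented layout.
my %by_id;
while ( my $line = <DATA> ) {
    chomp $line;
    my ( $id, $name, $value ) = split /\t/, $line;
    $by_id{$id} = { name => $name, value => $value };
}

# Once the index is built, lookups are constant time.
print $by_id{42}{name}, "\n";

__DATA__
42	widget	17
43	sprocket	9
```

The hash here is exactly the "index" being discussed below: RAM traded for
lookup speed, with no database in the loop.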
[...]
>> You will also need to create indexes of course (perl hash tables). If you are
>> really running out of RAM, you could compress objects using Compress::Zlib or
>> similar - or buy some more RAM!
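The Compress::Zlib route Sam mentions is a one-liner in each direction; a
minimal round-trip sketch (the record contents are made up):

```perl
use strict;
use warnings;
use Compress::Zlib;    # exports compress() and uncompress()

# Compress rarely-touched values to trade CPU time for RAM.
my %store;
my $record = "some long, repetitive record text " x 50;

$store{key} = compress($record);          # deflate on the way in
my $back    = uncompress( $store{key} );  # inflate on the way out

die "round-trip failed" unless $back eq $record;
printf "%d bytes down to %d\n", length($record), length( $store{key} );
</parameter>```

Repetitive text like this deflates very well; whether it pays off for real
data depends on how compressible the records are and how often you touch them.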
>
> Or you could use a lightweight db or NoSQL system, which has already
> implemented those features for you. Perhaps MongoDB or CouchDB would suit
> you?
For something like this I would also seriously consider Riak; the main
differences between Riak and the MongoDB/CouchDB models are in how they scale
across systems. (Internal, invisible sharding vs replication, basically.)
They all use JavaScript-based map/reduce as their primary data-mining tool,
and can generally do a reasonable job of exploiting data locality and the like.
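In those stores the map and reduce functions are written in JavaScript, but
the shape of the computation is the same anywhere; a toy Perl version over an
in-memory list (documents and fields invented for the example):

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Map/reduce in miniature: map emits (key, value) pairs,
# reduce collapses each key's values to a single result.
my @docs = (
    { type => 'order',  amount => 10 },
    { type => 'order',  amount => 25 },
    { type => 'refund', amount => 5 },
);

# Map phase: group emitted values by key.
my %emitted;
for my $doc (@docs) {
    push @{ $emitted{ $doc->{type} } }, $doc->{amount};
}

# Reduce phase: sum each key's values.
my %totals = map { $_ => sum @{ $emitted{$_} } } keys %emitted;

print "$_: $totals{$_}\n" for sort keys %totals;
```

The point of the database versions is that the map phase runs next to the
data on each node, which is where the scaling across systems comes from.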
Daniel
--
✣ Daniel Pittman ✉ daniel at rimspace.net ☎ +61 401 155 707
♽ made with 100 percent post-consumer electrons