[Melbourne-pm] Designing modules to handle large data files
toby.corkindale at strategicdata.com.au
Thu Aug 19 00:15:22 PDT 2010
On 19/08/10 16:52, Tulloh, David wrote:
> Dear List,
> As part of my work I have built several modules to handle data files.
> The idea is to hide the structure and messiness of the data file in a
> nice reusable module. This also allows the script to focus on the
> processing rather than the data format.
> Unfortunately, while the method I have evolved towards meets these
> objectives reasonably well, I'm running into significant memory and
> speed problems with large data files. I have some ideas of ways to
> restructure it to improve this, but all involve some uncomfortable
> trade-offs.
> I was hoping some of the more experienced eyes on the list could look
> over my approach and make a few suggestions.
Perhaps you should import the data file into a database, then let the
database do all the hard work for you? By all means put a layer over the
DB interface so as to make it nice for people to use.
You are running the risk of reinventing the wheel otherwise.
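As a rough sketch of what I mean (module name, file layout and column names are all made up for illustration), you could bulk-load the file into SQLite via DBI once, then query it instead of holding it in memory:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical: the data file is comma-separated with (id, timestamp, value)
# fields; adjust the schema and split to match your real format.
my $dbh = DBI->connect('dbi:SQLite:dbname=data.db', '', '',
                       { RaiseError => 1, AutoCommit => 0 });

$dbh->do('CREATE TABLE IF NOT EXISTS records (id INTEGER, ts TEXT, value REAL)');

open my $fh, '<', 'data.txt' or die "Can't open data.txt: $!";
my $sth = $dbh->prepare('INSERT INTO records (id, ts, value) VALUES (?, ?, ?)');
while (my $line = <$fh>) {
    chomp $line;
    my ($id, $ts, $value) = split /,/, $line, 3;
    $sth->execute($id, $ts, $value);   # one row at a time, constant memory
}
close $fh;
$dbh->commit;   # a single transaction keeps the bulk load fast
```

Your "nice reusable module" then becomes a thin layer of methods that issue SELECTs against that table, and the database worries about indexing and not blowing out RAM.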
If you want to stick with processing the file in situ, then you'll need
to approach it with a streaming processor, rather than loading the whole
thing into memory at once.
Are you familiar with that concept?
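The idea is that the module hands each record to the caller as it is read, rather than returning the whole file. A minimal sketch, again assuming a comma-separated (id, timestamp, value) layout and an invented module name:

```perl
package DataFile;   # hypothetical module name
use strict;
use warnings;

# Streaming: invoke a callback per parsed record as the file is read,
# so memory use stays constant no matter how large the file is.
sub each_record {
    my ($class, $path, $callback) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($id, $ts, $value) = split /,/, $line, 3;
        $callback->({ id => $id, ts => $ts, value => $value });
    }
    close $fh;
}

1;

package main;

# Usage: the script supplies the processing, the module hides the format.
DataFile->each_record('data.txt', sub {
    my $rec = shift;
    print "$rec->{id}: $rec->{value}\n";
});
```

The same shape works as an iterator (a next_record method returning one hashref per call) if callbacks don't suit your scripts.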