[Melbourne-pm] Still performing well..

Toby Corkindale toby.corkindale at strategicdata.com.au
Wed May 19 20:06:44 PDT 2010


On 20/05/10 12:36, Sam Watkins wrote:
> On Wed, May 19, 2010 at 03:20:28PM +1000, Daniel Pittman wrote:
>> Well, not entirely: I just think it will make the performance curve of your
>> application look poor once it gets asked to process a 4GB CSV file, while
>> languages that do free unused memory will do a little better. :)
>
> Well it depends whether you actually need to keep all the data in memory.
> If the task is to "load" a data file, such as my dictionary, for subsequent
> random access, I want to keep it all in memory.  Perhaps storage can be saved
> by "interning" common strings or some fast compression scheme.  If you really
> need to keep 4GB of data in RAM for fast random access, you basically need more
> than 4GB of RAM.
>
> Stream processing is a different thing.  Nicely written C performs much better
> than any interpreted language or bytecode/JIT language for both stream
> processing and loading.  I don't know of any compiled language that performs
> better than well-written C for many cases, because C is close to the metal, and
> allows the programmer to do things their own way.
>
> For example with stream processing, I commonly use a single resizable buffer
> for each line, so there is no need to free anything inside the loop.  You said
> that I am cheating by not freeing anything, but if I write my C (or brace) code
> well, there is no need to malloc or free anything except perhaps a single line
> buffer, and maybe realloc it a couple times if it started off too small.  Going
> nuts with malloc and free for every field in a file is almost always a bad way
> to program in C.  In Java, perl, python, etc., you really have little or no
> choice about that, there's no nice way to do it as efficiently as in C, you
> need to use an object / buffer for each string.  So I suspect your Java / perl
> program will run maybe 10 times slower than a well-written C, brace or go
> program.

So, are you going to provide us with a well-written Brace program that 
performs the same task as the Go/Scala/Perl implementations then?

It's very simple, should only take you 5 minutes if you're familiar with 
your language.

Only requirements are:
You follow "good" practice for closing files and freeing memory: you do 
eventually close the file and free memory, and your program's memory use 
must not grow without bound as the input file grows.
So you can reuse a single static buffer for every row of the CSV if you like.

Given the same input file, your output should match the Perl output.
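
As a rough illustration of the single reusable row buffer allowed above, a C
sketch might look like the following. It assumes POSIX getline() is available.
The names count_fields and process_stream are hypothetical, and counting
comma-separated fields is only a stand-in for the real task, which is not
restated in this message.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* assumption: POSIX system, needed for getline() */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in task: count comma-separated fields in one row.
 * Naive split: it does not understand quoted fields containing commas,
 * and it reports an empty row as one (empty) field. */
static size_t count_fields(const char *line)
{
    size_t n = 1;
    for (const char *p = line; *p != '\0'; p++)
        if (*p == ',')
            n++;
    return n;
}

/* Stream rows from `in`, reusing one line buffer for every row.
 * Returns the total number of fields seen and stores the row count
 * in *rows. Memory use is bounded by the longest single row. */
static size_t process_stream(FILE *in, size_t *rows)
{
    char *buf = NULL;    /* getline() allocates this and grows it on demand */
    size_t cap = 0;      /* current capacity of buf, maintained by getline() */
    ssize_t len;
    size_t total = 0;

    assert(in != NULL && rows != NULL);
    *rows = 0;

    while ((len = getline(&buf, &cap, in)) != -1) {
        /* Strip the trailing newline, if any, in place. */
        if (len > 0 && buf[len - 1] == '\n')
            buf[--len] = '\0';
        total += count_fields(buf);
        (*rows)++;
    }

    free(buf);           /* the only free in the whole run */
    return total;
}
```

A caller would fopen() the CSV, pass the stream to process_stream(), and
fclose() it afterwards. Nothing is allocated per field, so the process size
stays flat however large the input file gets.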


Good luck,
Toby

