[Melbourne-pm] Still performing well..

Sam Watkins sam at nipl.net
Wed May 19 19:36:15 PDT 2010


On Wed, May 19, 2010 at 03:20:28PM +1000, Daniel Pittman wrote:
> Well, not entirely: I just think it will make the performance curve of your
> application look poor once it gets asked to process a 4GB CSV file, while
> languages that do free unused memory will do a little better. :)

Well, it depends on whether you actually need to keep all the data in memory.
If the task is to "load" a data file, such as my dictionary, for subsequent
random access, I want to keep it all in memory.  Perhaps storage can be saved
by "interning" common strings or by some fast compression scheme.  If you
really need to keep 4GB of data in RAM for fast random access, you basically
need more than 4GB of RAM.
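A minimal sketch of the interning idea, assuming NUL-terminated C strings:
each distinct string is stored once, and every later request for an equal
string gets back the same shared pointer. The names `intern`, `intern_free`,
`InternPool` and `INTERN_BUCKETS` are hypothetical, and FNV-1a is just one
convenient hash choice, not something the original post specifies.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed bucket count; chains grow without bound. */
#define INTERN_BUCKETS 4096

typedef struct InternNode {
    struct InternNode *next;  /* next node in the same bucket */
    char str[];               /* flexible array member holds the bytes */
} InternNode;

typedef struct {
    InternNode *buckets[INTERN_BUCKETS];
} InternPool;

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Return the pool's canonical copy of s, storing it on first sight.
 * Returns NULL only if allocation fails. */
const char *intern(InternPool *pool, const char *s)
{
    size_t b = fnv1a(s) % INTERN_BUCKETS;
    for (InternNode *n = pool->buckets[b]; n; n = n->next)
        if (strcmp(n->str, s) == 0)
            return n->str;            /* already stored: share it */

    size_t len = strlen(s) + 1;       /* include the terminating NUL */
    InternNode *n = malloc(sizeof *n + len);
    if (!n)
        return NULL;
    memcpy(n->str, s, len);
    n->next = pool->buckets[b];       /* push onto the bucket's chain */
    pool->buckets[b] = n;
    return n->str;
}

/* Release every stored string and reset the pool to empty. */
void intern_free(InternPool *pool)
{
    for (size_t i = 0; i < INTERN_BUCKETS; i++) {
        InternNode *n = pool->buckets[i];
        while (n) {
            InternNode *next = n->next;
            free(n);
            n = next;
        }
        pool->buckets[i] = NULL;
    }
}
```

With a pool declared as `InternPool pool = {0};`, a dictionary loader would
call `intern(&pool, word)` for each repeated field, so duplicates cost one
pointer each instead of a full copy, and interned strings can be compared
with `==` rather than `strcmp`.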

Stream processing is a different thing.  In my experience, nicely written C
performs much better than interpreted or bytecode/JIT languages for both
stream processing and loading.  I don't know of any compiled language that
performs better than well-written C in many cases, because C is close to the
metal and lets the programmer do things their own way.

For example, with stream processing, I commonly use a single resizable buffer
for each line, so there is no need to free anything inside the loop.  You said
that I am cheating by not freeing anything, but if I write my C (or brace)
code well, there is no need to malloc or free anything except perhaps a
single line buffer, and maybe to realloc it a couple of times if it started
off too small.  Going nuts with malloc and free for every field in a file is
almost always a bad way to program in C.  In Java, Perl, Python, etc., you
have little or no choice about that: there's no nice way to do it as
efficiently as in C, because you need an object or buffer for each string.
So I suspect your Java or Perl program will run maybe 10 times slower than a
well-written C, brace, or Go program.
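A rough sketch of the single-buffer approach, assuming POSIX `getline` (which
reallocs the buffer only when a line is longer than any seen so far) and a
naive comma-separated format with no quoting or escaping. Fields are split in
place, so each one is just a pointer into the line buffer. The names
`sum_column`, `split_fields` and `MAX_FIELDS` are hypothetical, for
illustration only.

```c
#define _POSIX_C_SOURCE 200809L  /* for getline */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap on fields per line; extra fields are ignored. */
#define MAX_FIELDS 64

/* Split line in place on commas: each comma becomes '\0' and fields[]
 * points into the same buffer, so no per-field allocation happens.
 * Naive: no quoting support. Returns the number of fields found. */
static size_t split_fields(char *line, char **fields, size_t max)
{
    size_t n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';  /* strip the line terminator */
    while (n < max) {
        fields[n++] = p;
        p = strchr(p, ',');
        if (!p)
            break;
        *p++ = '\0';                     /* terminate the current field */
    }
    return n;
}

/* Sum the integer in column col (0-based) over every line of in,
 * reusing one line buffer for the whole stream. */
long sum_column(FILE *in, size_t col)
{
    char *line = NULL;        /* single buffer, grown by getline as needed */
    size_t cap = 0;
    char *fields[MAX_FIELDS]; /* reused for every line, never allocated */
    long total = 0;

    while (getline(&line, &cap, in) != -1) {
        size_t n = split_fields(line, fields, MAX_FIELDS);
        if (col < n)
            total += strtol(fields[col], NULL, 10);
    }
    free(line);               /* the only free, after the loop */
    return total;
}
```

Apart from `getline` occasionally growing the buffer, the loop body does no
malloc or free at all; a call like `sum_column(stdin, 1)` reads the whole
stream with one buffer that is released once at the end.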

Sam

