[Melbourne-pm] and the winner is C! (so far anyway, no big surprise)

Toby Corkindale toby.corkindale at strategicdata.com.au
Fri May 21 01:38:24 PDT 2010


On 21/05/10 18:23, Sam Watkins wrote:
 > Here's the leaderboard of CSV readers in various languages, compared to C,
 > for a 100,000 line CSV file:
 >
 > 	C           1.00
 > 	brace       1.16
 > 	perl XS    11.33
 > 	(bad) go   17.50
 > 	scala      19.32
 > 	perl       62.51


Umm, I think you're comparing the wrong results there; for the 100k 
file, Perl only takes 1.1 seconds for me.
(However the C version only takes 0.11 seconds on that file!)

For the "big" 10m row file, the C version takes 7.90 seconds on my 
testbed, which definitely takes it into the lead, by far! (The next 
fastest contender is over a minute, and perl takes 108 seconds)


> I should fix the C / brace version to use fread not fgets for better
> correctness (allowing \n in quoted fields) and maybe to go a little faster.
>
> The printf output, even going to /dev/null, took more than half the time for
> the C code; so if we are testing just CSV reading, C is actually "more faster"
> than my figures indicate.

*nods*
Someone else recently pointed this out -- the printf() is consuming the
majority of the time, apparently due to its flushing.


I'm going to modify the tests and give them another shot once I've
eliminated the buffer flushing.
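
For the C version, one way this might look is sketched below. This is only
an assumption about how the flushing could be avoided, not the actual
benchmark code: the helper names use_full_buffering() and write_row() are
made up for illustration, and how stdout is buffered by default depends on
the C library and on what it is connected to.

```c
#include <stdio.h>

/* Hypothetical helper: ask stdio to fully buffer a stream, so output is
 * written in large chunks instead of being flushed line by line.
 * setvbuf() has to be called before any other operation on the stream.
 * Passing NULL as the buffer lets the C library allocate one of `size`
 * bytes. Returns 0 on success, non-zero on failure. */
int use_full_buffering(FILE *out, size_t size)
{
    return setvbuf(out, NULL, _IOFBF, size);
}

/* Hypothetical helper: write one parsed row as a tab-separated line.
 * fputs()/putc() avoid printf()'s format-string parsing; whether that
 * matters for this benchmark is untested here.
 * Returns 0 on success, -1 on a write error. */
int write_row(FILE *out, const char *fields[], size_t n_fields)
{
    for (size_t i = 0; i < n_fields; i++) {
        if (fputs(fields[i], out) == EOF)
            return -1;
        /* Tab between fields, newline after the last one. */
        if (putc(i + 1 < n_fields ? '\t' : '\n', out) == EOF)
            return -1;
    }
    return 0;
}
```

The idea would be to call use_full_buffering(stdout, 1 << 16) at the very
top of main(), before the first row is printed, and then let the normal
flush at exit write out whatever is left in the buffer.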

