[Melbourne-pm] and the winner is C! (so far anyway, no big surprise)

Toby Corkindale toby.corkindale at strategicdata.com.au
Fri May 21 02:21:16 PDT 2010


On 21/05/10 18:54, Sam Watkins wrote:
> On Fri, May 21, 2010 at 06:38:24PM +1000, Toby Corkindale wrote:
>> On 21/05/10 18:23, Sam Watkins wrote:
>>> Here's the leaderboard of CSV readers in various languages, compared to C,
>>> for a 100,000 line CSV file:
>>>
>>> 	C           1.00
>>> 	brace       1.16
>>> 	perl XS    11.33
>>> 	(bad) go   17.50
>>> 	scala      19.32
>>> 	perl       62.51
>>
>>
>> Umm, I think you're comparing the wrong results there; for the 100k
>> file, Perl only takes 1.1 seconds for me.
>> (However the C version only takes 0.11 seconds on that file!)
>
> Yes, those figures were relative to the C version, that's why C is 1.00.
>
> 1.1 / 0.11 = 10 means perl XS gets a score of about 10 on your machine
> - it's 10 times slower than C.
>
> By the way, I compiled it something like this:
>
> 	gcc -pedantic -std=gnu99 -Wall -Wextra -O2 -o read-c read.c
>
>> For the "big" 10m row file, the C version takes 7.90 seconds on my
>> testbed, which definitely takes it into the lead, by far! (The next
>> fastest contender is over a minute, and perl takes 108 seconds)
>
> Yes, the C version is about 10 times faster than the next fastest
> (excluding brace, which is essentially just C anyway).
>
>> I'm going to modify the tests and give them another shot once I've
>> eliminated the buffer flushing..
>
> ok, cool.  When I just commented out the printf I'm not sure how much other
> stuff the C compiler might have been discarding... "hey, he's not using this,
> no need to calculate it!"


Yeah!
I had a bit of a flamewar on another list about that..
A guy insisted that I should comment out the printf() in the Scala 
version, and then compare its performance.. Uh.. but surely the 
intelligent JVM will optimise out /heaps/ of stuff if I did that.. and 
there's no way to compare the output either.

The guy in question was yelling at me about how apparently I didn't know 
what I was doing, and was crap at testing performance. I didn't think it 
was that unreasonable to want all the tested programs to produce 
identical output! Sheesh.
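
One way to keep the work observable without printing every row is to fold each line into a checksum and print that once at the end. This is only a sketch and not what any of the benchmarked programs do; the name fold_csv_line and the choice of the FNV-1a hash are illustrative assumptions, not something from the thread:

```c
#include <stdint.h>

/* Hypothetical helper (not from the thread): fold one CSV line into a
 * running 64-bit FNV-1a hash. Hashing stops at the end of the string or at
 * the first newline, so "a,b" and "a,b\n" give the same result. Because the
 * final hash is printed by the caller, the compiler cannot throw away the
 * per-line work as unused. */
uint64_t fold_csv_line(uint64_t hash, const char *line)
{
    for (const char *p = line; *p != '\0' && *p != '\n'; p++) {
        hash ^= (unsigned char)*p;   /* mix in the next byte */
        hash *= 1099511628211ULL;    /* FNV-1a 64-bit prime */
    }
    return hash;
}
```

In a read loop this would be called once per line, starting from the FNV offset basis 14695981039346656037, with a single printf of the final value after the loop. Every implementation can then still be checked against the same number.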

Curiously, I've made the Perl version use buffered IO and it seems to be 
(very very slightly) slower than the original, not faster! How odd.

I'm doing:
use IO::Handle;
my $output = IO::Handle->new->fdopen(fileno(STDOUT), 'w');
$output->autoflush(0);
...
$output->printf(...)
...
$output->close;

Does that seem right to everyone else?
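
For comparison, the closest C equivalent of turning autoflush off is a setvbuf call. This is a minimal sketch, assuming the C reader writes through stdio; use_full_buffering is a hypothetical name. C stdio already fully buffers stdout when it is not an interactive device, so with the output redirected to a file or pipe this call should change little. Perl also leaves autoflush off by default, which may be why the Perl change made no real difference (an assumption, not something measured here).

```c
#include <stdio.h>

/* Hypothetical helper: ask stdio to fully buffer `out`, using a buffer of
 * `size` bytes that stdio allocates itself (NULL buffer argument). It must
 * be called before any other operation on the stream.
 * Returns 0 on success, nonzero on failure. */
int use_full_buffering(FILE *out, size_t size)
{
    return setvbuf(out, NULL, _IOFBF, size);
}
```

Calling use_full_buffering(stdout, 1 << 16) at the top of main would request a 64 KiB output buffer; data is then written out when the buffer fills, on fflush, or when the stream is closed.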


Swapping to unbuffered I/O for Scala brought the time down from 89 to 67 
seconds on the biggest file.. and then using some performance 
improvements suggested by someone else, got it down to a tiny 26 
seconds! (And under 1 second for the small file)

Poor old Perl is looking very sorry for itself now, at 111 seconds :(

