[Melbourne-pm] Still performing well..

Toby Corkindale toby.corkindale at strategicdata.com.au
Tue May 18 19:11:52 PDT 2010


On 19/05/10 12:01, Sam Watkins wrote:
> On Tue, May 18, 2010 at 04:52:40PM +1000, Daniel Pittman wrote:
>> Toby Corkindale<toby.corkindale at strategicdata.com.au>  writes:
>>> On 18/05/10 15:28, Daniel Pittman wrote:
>>>> Toby Corkindale<toby.corkindale at strategicdata.com.au>   writes:
>>>>
>>>>> After the last Perlmongers meeting I was curious to benchmark Perl vs Go vs
>>>>> Scala in more than just a trivial case.
>>>>>
>>>>> I set up a test to read in a large CSV file, performing some minor numeric and
>>>>> text manipulation upon each row, and outputting the results.
>>>>
>>>> When you say "large", do you mean 1MB, 10MB, 1GB, 1TB?
>>>
>>> Uh, 10,000 lines + header.
>
> How many megabytes is that? I mean, a line could have 1 field or 10,000 fields,
> and a field could be 1 byte or 1MB!

Actually, I was wrong: I was using 100,000 lines per "small" file,
1,000,000 lines per "medium" file and 10,000,000 lines per "big" file.
The big file is 260 MB.

I was also using a poor CSV implementation in the Scala version, and
I've now replaced it with a Java CSV engine I found. (There don't seem to
be any pure-Scala implementations yet.)

The updated times are:

small file (100,000 lines):
  Perl  - 1.089 s
  Scala - 1.857 s
  Go    - 1.682 s

big file (10,000,000 lines):
  Perl  - 111.3 s
  Scala - 89.05 s
  Go    - 154.3 s


> I have a C (actually brace) program which loads and indexes a 22,000 line 630KB
> file, a Tagalog - English dictionary, in 0.020 seconds on my wimpy little VPS.
> It's not TSV it's records of key: value pairs (like mail headers) but if
> anything it's harder to parse than CSV because I have to look up the keys.
>
> Almost half of the 0.020 seconds is the time to fork and exec!
> ("hello world" takes ~ 0.008s) and it's not a very efficient implementation.
> Even in pure perl I can do something similar in under 0.1 seconds.  So I'm
> guessing your CSV file is a lot bigger than that?

Would you like to submit a brace version of my simplistic benchmark?
Then we can compare apples with apples.
It'll probably eat the others alive, and I'm guessing a C implementation
would too. Then again, maybe not: the Scala version is looking pretty
snappy now.

