[Melbourne-pm] Still performing well..

Sam Watkins sam at nipl.net
Tue May 18 19:01:54 PDT 2010


On Tue, May 18, 2010 at 04:52:40PM +1000, Daniel Pittman wrote:
> Toby Corkindale <toby.corkindale at strategicdata.com.au> writes:
> > On 18/05/10 15:28, Daniel Pittman wrote:
> >> Toby Corkindale<toby.corkindale at strategicdata.com.au>  writes:
> >>
> >>> After the last Perlmongers meeting I was curious to benchmark Perl vs Go vs
> >>> Scala in more than just a trivial case.
> >>>
> >>> I set up a test to read in a large CSV file, performing some minor numeric and
> >>> text manipulation upon each row, and outputting the results.
> >>
> >> When you say "large", do you mean 1MB, 10MB, 1GB, 1TB?
> >
> > Uh, 10,000 lines + header.
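
(For what it's worth, in Perl the kind of per-row benchmark described above could
be sketched roughly like this - not Toby's actual code; the file name, the column
positions and the "minor manipulation" are all invented here.)

#!/usr/bin/perl
# sketch only: big.csv, the column layout and the per-row tweaks are made up
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1, eol => "\n" });
open my $in, '<', 'big.csv' or die "big.csv: $!";
my $header = $csv->getline($in);          # header row
$csv->print(\*STDOUT, $header);
while (my $row = $csv->getline($in)) {
    $row->[1] *= 1.1;                     # minor numeric manipulation
    $row->[2] = uc $row->[2];             # minor text manipulation
    $csv->print(\*STDOUT, $row);
}
close $in;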

How many megabytes is that? I mean, a line could have 1 field or 10,000 fields,
and a field could be 1 byte or 1MB!

I have a C (actually brace) program which loads and indexes a 22,000-line, 630KB
file, a Tagalog - English dictionary, in 0.020 seconds on my wimpy little VPS.
It's not TSV; it's records of key: value pairs (like mail headers), but if
anything it's harder to parse than CSV because I have to look up the keys.

Almost half of the 0.020 seconds is the time to fork and exec ("hello world"
alone takes ~0.008s), and it's not a very efficient implementation.
Even in pure Perl I can do something similar in under 0.1 seconds.  So I'm
guessing your CSV file is a lot bigger than that?
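
(The pure-Perl version I have in mind is roughly the following - just a sketch
from memory; the real loader is written in brace, and the file name and the
"word" key are guesses at my own format.)

#!/usr/bin/perl
# sketch only: file name, blank-line record separator and the "word" key
# are approximations of the dictionary format
use strict;
use warnings;

my %index;                                # headword -> { key => value, ... }
open my $fh, '<', 'tagalog-english.dict' or die "dict: $!";
local $/ = "";                            # paragraph mode: a record ends at a blank line
while (my $rec = <$fh>) {
    my %r;
    for my $line (split /\n/, $rec) {
        my ($k, $v) = split /:\s*/, $line, 2;   # "key: value", like mail headers
        $r{$k} = $v if defined $v;
    }
    $index{ $r{word} } = \%r if defined $r{word};
}
close $fh;
printf "loaded %d records\n", scalar keys %index;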

I tried my program with a larger file: it can load and index a 63MB file in 1.5
seconds (when Linux has cached the file).

Sam

