[Melbourne-pm] brace / C CSV reader, Re: Still performing well..

Toby Corkindale toby.corkindale at strategicdata.com.au
Thu May 20 21:39:21 PDT 2010


On 21/05/10 14:05, Sam Watkins wrote:
> hi Toby,
>
>> So, are you going to provide us with a well-written Brace program that
>> performs the same task as the Go/Scala/Perl implementations then?
>
> Ok, I did write a brace / C program that does the same thing.  It's not
> terribly well-written but it doesn't allocate memory in its inner loop.  Brace
> translates directly to C, so it performs the same as an equivalent C program
> using the same techniques.
>
>> It's very simple, should only take you 5 minutes if you're familiar with
>> your language.
>
> lol, I wrote the CSV parser too so it took a little longer than that!
> it might be useful for me in future though.
>
>> Only requirements are:
>> You perform "good" practice as far as closing files and freeing memory
>> goes - by which I mean that you do eventually close the file and free
>> memory, and your program should not grow to infinite memory size if the
>> input file keeps growing.
>
> Ok, in this case I don't need to do that explicitly in brace.  Two of the block
> structures I used take care of it for me.  'F_in' opens and closes a file.
> 'eachline' reads all the lines from a file using a single buffer, which it
> frees at the end.
>
>> So you can use a static buffer for every row of the CSV if you like.
>
> 'eachline' does use a single resizable buffer.
>
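
For readers who have not seen the technique, here is a minimal C sketch of reading lines through one resizable buffer. It is not the libb 'eachline' code; the `linebuf` type and the `linebuf_*` functions are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of libb: one growable buffer reused per line. */
typedef struct {
    char  *data;  /* NUL-terminated current line, newline stripped */
    size_t len;   /* length of the current line */
    size_t cap;   /* bytes allocated for data */
} linebuf;

/* Make sure at least `need` bytes are allocated; returns 0 on failure. */
int linebuf_reserve(linebuf *lb, size_t need)
{
    if (need <= lb->cap)
        return 1;
    size_t ncap = lb->cap ? lb->cap : 64;
    while (ncap < need)
        ncap *= 2;
    char *p = realloc(lb->data, ncap);
    if (p == NULL)
        return 0;
    lb->data = p;
    lb->cap = ncap;
    return 1;
}

/* Read one line into lb, overwriting the previous one.
 * Returns 1 if a line was read, 0 at end of input or if allocation fails. */
int linebuf_read(linebuf *lb, FILE *in)
{
    assert(lb != NULL && in != NULL);
    int c;
    lb->len = 0;
    while ((c = getc(in)) != EOF && c != '\n') {
        if (!linebuf_reserve(lb, lb->len + 2))  /* char plus NUL */
            return 0;
        lb->data[lb->len++] = (char)c;
    }
    if (c == EOF && lb->len == 0)
        return 0;                               /* nothing left to read */
    if (!linebuf_reserve(lb, lb->len + 1))
        return 0;
    lb->data[lb->len] = '\0';
    return 1;
}

/* Release the buffer once, after the last line. */
void linebuf_free(linebuf *lb)
{
    free(lb->data);
    lb->data = NULL;
    lb->len = lb->cap = 0;
}
```

A caller would start with `linebuf lb = {0};`, loop on `linebuf_read(&lb, fp)`, and call `linebuf_free(&lb)` once at the end. The buffer only grows when a line is longer than any seen so far, so memory use is bounded by the longest line, not by the size of the file.
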
>> Given the same input file, your output should match the Perl output.
>
> There were a few discrepancies originally because I had used float rather than
> double.  I changed it to use double (num), and now it does match exactly.
>
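
One common way a float/double discrepancy like this shows up is in accumulation. The sketch below is only an illustration under that assumption; the thread does not show how the brace program uses its numbers, and `sum_float` and `sum_double` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the values while keeping the running total in single precision. */
float sum_float(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += (float)v[i];   /* each addition is rounded to float */
    return s;
}

/* Same loop, with the running total in double precision. */
double sum_double(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
```

With IEEE 754 single precision, integers above 2^24 = 16777216 are no longer all representable, so adding 1.0 to a float total of 16777216.0 leaves it unchanged, while a double total keeps counting. Printed output from the two versions then differs even though the code is otherwise identical.
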
> Here are the times on my VPS:
>
> 	$ ./read.pl data.csv>/dev/null
> 	Code took 2.23198 wallclock secs ( 2.19 usr +  0.00 sys =  2.19 CPU) @  0.46/s (n=1)
>
> 	$ ./read.b data.csv>/dev/null
> 	Code took: 0.227998
>
> So my brace / C version is nearly 10 times faster than perl, even though the
> perl version uses Text::CSV_XS, which is written in C!  The difficulty for
> Text::CSV_XS is that it must allocate and use perl data structures.
>
> I don't doubt that Go would perform similarly to C or brace, if the CSV parser
> were written in the same way.
>
> I repeated the test with Text::CSV_PP, which uses pure perl rather than C / XS
> to parse the CSV, and might be a better comparison of the two languages for the
> stated task.  I get this result:
>
> 	$ ./read-pp.pl data.csv>/dev/null
> 	Code took 12.1239 wallclock secs (11.86 usr +  0.01 sys = 11.87 CPU) @  0.08/s (n=1)
>
> I'm surprised that the pure-perl version is only about 5 times slower than the
> XS version, that's a real credit to perl.  However the brace / C version is 53
> times faster than the perl version.
>
> My brace program is not low-level programming.  It uses my library libb and is
> shorter and more high-level than the other programs shown.  It uses
> medium-level generic data structures: resizable string buffers, and vectors.
> If I avoided these and used low-level C arrays, it could run much faster.  But
> it's already twice as fast as GNU wc on the same data.  Actually, that is only
> because wc is using UTF-8; in the C locale, wc is 6 times faster than my brace
> program.  If the brace program / my libs were optimized for speed I think it
> could approach that speed for parsing CSV.
>
> For myself, I prefer to use TSV format with C-style escapes \t \n \\.
> Unlike CSV, that works nicely with awk, cut and other unix tools.
> I'll add the CSV parser to my brace library so I can use it again.
>
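
A minimal C sketch of decoding one field in a TSV format like the one described above, assuming exactly the three escapes \t, \n and \\ and treating anything else after a backslash as an error. `tsv_unescape` is a hypothetical name, not a function from libb.

```c
#include <assert.h>
#include <stddef.h>

/* Decode \t, \n and \\ in one TSV field (hypothetical helper).
 * Writes a NUL-terminated result into dst, using at most dstcap bytes.
 * Returns the decoded length, or (size_t)-1 if the field does not fit
 * or contains an unknown or truncated escape. */
size_t tsv_unescape(const char *src, char *dst, size_t dstcap)
{
    assert(src != NULL && dst != NULL);
    size_t n = 0;
    for (; *src != '\0'; src++) {
        char c = *src;
        if (c == '\\') {
            src++;
            switch (*src) {
            case 't':  c = '\t'; break;
            case 'n':  c = '\n'; break;
            case '\\': c = '\\'; break;
            default:   return (size_t)-1;  /* bad or truncated escape */
            }
        }
        if (n + 1 >= dstcap)               /* keep room for the NUL */
            return (size_t)-1;
        dst[n++] = c;
    }
    if (n >= dstcap)
        return (size_t)-1;
    dst[n] = '\0';
    return n;
}
```

Because tabs and newlines inside a field are always escaped, a raw tab can only be a field separator and a raw newline can only end a record. That is why tools such as awk and cut can split this format without a real parser, and decoding is only needed when a field's contents are actually used.
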
> Sam Watkins
>
>
>
> Here is the main program in brace, as you can see it's about half the size of
> the perl version and 1/4 the size of the scala version.  Including the CSV
> parser it's about the same size as the scala version!  brace is work in
> progress, so this program could be improved as I improve the library.
> (Accessing objects and vector elements is a bit ugly at the moment.)
>
> I can provide the stand-alone C translation of the brace version if you'd like
> to test it.

... that might be necessary, as I just tried compiling Brace on my 
computer, and it fails to build:
.all.c: In function ‘gr__mitshm_fault_h’:
.all.c:16876: error: ‘X_ShmAttach’ undeclared (first use in this function)
.all.c:16876: error: (Each undeclared identifier is reported only once
.all.c:16876: error: for each function it appears in.)

That thing is declared in /usr/include/X11/extensions/shmproto.h so 
maybe it's just not being included in your code somewhere?
