[Pdx-pm] Wikipedia dump file XML shootout

Tyler Riddle triddle at gmail.com
Tue Dec 8 07:08:29 PST 2009

> I'm fairly certain ->add() is what does the bulk of the processing,
> otherwise trying to stuff 22GB into memory for later digestion would
> prove... problematic on most machines. Unless I am reading your code
> incorrectly, that is done in the middle of the timing vector.

It does consume a fair amount of processing time; you are correct.
However, the output of times() covers both the current process and
its child processes. When I open() the test I get a second child
process, the third process in total (benchmark.pl -> fork() ->
open("$test |")). I only use the cuser and csystem values from
times() in the forked process, so the numbers reflect only the CPU
time of that process's children, i.e. the test itself. This could be
more explicit in my code; I'll clean it up so it's more obvious.
Thank you for your feedback. :-)
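
To make the process chain above concrete, here is a minimal sketch of
the idea. It is not the actual benchmark.pl: the helper name
time_command, the pipe used to hand the numbers back to the parent,
and the sample command are all assumptions made for illustration. The
only parts taken from the description are the fork(), the piped
open(), and reading cuser/csystem from times() inside the forked
process.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from benchmark.pl): run $command through a piped
# open() inside a forked child and return the CPU seconds charged to it.
sub time_command {
    my ($command) = @_;

    # Assumption: a pipe carries the timing values back to the parent.
    pipe(my $reader, my $writer) or die "pipe failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Forked process: its only descendant will be $command.
        close $reader;

        open(my $out, '-|', $command) or die "cannot run $command: $!";
        1 while <$out>;    # drain the command's output
        close $out;        # waits for the command, so its CPU time is recorded

        # times() returns (user, system, cuser, csystem); the last two cover
        # children that have exited and been waited for.
        my ($user, $system, $cuser, $csystem) = times();
        print {$writer} "$cuser $csystem\n";
        close $writer;
        exit 0;
    }

    # Parent: collect the two numbers and reap the forked process.
    close $writer;
    my $line = <$reader>;
    close $reader;
    waitpid($pid, 0);
    die "no timing data received" unless defined $line;

    my ($cuser, $csystem) = split ' ', $line;
    return ($cuser, $csystem);
}

# Sample command is an assumption; any CPU-bound command would do.
my ($cuser, $csystem) = time_command(qq{$^X -e '\$x += \$_ for 1 .. 2_000_000'});
printf "child user: %.2fs, child system: %.2fs\n", $cuser, $csystem;
```

Because the forked process does nothing but launch the command and
wait for it, its cuser and csystem values should exclude whatever work
the top-level script does before or after the fork.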

> If you find the time to get this working, alert me (or the list) of your
> discoveries, I am genuinely curious.

I'll find some time to plug away at it eventually, but I'm really
hoping some C angel will descend from above and bless the code with
their magic; I'm just a level 1 C monger.

If you wish to make an apple pie from scratch you must first invent
the universe. -- Carl Sagan
