[Pdx-pm] Wikipedia dump file XML shootout
erik at hollensbe.org
Sun Dec 6 12:03:53 PST 2009
On 12/6/2009 2:20 PM, Tyler Riddle wrote:
> Hello mongers,
> I've been doing research into which perl XML processing schemes are
> fastest as I design my replacement for Parse::MediaWikiDump. What I've
> wound up with is a nice benchmarking system and a large pile of test
> cases. I'm hoping there's some XML gurus or just weirdoes like me who
> like to try to make things go as fast as possible who are interested
> in either creating a benchmark for their favorite XML processor or
> seeing if my existing SAX handlers, etc, could be optimized more. If
> anyone is interested you can get the source via SVN from
This looks pretty cool.
One thing about your benchmark that I am concerned about: the MD5
calculation appears to be counted in the time measured for each test.
I'm not sure if this is intentional or not, but I thought I'd mention
it anyway.
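
To illustrate what I mean, here is a rough sketch of timing the two stages
separately, so the digest cost can be reported on its own or subtracted.
It is in Python only to keep it short, and it is not taken from the
benchmark's source: `time_stages` and the `records` iterable are
hypothetical stand-ins for a parser that yields one chunk of bytes per
record.

```python
import hashlib
import time


def time_stages(records):
    """Time record production and MD5 hashing separately.

    `records` is any iterable of bytes objects (a hypothetical stand-in
    for a parser yielding one record at a time).
    Returns (parse_seconds, digest_seconds, hex_digest).
    """
    parse_seconds = 0.0
    digest_seconds = 0.0
    md5 = hashlib.md5()
    iterator = iter(records)

    while True:
        start = time.perf_counter()
        try:
            record = next(iterator)  # the work done by the parser
        except StopIteration:
            break
        parsed = time.perf_counter()

        md5.update(record)  # the work done by the checksum
        hashed = time.perf_counter()

        parse_seconds += parsed - start
        digest_seconds += hashed - parsed

    return parse_seconds, digest_seconds, md5.hexdigest()


if __name__ == "__main__":
    sample = [b"<page>one</page>", b"<page>two</page>"]
    parse_s, digest_s, digest = time_stages(sample)
    print(f"parse: {parse_s:.6f}s  md5: {digest_s:.6f}s  digest: {digest}")
```

The per-record timer calls add some overhead of their own, so with very
small records it may be fairer to time a full run with the digest and a
full run without it, and compare the two totals.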
I would be really surprised if the iksemel parser (once it's working,
assuming it's a smartly-written binding) doesn't blow everything else
out of the water; one of iksemel's selling points is that it throws
away giant feature sets in the name of performance.