[Pdx-pm] XML with Xtra X -- March meeting in 2 weeks
Seven till Seven
enobacon at gmail.com
Wed Feb 24 18:39:57 PST 2010
How to learn to parse huge XML documents by doing it wrong for 5 years
Speaker: Tyler Riddle
When XML documents can't fit into memory, the vast majority of solutions
available on CPAN are no longer an option; when the XML
documents are so large that they take up to 16 hours to process with the
standard tools for handling large documents, your hands are tied even
more. Tyler will cover his learning experiences creating the
Parse::MediaWikiDump and MediaWiki::DumpFile modules, which are made to
handle the 24-gigabyte English Wikipedia dump files in a reasonable
1) Real-world benchmarks of C and Perl libraries used to process huge
2) The dirty little secret about XS and what it means for you in this
3) The evolution of the implementation of a nice interface around
event-oriented (SAX-style) XML parsing.
4) Why XML::LibXML::Reader and XML::CompactTree are your friends and
how to tame them.
As always, the meeting will be followed by social hour at the LuckyLab.