[Pdx-pm] meeting tonight: XML with Xtra X
Seven till Seven
enobacon at gmail.com
Wed Mar 10 08:40:28 PST 2010
see http://pdx.pm.org/kwiki/
Wed. March 10th, 6:53pm at FreeGeek -- 1731 SE 10th Ave.
How to learn to parse huge XML documents by doing it wrong for 5 years
Speaker: Tyler Riddle
When an XML document can't fit into memory, the vast majority of the
solutions on CPAN are no longer an option; when the documents are so
large that they take up to 16 hours to process with the standard tools
for handling large documents, your hands are tied even more. Tyler will
cover what he learned while creating the Parse::MediaWikiDump and
MediaWiki::DumpFile modules, which are made to handle the 24 gigabyte
English Wikipedia dump files in a reasonable time frame.
1) Real-world benchmarks of C and Perl libraries used to process huge
XML documents.
2) The dirty little secret about XS and what it means for you in this
context.
3) The evolution of the implementation of a nice interface around
event-oriented (SAX-style) XML parsing.
4) Why XML::LibXML::Reader and XML::CompactTree are your friends and
how to tame them.
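
For anyone who hasn't met the streaming approach behind items 3 and 4,
here is a minimal sketch of the general idea: handle one record at a
time and throw it away before reading the next, so memory use stays
flat no matter how big the file is. It is not Tyler's code and does not
use the Perl modules named above; it uses Python's standard-library
xml.etree.ElementTree.iterparse as a stand-in. The function name, the
sample data and the bare <page>/<title> element names are assumptions
made for illustration (real MediaWiki dumps put their elements in an
XML namespace).

```python
import io
import xml.etree.ElementTree as ET


def iter_page_titles(source):
    """Yield the <title> text of each <page>, one page at a time.

    Hypothetical helper: element names are simplified and carry no
    namespace, unlike a real MediaWiki dump.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "page":
            yield elem.findtext("title")
            # Discard pages already handled so the tree never grows.
            root.clear()


# Tiny made-up document standing in for a multi-gigabyte dump file.
SAMPLE = (
    b"<mediawiki>"
    b"<page><title>Perl</title></page>"
    b"<page><title>XML</title></page>"
    b"</mediawiki>"
)

titles = list(iter_page_titles(io.BytesIO(SAMPLE)))
print(titles)
```

The same shape (pull one record, process it, discard it) is what a
pull-style reader gives you, in contrast to loading the whole document
into a tree first.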
As always, the meeting will be followed by social hour at the LuckyLab.
--
http://pdx.pm.org