[Pdx-pm] meeting tonight: XML with Xtra X

Seven till Seven enobacon at gmail.com
Wed Mar 10 08:40:28 PST 2010


see http://pdx.pm.org/kwiki/

  Wed. March 10th, 6:53pm at FreeGeek -- 1731 SE 10th Ave.

How to learn to parse huge XML documents by doing it wrong for 5 years
Speaker: Tyler Riddle 

 When XML documents can't fit into memory, the vast majority of the 
solutions available on CPAN are no longer an option; when the documents 
are so large that they take up to 16 hours to process even with the 
standard tools for handling large documents, your hands are tied even 
more. Tyler will cover his learning experiences creating the 
Parse::MediaWikiDump and MediaWiki::DumpFile modules, which are built to 
handle the 24-gigabyte English Wikipedia dump files in a reasonable 
time frame.

 1) Real-world benchmarks of C and Perl libraries used to process huge
    XML documents. 

 2) The dirty little secret about XS and what it means for you in this
    context. 

 3) The evolution of the implementation of a nice interface around
    event-oriented (SAX-style) XML parsing. 

 4) Why XML::LibXML::Reader and XML::CompactTree are your friends and
    how to tame them.
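
Since point 4 recommends XML::LibXML::Reader, here is a minimal
pull-parsing sketch of the general technique: stream through a document
node by node instead of building a full tree in memory. The sample XML
and the <title> element name are illustrative, not taken from the talk.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Tiny inline sample; for a real dump you would use
# XML::LibXML::Reader->new(location => 'enwiki.xml') instead.
my $xml = <<'END';
<mediawiki><page><title>Perl</title></page>
<page><title>XML</title></page></mediawiki>
END

my $reader = XML::LibXML::Reader->new(string => $xml);

# Pull events one at a time; only element-start nodes named 'title'
# are inspected, so memory use stays flat regardless of file size.
my @titles;
while ($reader->read) {
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT
        and $reader->name eq 'title';
    push @titles, $reader->readInnerXml;
}

print "$_\n" for @titles;
```

The same loop scales to multi-gigabyte files because the reader never
holds more than the current node in memory.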


As always, the meeting will be followed by social hour at the LuckyLab. 
-- 

        http://pdx.pm.org

