[Pdx-pm] Any interest in a talk on how to parse huge XML documents?

Tyler Riddle triddle at gmail.com
Sat Jan 16 11:21:39 PST 2010


I've been pondering sharing my experiences parsing the MediaWiki dump
files, which for the English Wikipedia presently sit at around 40
gigabytes. Such XML poses significant issues and I've
learned a lot by creating the Parse::MediaWikiDump and
MediaWiki::DumpFile modules on CPAN. While I personally find this
topic to be quite interesting, XML in general seems to put people to
sleep in the best case and into permanent comas with their brains leaking
out of their ears in the worst case. As such I'm going to let the list
decide if it's worth me putting in the time and effort to make a
presentation. Feel free to discuss, vote, or use whatever democratic
or anarchistic decision methods are suitable to figure this out.

The name of the talk would be "How to learn to work with XML by doing
it wrong for 5 years" with the caveat that I'm not sure I'm doing it
right yet. Audience participation would be welcome, and attendees are
encouraged to bring any other insight they have.

Tyler Riddle

-- 
If you wish to make an apple pie from scratch you must first invent
the universe. -- Carl Sagan

