[Omaha.pm] Suggested XML modules...

Christopher Cashell topher-pm at zyp.org
Sun Nov 30 14:27:56 PST 2008


On Sun, Nov 30, 2008 at 3:26 PM, Dan Linder <dan at linder.org> wrote:
> I was looking at it a bit because our XML files have the potential to get
> quite large (>50GB dumps).  On the other hand, the day-to-day files should
> stay quite manageable (between 100K to 10M), so XML::Twig's ability to
> process only a portion of an XML file might be overkill.

Just a quick note of warning: it can be very surprising how much RAM
is required for processing XML documents as they get large.  Loading
the entire document into memory has a way of ballooning really fast.
We ran into some issues with that on a project at my previous
employer.

As noted in the Perl XML FAQ:

"The memory requirements of a tree based parser can be surprisingly
high. Because each node in the tree needs to keep track of links to
ancestor, sibling and child nodes, the memory required to build a tree
can easily reach 10-30 times the size of the source document. You
probably don't need to worry about that though unless your documents
are multi-megabytes (or you're running on lower spec hardware)."
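
To put rough numbers on that, here is a back-of-the-envelope sketch in
Perl.  The 10x and 30x factors are the range from the FAQ quote, and the
file sizes are the ones Dan mentioned.  The helper name
estimate_tree_memory is made up for illustration, and the result is only
a rule-of-thumb estimate, not a measurement.

```perl
use strict;
use warnings;

# Rough estimate of tree-parser memory: source size times an overhead
# factor.  The factor is an assumption (the FAQ suggests 10 to 30).
sub estimate_tree_memory {
    my ($source_bytes, $factor) = @_;
    return $source_bytes * $factor;
}

my $MB = 1024 * 1024;

# File sizes taken from the thread: day-to-day files and a full dump.
my @files = (
    [ 'small daily file (100K)', 100 * 1024 ],
    [ 'large daily file (10M)',  10 * $MB ],
    [ 'full dump (50G)',         50 * 1024 * $MB ],
);

for my $file (@files) {
    my ($label, $bytes) = @$file;
    my $low  = estimate_tree_memory($bytes, 10) / $MB;
    my $high = estimate_tree_memory($bytes, 30) / $MB;
    printf "%-24s roughly %.0f to %.0f MB as an in-memory tree\n",
        $label, $low, $high;
}
```

Even at the low end of that range, a 50GB dump is far beyond what a tree
parser can hold in RAM on ordinary hardware, while the day-to-day files
look fine on paper.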

We had a couple of XML files that were under 10MB and they were
causing memory usage of nearly 500MB in the initial version of the
processing application.
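
For the big dumps, the usual way around this is to handle one element at
a time and throw away what has already been processed.  Below is a
minimal sketch of that approach using XML::Twig's twig_handlers and
purge.  The element name "record" and the sub name count_records are
hypothetical, since I don't know what your files look like, so treat
this as a starting point rather than a drop-in solution.

```perl
use strict;
use warnings;
use XML::Twig;

# Count <record> elements in a file without keeping the whole tree in
# memory.  'record' is a hypothetical element name; substitute whatever
# repeating element your documents actually use.
sub count_records {
    my ($file) = @_;
    my $count = 0;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called each time a complete <record> element has been parsed.
            record => sub {
                my ($t, $elt) = @_;
                $count++;
                # Release everything parsed so far so memory stays flat.
                $t->purge;
            },
        },
    );

    $twig->parsefile($file);
    return $count;
}

# Usage: perl count_records.pl big_dump.xml
print count_records($ARGV[0]), "\n" if @ARGV;
```

The same handler works for the small day-to-day files, so you would not
need two code paths, only one that happens to scale.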

> Dan

-- 
Christopher

