[Omaha.pm] Suggested XML modules...

Rob Townley rob.townley at gmail.com
Wed May 13 10:23:21 PDT 2009


On Sun, Nov 30, 2008 at 5:27 PM, Christopher Cashell <topher-pm at zyp.org> wrote:
> On Sun, Nov 30, 2008 at 3:26 PM, Dan Linder <dan at linder.org> wrote:
>> I was looking at it a bit because our XML files have the potential to get
>> quite large (>50GB dumps).  On the other hand, the day-to-day files should
>> stay quite manageable (between 100KB and 10MB), so XML::Twig's ability to
>> process only a portion of an XML file might be overkill.
>
> Just a quick note of warning, it can be very surprising how much RAM
> is required for processing XML documents as they get large.  Loading
> the entire document into memory has a way of ballooning really fast.
> We ran into some issues with that on a project at my previous
> employer.
>
> As noted in the Perl XML FAQ:
>
> "The memory requirements of a tree based parser can be surprisingly
> high. Because each node in the tree needs to keep track of links to
> ancestor, sibling and child nodes, the memory required to build a tree
> can easily reach 10-30 times the size of the source document. You
> probably don't need to worry about that though unless your documents
> are multi-megabytes (or you're running on lower spec hardware)."
>
> We had a couple of XML files that were under 10MB and they were
> causing memory usage of nearly 500MB in the initial version of the
> processing application.
>
>> Dan
>
> --
> Christopher
> _______________________________________________
> Omaha-pm mailing list
> Omaha-pm at pm.org
> http://mail.pm.org/mailman/listinfo/omaha-pm
>

Tree vs. serial processing is one thing, but I'm just curious: have you
tried creating your own XML namespace to save memory?
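For the serial side of that comparison, here is a minimal sketch of how XML::Twig can process a large file one chunk at a time instead of building the whole tree. It assumes XML::Twig is installed from CPAN; the <dump>, <record> and <name> element names and the sample data are hypothetical stand-ins for whatever the real dumps contain.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input: a dump made of many <record> elements.
my $xml = <<'XML';
<dump>
  <record id="1"><name>alpha</name></record>
  <record id="2"><name>beta</name></record>
  <record id="3"><name>gamma</name></record>
</dump>
XML

my @seen;    # collects [id, name] pairs as each record is parsed

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once each <record> element has been completely parsed.
        record => sub {
            my ( $t, $record ) = @_;
            push @seen,
              [ $record->att('id'), $record->first_child_text('name') ];
            # Free everything parsed so far (including this record),
            # so the in-memory tree never grows past roughly one record.
            $t->purge;
        },
    },
);

# For a real multi-gigabyte dump, call $twig->parsefile($path) instead.
$twig->parse($xml);

printf "%s: %s\n", @$_ for @seen;
```

Because each record is purged once its handler has pulled out what it needs, memory use should track the size of a single record rather than the 10-30x-the-document growth the FAQ describes for a full tree.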

