[tpm] Good examples?

Stuart Watt stuart at morungos.com
Fri Nov 8 06:49:46 PST 2013


I never use XML::SAX in practice. It’s too much work. 

I usually use XML::LibXML::Reader, which allows more or less streaming access to elements. I can read in elements, even from STDIN, and then for a given element, I can turn it into a DOM tree and use XPath to get the bits I need. XML::LibXML is C underneath, so this is fast. It also handles files that are too big to fit in memory, which was how I ended up with this toolchain. It looks a bit like this:

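use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new(IO => $fh);

# Skip ahead to the first <mutation> element.
while ($reader->read() && $reader->name() ne 'mutation') { }

do {
    # nextSibling() can also land on whitespace text nodes, so check the name.
    if ($reader->name() eq 'mutation') {
        # Copy this element and its whole subtree into a DOM node.
        my $root = $reader->copyCurrentNode(1);
        # The path is relative to the copied <mutation> node.
        my $name = $root->findvalue(q{./Entrezgene-Set/Entrezgene/Entrezgene_gene/Gene-ref/Gene-ref_locus});
        # … and similar extractions
    }
} while ($reader->nextSibling());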

This does assume that my file contains lots of sibling mutation elements, but nothing about what each contains.
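
For anyone who wants something to paste and run, here is a minimal self-contained sketch of the same idea: stream with the reader, expand one record at a time into a small DOM, and pull fields out with relative XPath. The sample document, its element names (mutations, gene, locus, type) and the extract_mutations helper are all made up for illustration. It also uses the reader's nextElement() to hop between records instead of the read()/nextSibling() loop above, which should behave the same way as long as mutation elements are not nested inside each other.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical sample data: sibling <mutation> records under one root.
my $xml = <<'XML';
<mutations>
  <mutation><gene><locus>BRCA1</locus></gene><type>missense</type></mutation>
  <mutation><gene><locus>TP53</locus></gene><type>nonsense</type></mutation>
</mutations>
XML

# Collect one hash per <mutation> element from any XML::LibXML::Reader.
sub extract_mutations {
    my ($reader) = @_;
    my @records;
    # nextElement() skips forward to the next element with this local name.
    while ($reader->nextElement('mutation') == 1) {
        # Expand only this record into a small DOM subtree.
        my $node = $reader->copyCurrentNode(1);
        push @records, {
            locus => $node->findvalue('./gene/locus'),
            type  => $node->findvalue('./type'),
        };
    }
    return @records;
}

my $reader = XML::LibXML::Reader->new(string => $xml);
# For piped input, construct the reader with IO => \*STDIN instead.
for my $rec (extract_mutations($reader)) {
    print join("\t", $rec->{locus}, $rec->{type}), "\n";
}
```

Only one record is ever held as a DOM tree at a time, so memory use should stay roughly proportional to the largest record rather than to the whole file.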

The advantage of this approach is that you avoid SAX altogether: getting at deeply nested elements such as /mutation/Entrezgene-Set/Entrezgene/Entrezgene_gene/Gene-ref/Gene-ref_locus with SAX handlers would be extremely painful.

Oh, and $fh can easily be \*STDIN if you need it to be.

Seriously, XML::LibXML is a great set of tools for XML. Using SAX directly is going to be miserable. Also, XML::LibXML has decent API documentation, although the examples might leave something to be desired. 

All the best
Stuart



On Nov 8, 2013, at 9:30 AM, arocker at Vex.Net wrote:

> 
> Does anyone know of a good explanation/example of how to use XML::SAX
> anywhere online?
> 
> I've looked at the CPAN documentation, and other usual sources, but they
> have some ambiguity about what are standard method names, features, &c.,
> and what are simply examples of user-written resources.
> 
> The problem that started this train of thought is absurdly simple;
> extracting 3 fields per record from a STDIN stream of records containing 5
> or 6 embedded in XML tags.
> 
> A simple program, basically 3 regexes and a print, does the job, but
> attracted a storm of online criticism about parsing XML with regexes.
> I've been trying to write the equivalent using XML::SAX, but seem to be
> missing something. (For one thing, all the examples assume reading from a
> named file, not STDIN.)
> 
> _______________________________________________
> toronto-pm mailing list
> toronto-pm at pm.org
> http://mail.pm.org/mailman/listinfo/toronto-pm

--
Stuart Watt
stuart at morungos.com / twitter.com/morungos

