[Kc] Perl Question: XML::Twig module

Daryl Fallin darylvf at gmail.com
Mon Jun 28 08:12:00 PDT 2010


All -

Figured out the problem.  Sterling Hanenkamp got me going in the right
direction.

Anyway, I used a simplified example when I asked my question, so here is an
explanation and my actual code.

I am working with the Qualys API and wanted to pull all scan data back from
Qualys so that I can store it and mash it up against other data sources.

The DTD for the Qualys XML is https://qualysapi.qualys.com/scan-1.dtd
(it describes the structure of the XML file).

Here is the basic code that I ended up with.  It works on the XML file once
it has been retrieved from Qualys.


*************************************************
#!/usr/bin/perl
use strict;
use warnings;

# Indentation style: 1 tab = 4 spaces

use XML::Twig;

# Handler: receives the twig and the element whose end tag was just parsed.
sub info {
    my ($twig, $info) = @_;
    my $tag = $info->tag;

    # Anchored patterns: match the whole tag name, not a substring of it.
    if ($tag =~ m/^(VULN|SERVICE|INFO|PRACTICE)$/) {
        # The "value" attribute is read from the element three levels up.
        printf "VALUE: %s \n", $info->parent->parent->parent->att("value");
        printf "ENT: %s \n", $tag;
    }

    if ($tag =~ m/^(OS|NETBIOS_HOSTNAME)$/) {
        # Here the "value" attribute is on the direct parent.
        printf "VALUE: %s \n", $info->parent->att("value");
        printf "ENT: %s \n", $tag;
        printf "%s\n", $info->text;
    }

    # Walk every node below $info in document order, skipping text nodes.
    my $elt = $info;
    while ($elt = $elt->next_elt($info)) {
        my $localname = $elt->local_name;
        if ($localname ne '#CDATA' && $localname ne '#PCDATA') {
            printf "%s: %s\n", $localname, $elt->text;
        }
    }
    print "\n\n";
}

#===================================================
#Main program section

my $twig = XML::Twig->new(
    TwigHandlers => {
        SERVICE          => \&info,
        VULN             => \&info,
        OS               => \&info,
        NETBIOS_HOSTNAME => \&info,
        INFO             => \&info,
        PRACTICE         => \&info,
        HEADER           => \&info,
        # _all_          => \&info,   # not using _all_, to ignore the top-level SCAN tag
    },
    error_context => 1,
);

# Parse the XML
$twig->parsefile('sample.xml');

******************************************************************
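The parent->parent->parent chain in info() ties the code to one exact nesting depth, and because nothing is ever purged, XML::Twig keeps the whole scan in memory, which gives up the chunking mentioned in the original post. Below is a minimal sketch of a variant, not a drop-in replacement. It assumes a simplified, hypothetical Qualys-like layout (IP elements with a value attribute, containing VULNS/CAT/VULN, each VULN with number and severity attributes and a TITLE child); check the real names against scan-1.dtd before relying on them. The handler name vuln and the hash %vulns_by_ip are hypothetical. The sketch finds the enclosing IP with ancestors('IP'), stores each VULN in a hash instead of printing it, and calls purge so fully parsed elements are released as the parse goes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input: a cut-down, Qualys-like layout used only for
# illustration. Element and attribute names are assumptions.
my $xml = <<'XML';
<SCAN>
  <IP value="10.0.0.1">
    <OS>Linux 2.6</OS>
    <VULNS>
      <CAT port="22">
        <VULN number="38047" severity="2"><TITLE>SSH banner</TITLE></VULN>
      </CAT>
    </VULNS>
  </IP>
  <IP value="10.0.0.2">
    <VULNS>
      <CAT port="80">
        <VULN number="86000" severity="1"><TITLE>Web server version</TITLE></VULN>
      </CAT>
    </VULNS>
  </IP>
</SCAN>
XML

# Hypothetical store: IP value => list of { number, severity, title }.
my %vulns_by_ip;

# Hypothetical handler, called once per completed VULN element.
sub vuln {
    my ($twig, $vuln) = @_;

    # Nearest enclosing IP at any depth, instead of counting parents.
    my ($ip) = $vuln->ancestors('IP');
    return 1 unless $ip;

    push @{ $vulns_by_ip{ $ip->att('value') } }, {
        number   => $vuln->att('number'),
        severity => $vuln->att('severity'),
        title    => $vuln->first_child_text('TITLE'),
    };

    # Drop every element parsed so far; still-open ancestors
    # (the current IP and SCAN) are not complete, so they stay.
    $twig->purge;
    return 1;
}

my $twig = XML::Twig->new(twig_handlers => { VULN => \&vuln });
$twig->parse($xml);

for my $ip (sort keys %vulns_by_ip) {
    printf "%s: %s\n", $ip,
        join ', ', map { $_->{title} } @{ $vulns_by_ip{$ip} };
}
```

The final loop just prints the collected titles; swapping it for a database insert or other storage is where the mash-up would happen. Since each VULN is read and then purged, memory use should stay bounded by the current host's elements rather than the whole scan.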


On Fri, Jun 25, 2010 at 7:31 PM, Daryl Fallin <darylvf at gmail.com> wrote:

> Hi All ....
>
> I have been trying to work with XML::Twig lately to parse an XML file.
>
> I just want to dump every element/tag of the XML file, but I get duplicate
> information from the original XML file.  Either my while loop is doing
> something weird or it's the way XML::Twig works; I'm not sure.  It's like
> it is running part of the while loop twice.
>
> I know there are other modules that I could use, but I am using XML::Twig
> for other parts of what will be a larger program and I want the chunking
> that XML::Twig allows.
>
> Any help would be greatly appreciated.
>
> Here is my sample code:
>
> #!/usr/bin/perl -w
>
> require XML::Twig;
>
> sub info {
>         my ($xml, $info) = @_;
>         my $elt = $info;
>         while ($elt= $elt->next_elt($info) )
>         {
>                 $elt->set_remove_cdata(1);
>                 $elt->set_pretty_print("record");  # print one field per line
>                 printf "%s\n", $elt->sprint;
>         }
> }
>
> $xml = new XML::Twig(
>         TwigHandlers => {
>                 XML_DIZ_INFO       => \&info,
>         }
> );
>
> # Parse the XML
> $xml->parsefile('sample.xml');
>
> ************************
>
> sample.xml
> -----------------
> <?xml version="1.0" ?>
> <XML_DIZ_INFO>
>         <MASTER_PAD_VERSION_INFO>
>                 <MASTER_PAD_VERSION>1.0</MASTER_PAD_VERSION>
>                 <MASTER_PAD_EDITOR>Master Editor here</MASTER_PAD_EDITOR>
>                 <MASTER_PAD_INFO>information would go here </MASTER_PAD_INFO>
>         </MASTER_PAD_VERSION_INFO>
>         <Company_Info>
>                 <Company_Name>Moyea Software Co., Ltd.</Company_Name>
>                 <Country>China</Country>
>                 <Company_WebSite_URL>http://www.whatever.com</Company_WebSite_URL>
>                 <Contact_Info>
>                         <Author_First_Name>Bob</Author_First_Name>
>                         <Author_Last_Name>King</Author_Last_Name>
>                         <Author_Email>product at moyea.com</Author_Email>
>                 </Contact_Info>
>         </Company_Info>
> </XML_DIZ_INFO>
>
> ============================================
> The following is the output I get.  After the closing </Company_Info> it
> should stop.
> ============================================
>
>   <MASTER_PAD_VERSION_INFO>
>     <MASTER_PAD_VERSION>1.0</MASTER_PAD_VERSION>
>     <MASTER_PAD_EDITOR>Master Editor here</MASTER_PAD_EDITOR>
>     <MASTER_PAD_INFO>information would go here </MASTER_PAD_INFO>
>   </MASTER_PAD_VERSION_INFO>
>
>     <MASTER_PAD_VERSION>1.0</MASTER_PAD_VERSION>
> 1.0
>
>     <MASTER_PAD_EDITOR>Master Editor here</MASTER_PAD_EDITOR>
> Master Editor here
>
>     <MASTER_PAD_INFO>information would go here </MASTER_PAD_INFO>
> information would go here
>
>   <Company_Info>
>     <Company_Name>Moyea Software Co., Ltd.</Company_Name>
>     <Country>China</Country>
>     <Company_WebSite_URL>http://www.whatever.com</Company_WebSite_URL>
>     <Contact_Info>
>       <Author_First_Name>Bob</Author_First_Name>
>       <Author_Last_Name>King</Author_Last_Name>
>       <Author_Email>product at moyea.com</Author_Email>
>     </Contact_Info>
>   </Company_Info>
>
>     <Company_Name>Moyea Software Co., Ltd.</Company_Name>
> Moyea Software Co., Ltd.
>
>     <Country>China</Country>
> China
>
>     <Company_WebSite_URL>http://www.whatever.com</Company_WebSite_URL>
> http://www.whatever.com
>
>     <Contact_Info>
>       <Author_First_Name>Bob</Author_First_Name>
>       <Author_Last_Name>King</Author_Last_Name>
>       <Author_Email>product at moyea.com</Author_Email>
>     </Contact_Info>
>
>       <Author_First_Name>Bob</Author_First_Name>
> Bob
>
>       <Author_Last_Name>King</Author_Last_Name>
> King
>
>       <Author_Email>product at moyea.com</Author_Email>
> product at moyea.com
>
>
>
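For anyone who hits the same symptom: next_elt($info) walks every node below $info in document order, not just its direct children, and that includes the #PCDATA text nodes. So the loop in the quoted code prints MASTER_PAD_VERSION_INFO with its whole subtree, then each of its child elements a second time, then each child's text node on its own (the bare 1.0, China, ... lines). Below is a minimal sketch of two ways to print each piece once, assuming a trimmed copy of sample.xml inlined as a string; dump_children, dump_leaves and is_leaf are hypothetical helper names, not part of XML::Twig.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Option 1 (hypothetical helper): serialize only the direct element
# children, so each child subtree appears exactly once.
sub dump_children {
    my ($elt) = @_;
    return map { $_->sprint } grep { $_->is_elt } $elt->children;
}

# A "leaf" here is a real element with no element children.
sub is_leaf {
    my ($elt) = @_;
    return $elt->is_elt && !grep { $_->is_elt } $elt->children;
}

# Option 2 (hypothetical helper): visit all descendants but report only
# the leaf elements, one "tag: text" line each.
sub dump_leaves {
    my ($elt) = @_;
    return map { sprintf "%s: %s", $_->tag, $_->text }
           grep { is_leaf($_) } $elt->descendants;
}

# Trimmed copy of sample.xml from the original post.
my $xml = <<'XML';
<?xml version="1.0" ?>
<XML_DIZ_INFO>
  <MASTER_PAD_VERSION_INFO>
    <MASTER_PAD_VERSION>1.0</MASTER_PAD_VERSION>
    <MASTER_PAD_EDITOR>Master Editor here</MASTER_PAD_EDITOR>
  </MASTER_PAD_VERSION_INFO>
  <Company_Info>
    <Company_Name>Moyea Software Co., Ltd.</Company_Name>
    <Contact_Info>
      <Author_First_Name>Bob</Author_First_Name>
    </Contact_Info>
  </Company_Info>
</XML_DIZ_INFO>
XML

my (@children, @leaves);
my $twig = XML::Twig->new(
    twig_handlers => {
        XML_DIZ_INFO => sub {
            my ($t, $info) = @_;
            @children = dump_children($info);
            @leaves   = dump_leaves($info);
            return 1;
        },
    },
);
$twig->parse($xml);

print "$_\n" for @children, '', @leaves;
```

dump_children prints each top-level block once, which is what the original post expected, while dump_leaves gives one line per field, closer to what the later Qualys code prints.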