[Kc] Perl Question: XML::Twig module
Daryl Fallin
darylvf at gmail.com
Mon Jun 28 08:12:00 PDT 2010
All -
Figured out the problem. Sterling Hanenkamp got me going in the right
direction.
Anyway... I was using an abstract example to ask my question, so here is an
explanation and my actual code.
I am working with the Qualys API and wanted to pull all scan data back
from Qualys so that I can store it and mash it up against other data
sources.
The DTD for the Qualys XML is: https://qualysapi.qualys.com/scan-1.dtd
(This will give you the structure of the XML file.)
Here is the basic code that I ended up with. It works on the XML file
after it has been retrieved from Qualys.
*************************************************
#!/usr/bin/perl
# Indentation style: 1 tab = 4 spaces
use strict;
use warnings;
use XML::Twig;

# Twig handler: receives the twig and the element that has just been parsed.
sub info {
    my ($xml, $info) = @_;
    my $elt = $info;
    my $tag = $info->tag;

    # For these elements the "value" attribute lives three levels up.
    if ($tag =~ m/^(?:VULN|SERVICE|INFO|PRACTICE)$/) {
        printf "VALUE: %s \n",
            $elt->parent->parent->parent->att("value");
        printf "ENT: %s \n", $tag;
    }
    # For these elements it lives on the direct parent.
    if ($tag =~ m/^(?:OS|NETBIOS_HOSTNAME)$/) {
        printf "VALUE: %s \n", $elt->parent->att("value");
        printf "ENT: %s \n", $tag;
        printf "%s\n", $elt->text;
    }
    # Walk every descendant of the handled element, skipping text nodes.
    while ($elt = $elt->next_elt($info)) {
        my $localname = $elt->local_name;
        if ($localname ne '#CDATA' && $localname ne '#PCDATA') {
            printf "%s: ", $localname;
            printf "%s\n", $elt->text;
        }
    }
    printf "\n\n";
}

#===================================================
# Main program section
my $xml = XML::Twig->new(
    twig_handlers => {
        SERVICE          => \&info,
        VULN             => \&info,
        OS               => \&info,
        NETBIOS_HOSTNAME => \&info,
        INFO             => \&info,
        PRACTICE         => \&info,
        HEADER           => \&info,
        #_all_ => \&info,   # not using _all_ to ignore the toplevel SCAN tag
    },
    error_context => 1,
);

# Parse the XML
$xml->parsefile('sample.xml');
******************************************************************
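A side note on the parent->parent->parent chain in info(): XML::Twig's
parent method also accepts a condition and then returns the nearest
ancestor that matches it, so the lookup does not have to depend on the
exact nesting depth. A minimal sketch, assuming made-up tag names (HOST,
GROUP, CAT, ITEM) and an inline XML string rather than the real Qualys
structure:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document: the "value" attribute sits on an ancestor
# three levels above the element we handle.
my $sample = '<HOST value="10.0.0.1"><GROUP><CAT>'
           . '<ITEM>open port</ITEM>'
           . '</CAT></GROUP></HOST>';

my @seen;
my $twig = XML::Twig->new(
    twig_handlers => {
        ITEM => sub {
            my ($t, $item) = @_;
            # parent() with a condition returns the first matching ancestor.
            my $host = $item->parent('HOST');
            push @seen, sprintf('%s %s', $host->att('value'), $item->text);
        },
    },
);
$twig->parse($sample);
print "$_\n" for @seen;
```

If the nesting ever changes, only the tag name passed to parent() needs
to stay correct, not the number of hops.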
On Fri, Jun 25, 2010 at 7:31 PM, Daryl Fallin <darylvf at gmail.com> wrote:
> Hi All ....
>
> I have been trying to work with XML::Twig lately to parse an XML file.
>
> I just want to dump every element/tag of the XML file. But my while loop
> seems to be doing something weird, or it's the way that XML::Twig works,
> I'm not sure, but I get duplicate information from the original XML file.
> It's like it is running part of the while loop twice.
>
> I know there are other modules that I could use, but I am using XML::Twig
> for other parts of what will be a larger program and I want the chunking
> that XML::Twig allows.
>
> Any help would be greatly appreciated.
>
> Here is my sample code:
>
> #!/usr/bin/perl -w
>
> require XML::Twig;
>
> sub info {
> my ($xml, $info) = @_;
> my $elt = $info;
> while ($elt= $elt->next_elt($info) )
> {
> $elt->set_remove_cdata(1);
> $elt->set_pretty_print("record"); # print one field per line
> printf "%s\n", $elt->sprint;
> }
> }
>
> $xml = new XML::Twig(
> TwigHandlers => {
> XML_DIZ_INFO => \&info,
> }
> );
>
> # Parse the XML
> $xml->parsefile('sample.xml');
>
> ************************
>
> sample.xml
> -----------------
> <?xml version="1.0" ?>
> <XML_DIZ_INFO>
> <MASTER_PAD_VERSION_INFO>
> <MASTER_PAD_VERSION>1.0</MASTER_PAD_VERSION>
> <MASTER_PAD_EDITOR>Master Editor here</MASTER_PAD_EDITOR>
> <MASTER_PAD_INFO>information would go here
> </MASTER_PAD_INFO>
> </MASTER_PAD_VERSION_INFO>
> <Company_Info>
> <Company_Name>Moyea Software Co., Ltd.</Company_Name>
> <Country>China</Country>
> <Company_WebSite_URL>http://www.whatever.com
> </Company_WebSite_URL>
> <Contact_Info>
> <Author_First_Name>Bob</Author_First_Name>
> <Author_Last_Name>King</Author_Last_Name>
> <Author_Email>product at moyea.com</Author_Email>
> </Contact_Info>
> </Company_Info>
> </XML_DIZ_INFO>
>
> ============================================
> The following is the output I get. After the closing </Company_Info> it
> should stop.
> ============================================
>
> <MASTER_PAD_VERSION_INFO>
> <MASTER_PAD_VERSION>1.0</MASTER_PAD_VERSION>
> <MASTER_PAD_EDITOR>Master Editor here</MASTER_PAD_EDITOR>
> <MASTER_PAD_INFO>information would go here </MASTER_PAD_INFO>
> </MASTER_PAD_VERSION_INFO>
>
> <MASTER_PAD_VERSION>1.0</MASTER_PAD_VERSION>
> 1.0
>
> <MASTER_PAD_EDITOR>Master Editor here</MASTER_PAD_EDITOR>
> Master Editor here
>
> <MASTER_PAD_INFO>information would go here </MASTER_PAD_INFO>
> information would go here
>
> <Company_Info>
> <Company_Name>Moyea Software Co., Ltd.</Company_Name>
> <Country>China</Country>
> <Company_WebSite_URL>http://www.whatever.com</Company_WebSite_URL>
> <Contact_Info>
> <Author_First_Name>Bob</Author_First_Name>
> <Author_Last_Name>King</Author_Last_Name>
> <Author_Email>product at moyea.com</Author_Email>
> </Contact_Info>
> </Company_Info>
>
> <Company_Name>Moyea Software Co., Ltd.</Company_Name>
> Moyea Software Co., Ltd.
>
> <Country>China</Country>
> China
>
> <Company_WebSite_URL>http://www.whatever.com</Company_WebSite_URL>
> http://www.whatever.com
>
> <Contact_Info>
> <Author_First_Name>Bob</Author_First_Name>
> <Author_Last_Name>King</Author_Last_Name>
> <Author_Email>product at moyea.com</Author_Email>
> </Contact_Info>
>
> <Author_First_Name>Bob</Author_First_Name>
> Bob
>
> <Author_Last_Name>King</Author_Last_Name>
> King
>
> <Author_Email>product at moyea.com</Author_Email>
> product at moyea.com
>
>
>
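For the record, the repetition in the output above follows from how the
two calls combine: next_elt() steps through every descendant of the
handled element in document order, text nodes included, and sprint()
serializes an element together with everything beneath it. So each
container is printed with its children, and then each child is visited
and printed again. A minimal sketch of one way around it, printing only
leaf elements; the leaf_lines helper and the trimmed sample are
illustrative, not part of the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Trimmed-down version of the sample.xml quoted above (illustrative only).
my $sample = <<'XML';
<XML_DIZ_INFO>
  <Company_Info>
    <Company_Name>Moyea Software Co., Ltd.</Company_Name>
    <Contact_Info>
      <Author_First_Name>Bob</Author_First_Name>
    </Contact_Info>
  </Company_Info>
</XML_DIZ_INFO>
XML

# Walk the subtree under $root and return "tag: text" for leaf elements.
sub leaf_lines {
    my ($root) = @_;
    my @lines;
    my $elt = $root;
    while ($elt = $elt->next_elt($root)) {
        # Skip #PCDATA / #CDATA text nodes.
        next unless $elt->is_elt;
        # Skip containers: their element children are visited on their own.
        next if grep { $_->is_elt } $elt->children;
        push @lines, sprintf('%s: %s', $elt->tag, $elt->text);
    }
    return @lines;
}

my $twig = XML::Twig->new;
$twig->parse($sample);
print "$_\n" for leaf_lines($twig->root);
```

Each leaf is reported once because containers are skipped instead of
being serialized along with their children.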