Any interest in Perl and XML

Bob La Quey robertl1 at home.com
Fri May 19 17:18:59 CDT 2000


At 02:37 PM 5/19/00 -0700, you wrote:
>~sdpm~
>Dear Bob,
>
>> I have some code and would be interested in seeing what others
>> are up to as well.
>
>  Check out:
>http://www-4.ibm.com/software/developer/library/xml-perl/
>
>  and
>http://www.velocigen.com/~tdarugar/talks/
>
>  For some of my stuff on XML. We use XML and Perl extensively
>here, so I'm definitely interested in the topic.
>
>Best,
>
>Parand Tony Darugar			tdarugar at velocigen.com

Kool,

Here is a little ditty I wrote for xml.com

http://www.xml.com/pub/1999/11/sml/index.html

But why don't we post a little code up here so we can
have some fun and games. After all there is "more than
one way to do it" so I suspect we can have some amusing
arguments if not downright flamefests and religious wars.

I'll kick it off. This is some code I presented at the Linux 
Programmers Study Group (LPSG) about a year ago. The LPSG is
an offshoot of the Kernel-Panic Linux Users Group (KPLUG). 
LPSG meets twice a month here in San Diego at the San Diego 
County of Education facility. See 
http://www.kernel-panic.org/lpsg/ for details. 
We also have an active mailing list.

Here is some Perl code that uses the XML::Parser 
module to (surprise) parse XML. 


#!/usr/bin/perl

use XML::Parser;
my $source_filename=shift;
my $count=0;

###################################################################################
# Supporting data structures and subroutines for parser
###################################################################################
#set up specifics of my xml handlers
%tags = ( ); 

# start_handler pushes tag on tag_stack, end_handler pops tag off tag_stack
@tag_stack=(); 
$sp=1;

# Handlers for XML parser events.
# (Declared without empty "()" prototypes, since they all take arguments.)
sub init_handler {
	my($p) = @_; 
	print("Starting XML parsing\n");
}
sub final_handler {
	my($p) = @_; 
	print("End of XML parsing\n");
	10;	# return value of the Final handler; parsefile returns this
}
sub start_handler {
	my($p, $tag) = @_; 
	$tag_stack[++$sp]=$tag;	# push the tag
	indent();
	print("<",$tag,">\n");
}
sub end_handler {
	my($p, $tag) = @_;
	indent(); 
	print("</",$tag,">\n");
	$sp--;			# pop the tag
}
sub char_handler {
	my($p, $el) = @_;
	my $current_tag=$tag_stack[$sp];
	indent(); 
	print("char_handler, current tag <",$current_tag, "> data element = \"", $el,"\"\n");  
} 

# default_handler is not registered with setHandlers below,
# so it is never invoked
sub default_handler {
	my($p, $el) = @_;
	if($el =~ /\w/) { # only print strings containing at least one word character
		print("default_handler ", $p, " for ",     $el," ord = ", ord($el), "\n"); 
	} 
}

# print 4 spaces per nesting level
sub spaces {
	my ($deep) = @_;
	$deep = 4*$deep;
	my $spaces = ' ' x $deep;
	print($spaces);
}

sub indent {
	spaces($sp-1);
} 

###################################################################################
# The parser 
###################################################################################

my $parser = new XML::Parser(ErrorContext => 2); 

$parser->setHandlers( Init 	=> \&init_handler, 
			 Start => \&start_handler, 
			 Char 	=> \&char_handler, 
			 End 	=> \&end_handler, 
			 Final => \&final_handler); 

$parser->parsefile($source_filename); 

###################################################################################
Discussion:

Examine the last few lines of code. The program
	1) Instantiates an instance of the XML parser 
		my $parser = new ...
	2) Tells the parser what handlers to use when 
		the parser generates an event.
		$parser->setHandlers( ...
	3) Does the actual parsing
		$parser->parsefile($source_filename); 
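
The same three steps can also be sketched against an in-memory string
instead of a file. This is only a minimal sketch: the sample XML and the
@events array are made up for illustration, and parse() stands in for
parsefile() so that no input file is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample document, for illustration only.
my $xml = '<note><to>LPSG</to><body>Hello</body></note>';

my @events;    # records each event so the order can be inspected

# 1) Instantiate the parser.
my $parser = XML::Parser->new(ErrorContext => 2);

# 2) Tell the parser which handlers to call for which events.
$parser->setHandlers(
    Start => sub { my ($p, $tag)  = @_; push @events, "start:$tag" },
    End   => sub { my ($p, $tag)  = @_; push @events, "end:$tag" },
    Char  => sub { my ($p, $text) = @_; push @events, "char:$text" },
);

# 3) Do the actual parsing (parse() takes a string or a filehandle).
$parser->parse($xml);

print "$_\n" for @events;
```

Anonymous subs are used here only to keep the sketch short; named
subroutines passed as \&name work the same way.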

perldoc XML::Parser lists among other things the Events
(see HANDLERS) that the parser generates. You may associate 
any subroutine you want as the handler of an event. 

Some important Events are: 
	1) Init 
		generated just before the parsing starts
	2) Final 
		generated just after successful parsing finishes.
		parse (and parsefile) returns what this handler returns.
	3) Start
		generated when an XML start tag is recognized
	4) End 
		generated when an XML end tag is recognized
	5) Char
		generated when a non-markup string (character data) is recognized
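
A few details of these events are easy to miss, so here is a hedged
sketch of them. Per the XML::Parser documentation, the Start handler
also receives attribute name/value pairs after the tag name, the Char
handler may be called more than once for a single run of text, and
parse returns whatever the Final handler returns. The sample XML and
the variable names (%attrs_of, $text, $inits) are invented for this
example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical input; the element and attribute names are made up.
my $xml = '<list name="groceries"><item qty="2">eggs &amp; milk</item></list>';

my %attrs_of;      # tag name => hash of its attributes
my $text  = '';    # character data accumulated across Char calls
my $inits = 0;     # how many times Init fired

my $parser = XML::Parser->new(
    Handlers => {
        Init  => sub { $inits++ },
        # After the tag name, Start receives attribute name/value pairs.
        Start => sub {
            my ($p, $tag, %attr) = @_;
            $attrs_of{$tag} = {%attr};
        },
        # Char may fire several times for one text node, so append.
        Char  => sub { my ($p, $chunk) = @_; $text .= $chunk },
        # Whatever Final returns becomes the return value of parse().
        Final => sub { return length $text },
    },
);

my $result = $parser->parse($xml);
print "qty=$attrs_of{item}{qty} text='$text' result=$result\n";
```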

There are other Events but these will do for the purposes of this
tutorial. 

The subroutines defined here as event handlers are just 
intended to print out the tags and node contents as a tutorial 
exercise. Actual applications would use these handlers to do 
something more substantial, e.g. generate HTML, or put data 
into a database, etc. 
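
As a rough idea of what "more substantial" might look like, the sketch
below collects records into an array of hashes and then emits a small
HTML table. It is an illustration under assumptions, not the code from
the talk: the <member>/<name>/<os> vocabulary and the sample data are
invented, and no HTML escaping is done. It uses the Expat object's
current_element method rather than a hand-maintained tag stack.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical data; this vocabulary is invented for the example.
my $xml = <<'XML';
<members>
  <member><name>Alice</name><os>Linux</os></member>
  <member><name>Bob</name><os>FreeBSD</os></member>
</members>
XML

my @rows;       # one hash per <member>
my $current;    # the record currently being filled in

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($p, $tag) = @_;
            $current = {} if $tag eq 'member';
        },
        Char => sub {
            my ($p, $text) = @_;
            # current_element names the innermost open element.
            my $tag = $p->current_element;
            $current->{$tag} .= $text
                if $current && $tag =~ /^(?:name|os)$/;
        },
        End => sub {
            my ($p, $tag) = @_;
            if ($tag eq 'member') { push @rows, $current; undef $current }
        },
    },
);
$parser->parse($xml);

# Emit a small HTML table from the collected records.
my $html = "<table>\n";
$html .= "<tr><td>$_->{name}</td><td>$_->{os}</td></tr>\n" for @rows;
$html .= "</table>\n";
print $html;
```

Keeping the parsing (filling @rows) separate from the output step means
the same handlers could feed a database insert instead of HTML.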

Next we will look at using this Perl script to actually
parse some simple XML. (See next post) 








