OC-PM: SAX ponderings

Kip Hampton kip at web.oakley.com
Mon Dec 9 20:23:03 CST 2002


Wilson, Douglas wrote:
> First let me say thanks again Kip for the presentation last night!
> I've now at least a start on what SAX is about...

Thanks.

> 
> I was thinking about the theoretical problem Ryan
> mentioned last night, and how it might be handled with SAX.
> The problem as I recall was, say you have a document like this:
> ...
> <x>
>   <a>...</a>
>   <b>...</b>
>   <c>...</c>
> </x>
> ...
> 
> And you want to randomly eliminate one of the children of x (or maybe
> you want to randomly pick one to keep and eliminate the rest), and
> you don't know in advance how many children x has.
> 
> Am I wrong in thinking that after you come across the x start_element
> event, you would then have to pass events on to a buffer (let's say you
> want to randomly eliminate a child), then when you get to the x end_element
> event you'd have to then pass the buffer through a filter which would
> eliminate the unlucky nth child?

Almost...

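You're right in that you'd need to buffer the events between the calls 
to start_element() and end_element() for the 'x' element, but a second 
filter is not really needed: the handler that does the buffering can 
make the choice itself when it reaches the end of 'x'.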

Consider the following (untested):

package Ryans::RandomChildFilter;
use XML::SAX::Base;
use vars qw( @ISA $e_count );
@ISA = qw( XML::SAX::Base );

$e_count = 0;

#init in start_doc
sub start_document {
     my $self = shift;

     $self->{in_buffer}     = undef;
     $self->{event_buffer}  = [];

     $self->SUPER::start_document( @_ );
}

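sub start_element {
     my $self = shift;
     my $e = shift;

     if ( defined( $self->{in_buffer} ) ) {
         # inside 'x': buffer the event and track nesting depth
         push @{$self->{event_buffer}->[$e_count]},
              ['start_element', $e];
         $self->{depth}++;
     }
     else {
         $self->SUPER::start_element( $e );

         # start buffering once we are inside 'x'
         if ( $e->{LocalName} eq 'x' ) {
             $self->{in_buffer} = 1;
             $self->{depth}     = 0;
         }
     }
}

sub characters {
     my $self = shift;
     my $chars = shift;

     if ( defined( $self->{in_buffer} ) ) {
         # text directly inside 'x' (e.g. whitespace between the
         # children) belongs to no child, so it is dropped
         push @{$self->{event_buffer}->[$e_count]},
              ['characters', $chars]
             if $self->{depth} > 0;
     }
     else {
         $self->SUPER::characters( $chars );
     }
}

sub end_element {
     my $self = shift;
     my $e = shift;

     if ( defined( $self->{in_buffer} ) ) {

         if ( $self->{depth} == 0 ) {
             # depth 0 means this is the end of 'x' itself

             # pick a random child from the buffer
             my @buffer = @{$self->{event_buffer}};
             if ( @buffer ) {
                 # rand(N) returns 0 <= r < N, so use the count,
                 # not the last index
                 my $selected_index = int rand( scalar @buffer );

                 # forward the child's events from the buffer
                 foreach my $event ( @{$buffer[$selected_index]} ) {
                     my ( $method, $data ) = @{$event};
                     my $super = "SUPER::$method";
                     $self->$super( $data );
                 }
             }

             # reset
             $self->{event_buffer} = [];
             $self->{in_buffer} = undef;
             $e_count = 0;

             # and close 'x'
             $self->SUPER::end_element( $e );
         }
         else {
             push @{$self->{event_buffer}->[$e_count]},
                  ['end_element', $e];
             $self->{depth}--;

             # back at the 'x' level: this child is complete
             $e_count++ if $self->{depth} == 0;
         }
     }
     else {
         $self->SUPER::end_element( $e );
     }
}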

1;
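
The part that is easiest to get wrong is deciding where one child of 'x' 
ends and the next begins, so here is a parser-free sketch of just that 
step in plain Perl. It assumes the buffered events are [method, data] 
pairs; the group_children() name and the sample event list are made up 
for illustration, and the data is a plain string here rather than the 
hash a real SAX driver passes.

```perl
use strict;
use warnings;

# Hypothetical helper: split the events recorded between <x> and </x>
# into one list per direct child, using nesting depth to find the
# boundaries.
sub group_children {
    my @events = @_;
    my @children;
    my $depth = 0;

    foreach my $event (@events) {
        my ( $method, $data ) = @{$event};

        if ( $method eq 'start_element' ) {
            push @children, [] if $depth == 0;    # a new direct child begins
            $depth++;
        }

        # text sitting directly under <x> (depth 0) belongs to no child
        push @{ $children[-1] }, $event if $depth > 0;

        $depth-- if $method eq 'end_element';
    }
    return @children;
}

# Made-up event stream for: <x> <a>one</a> <b><i>two</i></b> </x>
my @events = (
    [ characters    => "\n  " ],
    [ start_element => 'a' ],
    [ characters    => 'one' ],
    [ end_element   => 'a' ],
    [ characters    => "\n  " ],
    [ start_element => 'b' ],
    [ start_element => 'i' ],
    [ characters    => 'two' ],
    [ end_element   => 'i' ],
    [ end_element   => 'b' ],
    [ characters    => "\n" ],
);

my @children = group_children(@events);

# rand(N) is always below N, so this index is in 0 .. $#children
my $keep = $children[ int rand @children ];
print scalar(@children), " children; keeping <$keep->[0][1]>\n";
```

Counting depth rather than end_element calls keeps a nested element 
such as the 'i' above inside its parent's group, and the whitespace 
between children never turns into a "child" of its own.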

Obviously, there may be cleaner ways to skin the same cat, but does this 
help at all?

-kip



