[boulder.pm] FW: text extract

Justin Crawford Justin.Crawford at cusys.edu
Thu Jul 26 11:24:54 CDT 2001


Thanks for all the suggestions, everyone.  I knew the problem had probably
been considered before by better coders than me.  I have a script to get the
chunks I'm after now.

Side note (newbie question?):

Jim, I couldn't get your solution to work.  It looks like fun though.  These
are the 2 lines that lose me:

    my $start = $-[0];   # @- is the beginning offsets of the captures
    my $end = $+[0] - 1; # @+ and this is the end

I just can't figure out what's going on.  Output of the script is like:

Looking for 'NEEDLE' in digest '-xxxxxxxxxxx;-xxxxxxNxxxx;-xxxxxxNxxxx;-xxxxxxxxxx;-xN;'
Use of uninitialized value at fileR.pl line 23.
Use of uninitialized value at fileR.pl line 24.
Use of uninitialized value at fileR.pl line 24.

@+ isn't initialized.  I've never seen a regular array named like that
before, so I guessed that it's a special variable (along with @-).  But
neither's listed in my Perl books, so maybe they're just regular arrays that
I need to fill up?  What's their story, where do they come from, what should
they be initialized to in this context?

Thanks again,

Justin

-----------
use strict;
use warnings;

my $needle = shift;
my @data = <>;

# Construct digest of data
my $digest;
foreach my $row (@data) {
    if ($row =~ /^-\s*$/) { $digest .= '-'; }
    elsif ($row =~ /^;\s*$/) { $digest .= ';'; }
    elsif ($row =~ /^\s*$/) { $digest .= ' '; }
    elsif ($row =~ /$needle/o) { $digest .= 'N'; }
    else { $digest .= 'x'; }
}

# Now look for our needle, and any data surrounding it
print STDERR "Looking for '$needle' in digest '$digest'\n";
if ($digest =~ /(-x*Nx*;)/) { # modify for more complex needles
    my $start = $-[0];   # @- is the beginning offsets of the captures
    my $end = $+[0] - 1; # @+ and this is the end
    foreach my $i ($start .. $end) {
        print $data[$i];
    }
}
else {
    die "Needle not found";
}
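
On the @- and @+ question: both are built-in special arrays documented in
perlvar (long names @LAST_MATCH_START and @LAST_MATCH_END), added in Perl
5.6.0. After a successful match, $-[0] is the offset where the whole match
starts and $+[0] is the offset just past its end; $-[1] and $+[1] do the same
for the first capture group. Nothing needs to be filled in by hand. On a perl
older than 5.6.0 they would presumably behave as ordinary empty arrays, which
would fit the "uninitialized value" warnings above, though that is a guess
about the setup rather than something the output confirms. Below is a minimal
sketch of the idea; match_span and the sample digest are made up for
illustration and are not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): return the first and
# last character offsets of the first match of $re in $str, or an empty
# list when there is no match.
sub match_span {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    # $-[0] is the offset where the whole match starts;
    # $+[0] is the offset just past its end, hence the "- 1".
    return ($-[0], $+[0] - 1);
}

# Made-up digest: one character per input row, as in the script above.
my $digest = 'xx-xxNx;-xN;';
my ($start, $end) = match_span($digest, qr/(-x*Nx*;)/);
print "match covers offsets $start..$end: ",
      substr($digest, $start, $end - $start + 1), "\n";
```

With this sample digest the match is "-xxNx;", covering offsets 2 through 7.
Because the digest holds one character per row of @data, those offsets double
as indices into @data, which is why the script can loop over $start .. $end.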


