APM: Re: Regular Expression Guru's anyone?

Wayne Walker wwalker at bybent.com
Thu Oct 10 19:29:33 CDT 2002


This will work (at least for my test data):


guru.pl:

#!/usr/bin/perl

$/ = undef; # unset record separator to read in entire file at once

use strict; # the only way to write perl :)

my ($data, $newdata, $text, $tag);

$data = <DATA>; # Read in all the lines following __DATA__

# Break the string into 3 pieces:
#   1. the text before the first tag (leading non-'<' characters)
#   2. the tag itself ('<', any non-'>' characters, up to the next '>')
#   3. everything following the tag
while ( $data =~ /^([^<]*)(<[^>]*>)(.*)$/s)
{
    # Lazy man's way to grab 3 vars at a time :)
    ($text, $tag, $data) = ($1, $2, $3);
    # Fix the text
    $text =~ s/bird/Hawk/g;  # Change every occurrence in this piece of text
    # Add the text and the tag to the $newdata string
    $newdata .= $text . $tag;
}
# Whatever is left in $data when there are no more tags is plain text:
# fix it and append it to $newdata

$data =~ s/bird/Hawk/g;
$newdata .= $data;

print $newdata;

__DATA__
this is some text about a bird, a bird is cool, here is a picture of a
bird <img src='bird.jpg'>
this is some text about a bird, a bird is cool, here is a picture of a
bird <img src='bird.jpg'>
this is some text about a bird, a bird is cool, here is a picture of a
bird <img src='bird.jpg'>
this is some text about a bird, a bird is cool, here is a picture of a
bird <img src='bird.jpg'>
this is some text about a bird, a bird is cool, here is a picture of a
bird <img src='bird.jpg'>
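
For comparison, here is a minimal sketch of another way to do the same job, using split with a capturing group instead of the while loop. The helper name replace_outside_tags and its arguments are my own invention for illustration, not anything from the original question, and it makes the same assumption as the loop above: a tag is a '<', some non-'>' characters, and a '>'. It will be fooled by a '>' inside an attribute value or by HTML comments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (illustration only): replace every $from with $to
# in the text that sits outside <...> tags.
sub replace_outside_tags {
    my ($html, $from, $to) = @_;

    # The capturing group makes split return the tags as well, so the
    # list alternates: text, tag, text, tag, ...  The -1 limit keeps any
    # trailing empty piece so nothing is lost when we join again.
    my @pieces = split /(<[^>]*>)/, $html, -1;

    for my $i (0 .. $#pieces) {
        next if $i % 2;                    # odd indexes are tags, skip them
        $pieces[$i] =~ s/\Q$from\E/$to/g;  # \Q...\E treats $from literally
    }
    return join '', @pieces;
}

print replace_outside_tags(
    "a bird, a bird is cool <img src='bird.jpg'>\n", 'bird', 'Hawk');
```

Because the pieces are joined back in order, the tags come through untouched and only the text between them is changed.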

On Thu, Oct 10, 2002 at 03:49:13PM -0500, David Lyons wrote:
> Here is what I am trying to do, I need to match text that is in an html 
> document but specifically not inside an HTML tag, ie:
> 
> matching the word bird:
> 
> this is some text about a bird, a bird is cool, here is a picture of a 
> bird <img src='bird.jpg'>
> 
> would hit on the two instances of "bird" but not on the one in the img 
> tag (or any other HTML tag for that matter).
> 
> Thanks,
> D

-- 

Wayne Walker


