[Neworleans-pm] need a little help with a RegEx

Tue Jan 31 07:01:57 PST 2006

On 1/31/06, Alex Walker <alex at houseofwalker.net> wrote:

> Good morning all.

Good morning, Alex.

> This should be easy, but somehow I'm missing something. I'm inserting tags in a set of HTML files based on a couple of criteria. What I want to do is find every instance of "<p class=imagecenter>" containing an image with a "usemap" tag and insert div tags around the paragraph. The HTML is a little ugly since it was generated by a WYSIWYG editor (sorry, it was like that when I got here), but the files are all relatively short and I can assume that all paragraphs contain a closing "</p>". The easiest thing seemed to be to just slurp it and look for everything between a "<p class=imagecenter>" and "</p>", but it's not working.
>
> Here's the problem... everything works like a champ when there is a "usemap" tag, but on instances where that tag is not there, the script still seems to be picking up the initial "<p class" tag and then just never closing it. I've attached a sample HTML file and a trimmed down version of the script. You'll see that the first image works fine but the second (which should just be skipped) causes all kinds of problems. I'm sure there's just some dumb thing that I'm missing here, but I really can't see it. Any help would be much appreciated.

I think the problem is that your regex is slurping up too much between
the opening of the p element and the start of the img element.  I
don't know what all the variations are that you're searching for, but I
found that this does what you want, I think:

  $fileString =~ s#(<p class=imagecenter>[^>]*<img[^>]+?usemap[^>]+?></p>)#\n<div id="mapimage" align="center">$1\n</div>#igs;

Basically, I just made the wildcard sections a little less wild.
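
To illustrate what "less wild" buys you, here is a minimal sketch. The sample HTML, the shortened <div> replacement, and the "loose" pattern are all made up for the demonstration (the original script isn't reproduced here, so the loose pattern is only a guess at the kind of thing that goes wrong). With /s, a ".*?" is free to run past a "</p>" until it finds a "usemap" somewhere later in the file, whereas "[^>]" can never step over the end of a tag.

```perl
use strict;
use warnings;

# Hypothetical sample: the first paragraph has no usemap, the second does.
my $html = join "\n",
    '<p class=imagecenter><img src="a.gif"></p>',
    '<p class=imagecenter><img src="b.gif" usemap="#b"></p>';

# Assumed "too wild" pattern: with /s, .*? can cross </p> boundaries,
# so the match starts at the first <p> and ends after the second </p>.
(my $loose = $html) =~ s#(<p class=imagecenter>.*?usemap.*?</p>)#<div>$1</div>#igs;

# Tighter pattern: [^>] cannot step over the end of a tag, so a
# paragraph whose <img> has no usemap simply fails to match.
(my $tight = $html) =~ s#(<p class=imagecenter>[^>]*<img[^>]+?usemap[^>]+?></p>)#<div>$1</div>#igs;

print "loose:\n$loose\n\ntight:\n$tight\n";
```

In this sketch the loose version should wrap both paragraphs in a single div, while the tight version should wrap only the paragraph whose image carries a usemap.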

I hope that helps.

Take care,

Dave