LPM: A many monkeys question

Gregg Casillo gcasillo at ket.org
Fri Aug 13 12:58:30 CDT 1999


Here's my problem: I want to lowercase all internal links within an HTML
file. That is, links to other pages on our web site. I want to ignore
external links, links to pages outside our web site.

I successfully wrote a Perl script that lowercased every link (every A HREF
tag) indiscriminately before I realized some external links might need case
sensitivity. This regular expression did the trick:

foreach (@lines) {
  s/(<\s*(?:a|A)\s+(?:href|HREF)\s*=\s*\".+?\".*?>)/\L$1/g;
  ...
}

How can I add a condition to test each A HREF's link (the stuff between the
quotes) to see if it's an internal or external link? What if I have more
than one A HREF tag on the same line like this:

<A HREF="Foo/Bar/index.html"><A HREF="http://www.fudd.com/elmer.html">

I tried the following code which appears to see each tag but fails to
lowercase. For the sake of argument, assume all external links begin
with "http://" and internal links do not.

foreach (@lines) {
  # search for A HREF tags on this line
  while ( m {<\s*a\s+href\s*=\s*"(.+?)".*?>}gix ) {
    # ignore external links, assuming anything that starts with
    # "http://" is external
    unless ($1 =~ /http:\/\//) {
      s/$1/\L$1/;  # lowercase the internal link
    }
  }
  push (@newlines, $_);
}

Am I using the m//g operation correctly? Please help.
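
For what it's worth, here is a minimal sketch of one way this might be approached, under the same assumption as above (external links begin with "http://"). Instead of nesting s/// inside a while ( m//g ) loop, which modifies $_ while it is still being scanned, it makes a single pass with s///ge and decides per link whether to lowercase. The sub names lowercase_internal_links and fix_link are hypothetical, and only the quoted link is lowercased, not the tag itself.

```perl
use strict;
use warnings;

# Return the link unchanged if it looks external, lowercased otherwise.
# Assumption from the post: external links begin with "http://".
sub fix_link {
    my ($url) = @_;
    return $url =~ m{^http://}i ? $url : lc $url;
}

# Rewrite every A HREF link on a line in one pass. The /g flag handles
# several tags on the same line; /e evaluates the replacement as code.
sub lowercase_internal_links {
    my ($line) = @_;
    $line =~ s{(<\s*a\s+href\s*=\s*")([^"]*)}{ $1 . fix_link($2) }gie;
    return $line;
}

my @lines = (
    '<A HREF="Foo/Bar/index.html"><A HREF="http://www.fudd.com/elmer.html">',
);
my @newlines = map { lowercase_internal_links($_) } @lines;
print "$_\n" for @newlines;
```

With the two-tag sample line above, the first link should come out as foo/bar/index.html while the fudd.com link is left alone. If the inner s/$1/.../ style is kept instead, the interpolated link probably needs \Q...\E so that characters like "." are matched literally.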

Must be Friday the 13th,
Gregg Casillo




