LPM: A many monkeys question
Gregg Casillo
gcasillo at ket.org
Fri Aug 13 12:58:30 CDT 1999
Here's my problem: I want to lowercase all internal links within an HTML
file, that is, links to other pages on our web site, while leaving external
links (links to pages outside our web site) alone.
I successfully wrote a Perl script that lowercased every link (every A HREF
tag) indiscriminately, before I realized that some external links might be
case-sensitive. This regular expression did the trick:
foreach (@lines) {
    s/(<\s*(?:a|A)\s+(?:href|HREF)\s*=\s*\".+?\".*?>)/\L$1/g;
    ...
}
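As a minimal sketch of why this is a problem, here is that substitution run on the two-tag sample line quoted further down, with the external path changed to a hypothetical mixed-case `Elmer.html` (an assumption for illustration only). Because `\L` applies to everything captured in `$1`, the whole tag, external path included, comes out lowercase:

```perl
use strict;
use warnings;

# Sample line with one internal link and one external link; the
# "Elmer.html" capitalization is hypothetical, chosen to show the damage.
my $line = '<A HREF="Foo/Bar/index.html"><A HREF="http://www.fudd.com/Elmer.html">';

# The substitution from the post (semicolon added): \L lowercases all of
# $1, i.e. the entire tag, for every match on the line.
$line =~ s/(<\s*(?:a|A)\s+(?:href|HREF)\s*=\s*\".+?\".*?>)/\L$1/g;

print "$line\n";
# prints: <a href="foo/bar/index.html"><a href="http://www.fudd.com/elmer.html">
```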
How can I add a condition that tests each A HREF's link (the stuff between
the quotes) to see whether it's an internal or external link? And what if I
have more than one A HREF tag on the same line, like this:
<A HREF="Foo/Bar/index.html"><A HREF="http://www.fudd.com/elmer.html">
I tried the following code, which appears to find each tag but fails to
lowercase the links. For the sake of argument, assume that all external links
begin with "http://" and internal links do not.
foreach (@lines) {
    # search for A HREF tags on this line
    while ( m{<\s*a\s+href\s*=\s*"(.+?)".*?>}gix ) {
        # ignore external links, assuming anything that starts with "http://" is external
        unless ($1 =~ /http:\/\//) {
            s/$1/\L$1/;   # lowercase the internal link
        }
    }
    push (@newlines, $_);
}
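One likely culprit in the loop above can be reproduced in isolation. In `s/$1/\L$1/`, the `$1` in the pattern is interpolated from the earlier match, but the `$1` in the replacement refers to the capture groups of the substitution's own pattern, which has none, so the link is replaced with an empty string rather than lowercased. Separately, perlop notes that modifying the target string resets the `m//g` search position, so changing `$_` inside the `while` loop restarts the scan. The sketch below (variable name `$text` is hypothetical) shows only the `$1` effect:

```perl
use strict;
use warnings;
no warnings 'uninitialized';   # the undefined $1 below is the point of the demo

my $text = '<A HREF="Foo/Bar/index.html">';
$text =~ /"(.+?)"/;            # $1 is now "Foo/Bar/index.html"

# Same shape as s/$1/\L$1/ in the loop: the pattern side uses the old $1,
# but once this substitution matches, $1 means *its* group 1, and this
# pattern has no capture groups, so the replacement is empty.
$text =~ s/$1/\L$1/;

print "$text\n";
# prints: <A HREF="">
```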
Am I using the m//g operation correctly? Please help.
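One way to avoid modifying `$_` underneath `m//g` is to do the test and the lowercasing in a single `s///ge` pass, which visits every tag on the line by itself. This is a sketch under the post's assumption that external links begin with `http://` (matched here case-insensitively and only at the start of the link, unlike the unanchored test in the loop); `fix_url` and `lowercase_internal_links` are hypothetical helper names. Unlike the original one-liner, it lowercases only the URL and leaves the tag's own case alone:

```perl
use strict;
use warnings;

# Assumption from the post: a link is external if it starts with "http://".
# Internal links are lowercased; external links are returned untouched.
sub fix_url {
    my ($url) = @_;
    return $url =~ m{^http://}i ? $url : lc $url;
}

# Rewrite every <A HREF="..."> on one line. /g visits each tag in turn,
# /i matches "a"/"href" in any case, and /e evaluates the replacement as
# Perl code. The match inside fix_url() is scoped to that sub, so $1 and $3
# still hold this substitution's captures when it returns.
sub lowercase_internal_links {
    my ($line) = @_;
    $line =~ s{(<\s*a\s+href\s*=\s*")([^"]*)(")}{$1 . fix_url($2) . $3}gie;
    return $line;
}

my @lines = ('<A HREF="Foo/Bar/index.html"><A HREF="http://www.fudd.com/elmer.html">');
my @newlines = map { lowercase_internal_links($_) } @lines;
print "$_\n" for @newlines;
# prints: <A HREF="foo/bar/index.html"><A HREF="http://www.fudd.com/elmer.html">
```

As written, it only handles double-quoted HREF values, as in the examples above; single-quoted or unquoted attributes would need a broader pattern.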
Must be Friday the 13th,
Gregg Casillo
More information about the Lexington-pm mailing list