LPM: A many monkeys question

Joe Hourcle oneiros at dcr.net
Sat Aug 14 10:49:18 CDT 1999



On Sat, 14 Aug 1999, Rich Bowen wrote:

> It might be worthwhile to make three passes through the document, the
> first time through changing the ones with http: to A* *H*R*E*F or
> something like that, then doing your case changes, then changing the A*
> *H*R*E*Fs back to A HREFs. I know it's clunky, but I can't seem to think
> of another way to do it.
> 
> Of course, if you are just trying to solve the problem that links are the
> wrong case and getting 404s, you might want to consider a non-perl
> approach (gasp! heresy!) and use the mod_speling module (are you using
> Apache?) to correct the casing on the fly, and avoid the 404s on the
> server side.
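
For what it's worth, Rich's three-pass idea might look roughly like the
sketch below. It is only an illustration: lowercase_local_links is a
made-up name, and it assumes every link is written as <A HREF="..."> with
a single space and double quotes, and that the case change wanted is
lowercasing the link target.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase the targets of local links, leaving
# absolute http: links untouched, using Rich's placeholder trick.
sub lowercase_local_links {
    my ($html) = @_;

    # Pass 1: hide absolute links behind a placeholder that pass 2
    # cannot match. (This also normalizes the tag itself to upper case.)
    $html =~ s/<A HREF="(http:)/<A* *H*R*E*F="$1/gi;

    # Pass 2: lowercase the target of every link that is still visible.
    $html =~ s/(<A HREF=")([^"]*)/$1\L$2/gi;

    # Pass 3: turn the placeholders back into ordinary links.
    $html =~ s/<A\* \*H\*R\*E\*F=/<A HREF=/g;

    return $html;
}

# Example input is invented for illustration.
my $page = '<A HREF="Foo/Bar.HTML">x</A> <A HREF="http://Example.COM/Page.html">y</A>';
print lowercase_local_links($page), "\n";
```

A negative lookahead, (?!http:), placed right after the opening quote in
the pass 2 pattern would probably let you skip passes 1 and 3 altogether.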

If you _really_ want to overdo it, you could check each link as you go.


I did this a couple of years back. I'd probably have to redo it now, as IIS
uses temporary (302) redirects to report 'overloaded' servers.  You might
look over the HTTP spec and have it react to other response codes as well
(301 redirects, etc.).
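
As a rough sketch of what reacting to more response codes could look
like, the status line can be boiled down to a coarse category first.
classify_status and its category names are invented for this example, and
the choice of which codes count as a redirect or as "not found" is my
assumption, not something the code further down does.

```perl
use strict;
use warnings;

# Hypothetical helper: map an HTTP status line to a coarse category.
sub classify_status {
    my ($line) = @_;

    # Anything that is not a status line (e.g. a usage message from the
    # helper app) is reported as invalid.
    return 'invalid' unless $line =~ m{^HTTP/\d+\.\d+\s+(\d{3})\b};
    my $code = $1;

    return 'ok'        if $code == 200;
    # Redirects that carry a Location: header worth following.
    return 'redirect'  if $code == 301 || $code == 302
                       || $code == 303 || $code == 307;
    return 'not_found' if $code == 404 || $code == 410;
    return 'other';
}

print classify_status("HTTP/1.0 301 Moved Permanently"), "\n";
```

The if/elsif chain on the first line of output could then switch on the
category instead of matching each code by hand.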

$get and $lynx were the paths to helper apps.  $get returned the doc with
its headers (a full GET, as some webservers return different status codes
for HEAD and GET); $lynx was just 'lynx -source' (well, with the full path).

@errors was an array that I'd check before outputting a response to see if
there were any messages in it.


#####

sub verify_url {

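    my $url  = $_[0];
    my $loop = ($_[1] || 0) + 1;   # recursion depth; first call passes nothing
    local *URL;                    # private filehandle, so the recursive call
                                   # can't clobber the one we're reading from
    if ($loop > 5)  {              # too many redirects in a row, give up
        return 0;
    }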

    open (URL, "$get $url |")
        || &abort("failed to open '$get $url'");
    
    $_ = <URL>;
    chomp;
    
    if (/HTTP.* 200/) {
        # 200 OK: nothing to do here, fall through to the lynx check
    } elsif (/HTTP.* 302/) {
        while ($blah = <URL>) {
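            # header names are case-insensitive; \S+ keeps the trailing
            # CR/LF out of the URL we pass to the shell
            if ($blah =~ /^Location:[ \t]*(\S+)/i) {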
                $url = &verify_url($1,$loop);
                last;
            }
        }
    } elsif (/HTTP.* 404/) {
        push @errors, "URL was not found: '$url'";
        $url = 0;
    } elsif (/^usage:/) {
        push @errors, "Invalid URL: '$url'";
        $url = 0;
    }

    close (URL);

    if ($url) {
        open (URL, "$lynx $url |")
            || &abort("failed to open '$lynx $url'");
        <URL>;                     # throw away the first line of output
        $_ = <URL>;
        if (/^lynx: Can\'t access start file/) {
            push @errors, "Cannot Locate Webserver: '$url'";
            $url = 0;
        }
        close (URL);
    }
    
    return $url;
}

#####


-Joe



