[pm-h] Checking for Broken Links

Mike Flannigan mikeflan at att.net
Thu Oct 24 16:54:53 PDT 2013


Wow, I would not have figured that out - at least
not in the same day.

I did #1, but I'll bet that doesn't work.
Surely they already know about this.

I discovered that RT =
https://rt.perl.org/

Thanks for doing all that free work :-)



Mike


On 10/24/2013 10:01 AM, G. Wade Johnson wrote:
> That was fun, and it gave me a good excuse to play with Devel::hdb.
>
> There's a bug in the way that WWW::SimpleRobot handles broken links.
>
> If the link is in the original array that you pass, it recognizes the
> broken link and calls the callback routine.
>
> But, when it's traversing a page and building a list of links, it
> discards any link that fails a "head" request. So, every broken link
> found during traversal is dropped before the callback can be called.
>
> That's probably worth a bug report to the author.
>
> More Detail
> -----------
> To troubleshoot this, I first ran it the way you did. Then, I looked
> at the docs for WWW::SimpleRobot and didn't see anything useful there.
>
> Next, I looked at the source (nicely formatted by metacpan:
> https://metacpan.org/source/AWRIGLEY/WWW-SimpleRobot-0.07/SimpleRobot.pm).
>
> On line 35, I noticed there is a VERBOSE option.
> Looking a little further down the code (lines 119-124), you can see that
> VERBOSE is used to print a "get $url" line before the
> BROKEN_LINK_CALLBACK is called.
>
> Running in verbose mode showed that the code never prints
> "get http://www.ncgia.ucsb.edu/%7Ecova/seap.html".
>
> Looking a little further, lines 140-142 discard the link
> if head() fails.
>
> The hdb debugging interface was really nice for this. (Unfortunately, I
> spent a fair amount of time playing with the debugger.<shrug/>)
>
> I can see a couple of ways of fixing this:
>
> 1. Easiest: report the bug through RT and hope the author takes care of
> it soon.
>
> 2. Patch your copy of the WWW::SimpleRobot code to call the callback when
> head() fails, or not to discard the link on a failed head() request.
>
> 3. Copy the WWW::SimpleRobot traversal code into your script and fix it
> there.
>
> The first approach is probably the best.
>
> G. Wade
>
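
The behaviour described in the quoted message, and the shape of fix #2,
can be sketched in a few lines of Perl. This is only an illustration and
not the actual WWW::SimpleRobot source: head_ok(), filter_discarding(),
filter_reporting(), and the example.com URLs are all made up here, and
head_ok() is a stub standing in for a real HEAD request so the script
runs without network access.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a HEAD request: true if the URL "exists".
my %live = map { $_ => 1 } qw(http://example.com/ http://example.com/a.html);
sub head_ok { my ($url) = @_; return $live{$url} ? 1 : 0 }

# Links found while traversing a page (made-up data).
my @found = qw(http://example.com/a.html http://example.com/missing.html);

# Behaviour as described: links whose HEAD check fails are silently dropped,
# so no broken-link callback ever sees them.
sub filter_discarding {
    my ($links) = @_;
    return grep { head_ok($_) } @$links;
}

# One possible fix: report the failure to a callback before dropping the link.
sub filter_reporting {
    my ($links, $page, $on_broken) = @_;
    my @keep;
    for my $url (@$links) {
        if (head_ok($url)) { push @keep, $url }
        else               { $on_broken->($url, $page) }
    }
    return @keep;
}

my @broken;
my @kept_old = filter_discarding(\@found);
my @kept_new = filter_reporting(\@found, 'http://example.com/',
                                sub { push @broken, $_[0] });

print "old: kept @kept_old, reported nothing\n";
print "new: kept @kept_new, reported @broken\n";
```

Both versions keep the same set of live links to follow; the only
difference is that the second one tells the caller about the link that
failed instead of throwing it away.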


