[pm-h] Checking for Broken Links

G. Wade Johnson gwadej at anomaly.org
Thu Oct 24 19:50:33 PDT 2013


On Thu, 24 Oct 2013 18:54:53 -0500
Mike Flannigan <mikeflan at att.net> wrote:

> 
> Wow, I would not have figured that out - at least
> not in the same day.
> 
> I did #1, but I'll bet that doesn't work.
> Surely they already know about this.
> 
> I discovered that RT =
> https://rt.perl.org/
> 
> Thanks for doing all that free work :-)

What I should have done was point you at a few things to try and let
you make some progress. Since I hadn't tried hdb before and it was way
more effective than I thought, you got the advantage of me playing with
the tool.<grin/>

If anyone hasn't seen the post someone made about Devel::hdb a month
ago, you really need to check it out. It's a decent GUI debugger that
uses the browser as its interface.

G. Wade 

> On 10/24/2013 10:01 AM, G. Wade Johnson wrote:
> > That was fun, and it gave me a good excuse to play with Devel::hdb.
> >
> > There's a bug in the way that WWW::SimpleRobot handles broken links.
> >
> > If the link is in the original array that you pass, it recognizes
> > the broken link and calls the callback routine.
> >
> > But, when it's traversing a page and building a list of links, it
> > discards any link that fails a "head" request. So, every broken link
> > found during traversal is dropped without the callback being called.
> >
> > That's probably worth a bug report to the author.
> >
> > More Detail
> > -----------
> > To troubleshoot this, I first ran it the way you did. Then, I looked
> > at the docs for WWW::SimpleRobot and didn't see anything useful
> > there.
> >
> > Next, I looked at the source (nicely formatted by metacpan:
> > https://metacpan.org/source/AWRIGLEY/WWW-SimpleRobot-0.07/SimpleRobot.pm).
> >
> > On line 35, I noticed that the module supports a VERBOSE mode.
> > Looking down the code a little ways (lines 119-124), you can see
> > that verbose is used to print a "get $url" line before the
> > BROKEN_LINK_CALLBACK is called.
> >
> > Running with VERBOSE enabled showed that the code never prints
> > "get http://www.ncgia.ucsb.edu/%7Ecova/seap.html".
> >
> > Looking a little further, lines 140-142 discard the link if
> > head() fails.
> >
> > The hdb debugging interface was really nice for this.
> > (Unfortunately, I spent a fair amount of time playing with the
> > debugger.<shrug/>)
> >
> > I can see a couple of ways of fixing this:
> >
> > 1. Easiest: report the bug through RT and hope the author takes
> > care of it soon.
> >
> > 2. Patch your copy of the WWW::SimpleRobot code so that it either
> > calls the callback when head() fails or stops discarding links on a
> > failed head() request.
> >
> > 3. Copy the WWW::SimpleRobot traversal code into your script and
> > fix it there.
> >
> > The first approach is probably the best.
> >
> > G. Wade
> >
> 
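
For anyone who wants to try option 2 from the quoted message, here is a
minimal sketch of the idea in plain Perl. It is not the WWW::SimpleRobot
code: check_links(), the %reachable table, and the example.com URLs are
all made up for illustration, and the table stands in for a real head()
request so the script runs without a network. The only point is that a
link failing the check gets passed to a callback instead of being
dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a HEAD request: a URL counts as reachable
# only if it appears in this table. A real script would make an HTTP
# HEAD request here instead.
my %reachable = map { $_ => 1 } qw(
    http://example.com/
    http://example.com/a.html
);

# Split a list of links into good ones (returned) and broken ones
# (handed to the $on_broken callback rather than silently discarded).
sub check_links {
    my ( $links, $on_broken ) = @_;
    my @good;
    for my $url (@$links) {
        if ( $reachable{$url} ) {
            push @good, $url;
        }
        else {
            $on_broken->($url);
        }
    }
    return @good;
}

my @broken;
my @good = check_links(
    [ 'http://example.com/a.html', 'http://example.com/missing.html' ],
    sub { push @broken, $_[0] },
);

print "good: @good\n";
print "broken: @broken\n";
```

Run as-is, this lists a.html as good and missing.html as broken. In a
patched copy of the module, the callback call would presumably go where
the failed head() check currently throws the link away.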


-- 
We've all heard that a million monkeys banging on a million typewriters
will eventually reproduce the works of Shakespeare. Now, thanks to the
Internet, we know this is not true.         -- Robert Wilensky, UCB

