SPUG: robot (spider)

CJ Collier cjcollier at colliertech.org
Tue Nov 9 17:57:42 CST 2004

Be sure to implement reading of robots.txt
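
The easiest way I know to get that for free is LWP::RobotUA, a subclass
of LWP::UserAgent that fetches and obeys each site's robots.txt for you
(WWW::RobotRules is there if you'd rather do the checking by hand). A
quick sketch; the bot name and contact address are just placeholders:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;

# A drop-in replacement for LWP::UserAgent that honors robots.txt.
my $ua = LWP::RobotUA->new('MyBot/0.1', 'you@example.com');
$ua->delay(1);    # wait at least one minute between hits on the same host

my $resp = $ua->get('http://www.example.com/');
if ($resp->is_success) {
    print $resp->content;
}
else {
    # URLs disallowed by robots.txt come back as 403 errors
    warn "Couldn't fetch: ", $resp->status_line, "\n";
}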

Use LWP::UserAgent to read the pages
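
Something along these lines (the URL and agent string are made up):

use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(agent => 'MyBot/0.1', timeout => 30);
my $resp = $ua->get('http://www.example.com/');

if ($resp->is_success) {
    my $html = $resp->content;   # page body, ready to hand to HTML::Parser
    my $base = $resp->base;      # base URL, for resolving relative links
}
else {
    warn "GET failed: ", $resp->status_line, "\n";
}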

Use HTML::Parser to find the links
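
Here's roughly how I'd pull hrefs out with a start-tag handler; the
extract_links name is something I made up, and URI takes care of turning
relative links into absolute ones. (HTML::LinkExtor, an HTML::Parser
subclass, will also do this for you.)

use strict;
use warnings;
use HTML::Parser;
use URI;

# Return the href of every <a> tag, resolved against the page's base URL.
sub extract_links {
    my ($html, $base) = @_;
    my @links;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tag, $attr) = @_;
                return unless $tag eq 'a' && defined $attr->{href};
                push @links, URI->new_abs($attr->{href}, $base)->canonical;
            },
            'tagname, attr',
        ],
    );
    $p->parse($html);
    $p->eof;
    return @links;
}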

Create a recursive (or queue-driven) function to follow links breadth-first.

Create a global hash to keep track of the links that have already been
visited, so you never fetch the same page twice.

You could accept a boolean argument to the worker object constructor
that forces the spider to stay on the same domain.
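
Tying those last few suggestions together, here's a rough queue-driven
sketch. It uses a plain sub with a same_domain flag rather than a worker
object; the crawl name, bot name, and starting URL are all invented, and
it uses HTML::LinkExtor for brevity:

use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

my %seen;    # global hash of URLs already queued, so nothing is fetched twice

sub crawl {
    my %args        = @_;
    my $start       = URI->new($args{start});
    my $same_domain = $args{same_domain};    # boolean: stay on the start host?

    # swap in LWP::RobotUA here to get the robots.txt handling from above
    my $ua = LWP::UserAgent->new(agent => 'MyBot/0.1', timeout => 30);

    my @queue = ($start);    # FIFO queue of URLs still to visit
    $seen{$start} = 1;

    while (my $url = shift @queue) {
        my $resp = $ua->get($url);
        next unless $resp->is_success;

        print "Fetched $url\n";

        # HTML::LinkExtor collects the links; passing $resp->base makes
        # them absolute.
        my $extor = HTML::LinkExtor->new(undef, $resp->base);
        $extor->parse($resp->content);
        $extor->eof;

        for my $link ($extor->links) {
            my ($tag, %attr) = @$link;
            next unless $tag eq 'a' && $attr{href};
            my $next = URI->new($attr{href})->canonical;
            next unless $next->scheme && $next->scheme =~ /^https?$/;
            next if $same_domain && lc($next->host) ne lc($start->host);
            next if $seen{$next}++;    # already seen: skip it
            push @queue, $next;        # push to the back => breadth-first order
        }
    }
}

crawl(start => 'http://www.example.com/', same_domain => 1);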

I've got more ideas if you'd like to hear them.


On Fri, 2004-11-05 at 13:00 -0800, Luis Medrano wrote:
> List,
> I'm trying to build a spider. Can somebody explain what would be the easiest way to do it?
> Thanks,
> Luis
> _____________________________________________________________
> Seattle Perl Users Group Mailing List  
> POST TO: spug-list at mail.pm.org  http://spugwiki.perlocity.org/
> ACCOUNT CONFIG: http://mail.pm.org/mailman/listinfo/spug-list
> MEETINGS: 3rd Tuesdays, Location: Amazon.com Pac-Med
> WEB PAGE: http://seattleperl.org/
