[boulder.pm] walking web pages?

Rob Nagler nagler at bivio.com
Sat Jul 1 15:58:33 CDT 2000


>   This looks decent, but we need something that can be cron'd and run multiple
> times from command line with output that can be logged to a file for reporting
> functions, thus the reason we wanted a perl script. 

I've attached a VERY SIMPLE hack of something that logs into our
site.  It uses HTTP::Cookies, HTTP::Request, etc.
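
For readers without the attachment, here is a minimal sketch of a login
script along those lines. It is not the attached test_get.pl. The URL and
the form field names (login, password) are hypothetical placeholders, and
the script only touches the network when a user name and password are given
on the command line, so it can be run from cron with its output redirected
to a log file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Request::Common qw(POST);

# Hypothetical value: replace with your site's real login URL.
my $LOGIN_URL = 'http://www.example.com/login';

# Build a user agent that keeps cookies in memory for the life of the script,
# so the session cookie set at login is sent on later requests.
sub new_agent {
    my $ua = LWP::UserAgent->new;
    $ua->cookie_jar(HTTP::Cookies->new);
    $ua->timeout(30);
    return $ua;
}

# Build the POST request that submits the login form.
# The field names "login" and "password" are assumptions about the form.
sub login_request {
    my ($url, $user, $password) = @_;
    return POST($url, [login => $user, password => $password]);
}

# Send the request and return one line suitable for appending to a log file.
sub run_check {
    my ($ua, $req) = @_;
    my $res = $ua->request($req);
    return sprintf('%s %s %s %s',
        scalar(localtime), $req->method, $req->uri, $res->status_line);
}

# Only hit the network when a user name and password are supplied.
if (@ARGV >= 2) {
    print run_check(new_agent(), login_request($LOGIN_URL, @ARGV[0, 1])), "\n";
}
```

From cron, something like "perl login_check.pl someuser somepass >> check.log"
would append one status line per run (the script name is made up here).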

The second file is production code used in our Club Site(tm) feature.
It inserts our header on a club's home page, which can be arbitrary
HTML.  It is an example of using HTML::Parser.  I apologize for
the "extra stuff", but I don't have time to distill it to just
CPAN packages.
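
As a rough illustration of the HTML::Parser technique, and not the
production code in the attachment, the sketch below copies a page through
unchanged and inserts a header right after the opening body tag. The
function name insert_header and the fallback of putting the header at the
top when no body tag exists are assumptions made for this example.

```perl
use strict;
use warnings;
use HTML::Parser;

# Return $html with $header inserted right after the opening <body> tag.
# If no <body> tag is found, the header is placed at the very top instead.
sub insert_header {
    my ($html, $header) = @_;
    my $out  = '';
    my $done = 0;
    my $parser = HTML::Parser->new(
        api_version => 3,
        # Copy each start tag through unchanged; add the header once,
        # immediately after the first <body ...> tag.
        start_h => [
            sub {
                my ($tagname, $text) = @_;
                $out .= $text;
                if (!$done && $tagname eq 'body') {
                    $out .= $header;
                    $done = 1;
                }
            },
            'tagname, text',
        ],
        # Everything else (text, end tags, comments, ...) is copied as is.
        default_h => [sub { $out .= shift }, 'text'],
    );
    $parser->parse($html);
    $parser->eof;
    return $done ? $out : $header . $out;
}
```

HTML::Parser reports tag names in lower case, so the comparison with 'body'
also matches pages that write the tag as BODY, while the original source
text of the tag is passed through untouched.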

There are also HTML::TableExtract and HTML::Form, which parse tables and
forms ;-), but I haven't used them so no examples.

In general, the tricky part is understanding the HTML that is
generated.  Once you've got that, "pushing buttons" is easy.
If you have control of the site, you could even put comments
into the generated HTML to make it easy to find the interesting
bits, though that's not really recommended.  If you don't have control of
the HTML, make sure you have really good error handling and 
logging.  Ya neva know what people or generators are gonna do...
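
One way to approach the error handling and logging is sketched below. It
assumes the script already has an HTTP status code and a page body in hand,
and checks them against a piece of text the page is expected to contain.
The names check_page and log_line and the log format are made up for this
example.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format one timestamped log line.
sub log_line {
    my ($level, $msg) = @_;
    return sprintf("%s [%s] %s\n",
        strftime('%Y-%m-%d %H:%M:%S', localtime), $level, $msg);
}

# Check a fetched page and return a list of problems (empty list means OK).
# $marker is text the page must contain, for example a title or a comment
# deliberately placed in the generated HTML.
sub check_page {
    my ($status, $body, $marker) = @_;
    my @problems;
    push @problems, "unexpected HTTP status $status"
        unless $status == 200;
    push @problems, "page body is empty"
        unless defined $body && length $body;
    push @problems, "expected text '$marker' not found"
        unless defined $body && index($body, $marker) >= 0;
    return @problems;
}
```

A caller could print log_line('ERROR', $_) for each problem and exit with a
non-zero status, so that cron or a wrapper script notices the failure.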

Hope this helps,
Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_get.pl
Type: application/x-perl
Size: 958 bytes
Desc: not available
Url : http://mail.pm.org/archives/boulder-pm/attachments/20000701/eac20666/test_get.bin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.pm.org/archives/boulder-pm/attachments/20000701/eac20666/HomePageParser.htm

