[bcn-pm] Simple Robot

Zeno zeno at timallen.org
Fri Nov 15 10:33:13 CST 2002


--------- Original message --------
>From: "Robert Franks"
>I am running a script which retrieves the home page of a site, then gets
>another page linked from the home page.
>Problem is this seems to trigger two different sessions and I need the
>robot to retrieve pages within the same session

Hi Rob!
This sounds familiar. I'm assuming the session information is getting saved
in a cookie. Some sites default to using a cookie for the session
information, but if they detect that you have cookies turned off, they
send it in the URL instead (as a GET parameter) or in a hidden form field
on the page (which comes back as a POST). I would check whether visiting
the site in a browser with cookies turned off makes the session information
show up in the URL.
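If the session does turn up in the URL or in a hidden field, pulling it out
of the fetched page could look roughly like the sketch below. The parameter
name "sid" is only a placeholder I made up; look at the real page source to
see what the site actually calls it. The regexes are deliberately simple and
assume the name attribute comes before the value attribute.

```perl
use strict;
use warnings;

# Hypothetical session parameter name; check the real site for the actual one.
my $PARAM = 'sid';

# Find the session ID in a URL query string, e.g. parts.cgi?step=2&sid=abc123
sub session_from_url {
    my ($url) = @_;
    return $url =~ /[?&;]\Q$PARAM\E=([^&;#\s"']+)/ ? $1 : undef;
}

# Find the session ID in a hidden form field such as
#   <input type="hidden" name="sid" value="abc123">
# Assumes name= appears before value= inside the tag.
sub session_from_hidden_field {
    my ($html) = @_;
    return $html =~ /<input\b[^>]*\bname=["']?\Q$PARAM\E\b["']?[^>]*\bvalue=["']?([^"'\s>]+)/i
        ? $1
        : undef;
}
```

Both subs return undef when nothing matches, so the script can fall back to
cookies in that case.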

If this is the case you can get the session identifier from the response
header and then plug it back into your next page request.  ** NOTE ** in a
previous mail I said this was in LWP::Simple.  Sorry, that was wrong.  You
can get this information with LWP::UserAgent (Angel mentions this as a way
to use the cookie jar, too).  Be ready for a bit of hairy coding, as you
have to send your requests through LWP::UserAgent with headers (such as the
User-Agent string) set to make your Perl program look like Internet
Explorer to get some sites to work.
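
For the cookie case, a minimal sketch of the LWP::UserAgent setup might look
like this. It is not the script I used back then, just the general shape:
one user agent object with an in-memory cookie jar, reused for every request
so that whatever cookie the home page sets goes back with the next request.
The sub names and the browser string are illustrative choices.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Build one user agent and keep it for the whole run. The cookie jar
# stores cookies from each response and sends them on later requests.
sub make_robot {
    return LWP::UserAgent->new(
        # Browser-like User-Agent string (illustrative value)
        agent      => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)',
        cookie_jar => HTTP::Cookies->new,    # in-memory jar, nothing saved to disk
        timeout    => 30,
    );
}

# Fetch the given URLs in order through the same user agent,
# so they all share one session. Dies on the first failed request.
sub fetch_in_session {
    my ($ua, @urls) = @_;
    my @pages;
    for my $url (@urls) {
        my $res = $ua->get($url);
        die "GET $url failed: ", $res->status_line, "\n"
            unless $res->is_success;
        push @pages, $res->decoded_content;
    }
    return @pages;
}
```

You would call it as my $ua = make_robot(); and then
my ($home, $next) = fetch_in_session($ua, $home_url, $next_url); with your
own two URLs. The important part is that both requests go through the same
$ua; creating a fresh one per page is what gives you two sessions.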

This worked for me when I had to do this with a page that went through a
four or five page form to determine what kind of part the user wanted. It
was ugly but it worked. I would guard against any fleeting desires to do it
another way ;)

Take care! Daniel is huge now.
--
Tim/Zeno


