[tpm] WWW::Mechanize and setting cookies

Madison Kelly linux at alteeve.com
Thu Jul 31 18:22:31 PDT 2008


adam.prime at utoronto.ca wrote:
> Quoting Madison Kelly <linux at alteeve.com>:
> 
>> Hi all,
>>
>>   I've run into the need to set some cookies for a WWW::Mechanize
>> object I am using. As I understand it, the default 'cookie_jar' is
>> supposed to be an instance of HTTP::Cookies, but I can't see where that
>> is implemented in the module. Despite that, I tried calling the
>> 'set_cookie' method but, as I expected, got an error saying that is not
>> a known method.
>>
>>   So dear TPM, can someone clue me in on how to set a bunch of cookies
>> using WWW::Mechanize?
> 
> Looking at the documentation it looks like Mechanize is designed such 
> that it will keep track of cookies that get set through a series of 
> requests.  It looks to me like the only way to set it up to start with 
> cookies in the first place would be to create an instance of 
> HTTP::Cookies with the stuff you want in it, and use that when you 
> create your initial Mechanize object.

I've tried that, see below (to keep the message clean).

>> Bonus round!
>>
>>   If this is an HTTP::Cookies object, what pray tell is '$version'
>> supposed to be when setting the cookie? Beyond setting it, there is no
>> mention of it in the docs and the code merely shows it being set to '0'
>> if undef.
>>
>> Thanks as always!
> 
> Looking at the code it seems to put "\$Version=$version" into your cookie 
> if you set it to a value larger than 0.  I have no idea what that's 
> about, but I'd probably be passing in 0's.
> 
> The interface for HTTP::Cookies looks pretty horrid :x
> 
> Adam

Indeed it is...

At any rate, here is what I am doing. I connect to an HTTPS site that is 
made by a nameless "big company" which means the design is terribly 
inconsistent. For some reason, after doing a particular search, the site 
returns a redirect page that sets a pile of cookies using JS 
'document.cookie="..."' calls, then a 'document.location' to follow the 
link, all of which is triggered by an 'onload' call. Now the problem is, 
all the 'document.cookie' values are needed to get the actual data I 
need. Seeing as WWW::Mechanize doesn't support JS, I need to find a way 
to set them manually.
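Pulling the cookie strings out of that page has to be done by hand. Below is a minimal sketch of that step in core Perl. It assumes every assignment has the form document.cookie="name=value; path=...; expires=..." with double quotes and no escaped quotes inside; 'parse_js_cookies' and the sample page are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: find every document.cookie="..." assignment in a page
# and split it into name, value and lower-cased attributes (path, expires...).
# Assumes double-quoted assignments with no escaped quotes inside.
sub parse_js_cookies {
    my ($html) = @_;
    my @cookies;
    while ($html =~ /document\.cookie\s*=\s*"([^"]*)"/g) {
        my $raw = $1;
        next unless length $raw;
        my ($pair, @attrs) = split /;\s*/, $raw;
        my ($name, $value) = split /=/, $pair, 2;
        my %cookie = (name => $name, value => defined $value ? $value : '');
        for my $attr (@attrs) {
            next unless length $attr;
            my ($k, $v) = split /=/, $attr, 2;
            $cookie{ lc $k } = $v;    # $v is undef for flags like "secure"
        }
        push @cookies, \%cookie;
    }
    return @cookies;
}

# Example page in the style described above (invented for illustration).
my $page = <<'HTML';
<body onload='document.cookie="sid=abc123; path=/";
 document.cookie="lang=en; path=/search; expires=Fri, 01 Aug 2008 01:22:31 GMT";
 window.location="/results"'>
HTML

for my $c (parse_js_cookies($page)) {
    printf "%s=%s path=%s expires=%s\n",
        $c->{name}, $c->{value},
        $c->{path}    // '-',
        $c->{expires} // '-';
}
```

Splitting on ';' instead of matching a fixed 'path=...; expires=...;' pattern means the parse does not depend on the order of the attributes or on a trailing semicolon.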

So here are the relevant bits:

-=] Setting up my WWW::Mechanize object
use WWW::Mechanize;
use HTTP::Cookies;
my $agent = WWW::Mechanize->new(
	autocheck	=>	1,
	cookie_jar	=>	HTTP::Cookies->new(),
);
$agent->agent_alias("Linux Mozilla");

# I do a pile of work, following links, submitting forms and such, until
# I get to the JS redirect page I described, where I try to follow the
# redirect after setting cookies. **This Fails**.

-=] Process the JS redirect bastardization
# Process the results.
my $processing_page=$agent->content;
foreach my $cookie ($processing_page=~/document\.cookie="(.*?)"/gs)
{
	my ($variable, $value, $path, $expires);
	if ( $cookie =~ /expires/ )
	{
		($variable, $value, $path, $expires)=$cookie=~/(.*?)=(.*?); path=(.*?); expires=(.*?);/;
		print "Setting Cookie: [$variable]->[$value] \@ [$path] ($expires).\n";
	}
	else
	{
		($variable, $value, $path)=$cookie=~/(.*?)=(.*?); path=(.*?);/;
		print "Setting Cookie: [$variable]->[$value] \@ [$path].\n";
	}
	$$agent{cookie_jar}->set_cookie(0, $variable, $value, $path);
}
my ($processing_link)=$processing_page=~/window\.location="(.*?)"/;
print "Following results link: [$processing_link]\n";
$agent->get($processing_link);
-=-=-=-=-=-=-=-=-
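On where the jar lives: WWW::Mechanize is a subclass of LWP::UserAgent, and it is LWP::UserAgent that creates and holds the cookie jar (an in-memory HTTP::Cookies object unless you pass your own). The public 'cookie_jar' accessor inherited from LWP::UserAgent should return the same object the snippet reaches through $$agent{cookie_jar}, without depending on the hash layout. A minimal sketch; it needs WWW::Mechanize and libwww-perl installed but makes no network requests.

```perl
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;

# Explicit jar, as in the setup above.
my $agent = WWW::Mechanize->new(
    autocheck  => 1,
    cookie_jar => HTTP::Cookies->new(),
);

# Default jar: LWP::UserAgent builds an in-memory HTTP::Cookies for us.
my $default_agent = WWW::Mechanize->new();

# cookie_jar() is the public LWP::UserAgent accessor for the jar object.
for my $mech ($agent, $default_agent) {
    my $jar = $mech->cookie_jar;
    printf "%s, can set_cookie: %s\n",
        ref $jar, $jar->can('set_cookie') ? 'yes' : 'no';
}
```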

The closest I could figure to get at the HTTP::Cookies methods was to 
call them as I did above. I realize this probably isn't smart, since I 
am reaching into the object's internal values, but it was as close as I 
could get. This doesn't throw an error, but it also doesn't seem to work.
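A likely reason the cookies never go out: in set_cookie's positional argument list ($version, $key, $val, $path, $domain, $port, $path_spec, $secure, $maxage, $discard, \%rest), the fifth slot is the domain, and the call above stops at the path. HTTP::Cookies files each cookie under its domain and, when building a request, only looks up cookies for the request's host and its parent domains, so a cookie stored with no domain should never match. The sketch below passes the host of the page that ran the JavaScript, which is what a browser does for a document.cookie assignment without a 'domain=' attribute. On the bonus question, $version 0 is the value for ordinary Netscape-style cookies. 'store_js_cookie' and the example.com URLs are hypothetical; the expires-to-max-age conversion uses HTTP::Date's 'str2time'.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Date qw(str2time);
use HTTP::Request;
use URI;

# Hypothetical helper: store one cookie scraped from a document.cookie
# assignment, scoped to the host of the page that set it.
sub store_js_cookie {
    my ($jar, $page_uri, $name, $value, $path, $expires) = @_;
    $path = '/' unless defined $path && length $path;

    # HTTP::Cookies wants a max-age in seconds, not an expires date.
    my $maxage;
    if (defined $expires) {
        my $when = str2time($expires);
        $maxage = $when - time() if defined $when;
    }

    $jar->set_cookie(
        0,               # version: 0 = ordinary Netscape-style cookie
        $name, $value,
        $path,           # must start with "/"
        $page_uri->host, # domain: the argument the call above leaves out
        undef,           # port: any
        1,               # path_spec: path was given explicitly
        0,               # secure
        $maxage,         # undef = session cookie; <= 0 deletes it
        0,               # discard
    );
    return;
}

my $jar  = HTTP::Cookies->new;
my $page = URI->new('https://www.example.com/search/redirect');
store_js_cookie($jar, $page, 'sid', 'abc123', '/');

# Check what the jar would send with the follow-up request.
my $req = HTTP::Request->new(GET => 'https://www.example.com/search/results');
$jar->add_cookie_header($req);
my $sent = $req->header('Cookie');
print defined $sent ? $sent : '(no Cookie header)', "\n";
```

In the snippet, each set_cookie line would then become something like store_js_cookie($agent->cookie_jar, $agent->uri, $variable, $value, $path, $expires), where $agent->uri is the URI of the redirect page that was just fetched.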

Any further ideas?

Madi

