[tpm] WWW::Mechanize and setting cookies

Abram Hindle abram.hindle at softwareprocess.us
Fri Aug 1 07:32:49 PDT 2008


Sorry this is late; I sent it from an unsubscribed address before.

Either of these will initialize a cookie jar for you:

$ua->cookie_jar({});

or

WWW::Mechanize->new(cookie_jar => {});
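Once the jar exists, you reach it through the `cookie_jar` accessor (WWW::Mechanize inherits it from LWP::UserAgent) and add cookies with HTTP::Cookies' `set_cookie`. Below is a minimal sketch. The host `www.example.com` and the `SESSIONID` cookie are made-up placeholders. The detail that is easy to miss is the fifth argument, the domain. As far as I can tell from HTTP::Cookies, a cookie stored without a domain is never matched against a request's host, so it is silently not sent. That would fit the "doesn't error, but doesn't work" symptom in the question quoted below.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;   # WWW::Mechanize is a subclass, so $mech->cookie_jar works the same
use HTTP::Request;

# A hash ref makes LWP::UserAgent build an in-memory HTTP::Cookies jar.
my $ua  = LWP::UserAgent->new(cookie_jar => {});
my $jar = $ua->cookie_jar;    # use the accessor rather than $ua->{cookie_jar}

# Hypothetical host and cookie, for illustration only.
my $host = 'www.example.com';

# set_cookie($version, $key, $val, $path, $domain, $port,
#            $path_spec, $secure, $maxage, $discard, \%rest)
# Version 0 is an ordinary Netscape-style cookie (what document.cookie sets);
# the path must start with "/", and the domain is what ties it to a host.
$jar->set_cookie(0, 'SESSIONID', 'abc123', '/', $host, undef, 1, 0, 3600);

# Ask the jar which Cookie header it would send to that host (no network used).
my $req = HTTP::Request->new(GET => "https://$host/results");
$jar->add_cookie_header($req);
print $req->header('Cookie'), "\n";    # prints: SESSIONID=abc123
```

Dropping the `$host` argument from the `set_cookie` call leaves the request without a Cookie header, which is a quick way to check the domain explanation yourself.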

This is also helpful. It installs a fresh HTTP::Cookies jar and a set of browser-like default headers:

use HTTP::Cookies;
use HTTP::Headers;

$ua->cookie_jar(HTTP::Cookies->new());
$ua->default_headers(getDefaultHeaders());

# Build a header set that makes requests look like they come from Firefox.
sub getDefaultHeaders {
        my $header = HTTP::Headers->new(
                User_Agent      => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050920 Firefox/1.0.7",
                Accept          => "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
                Accept_Language => "en-us,en;q=0.5",
                Accept_Encoding => "gzip,deflate",
                Accept_Charset  => "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
                Keep_Alive      => 300,
                Connection      => "keep-alive",
        );
        #$header->remove_header('TE');
        return $header;
}
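The header names in `getDefaultHeaders` work because HTTP::Headers translates underscores in field names into dashes, so `User_Agent` is stored as `User-Agent`. The sketch below is trimmed to two headers and uses a plain LWP::UserAgent to stand in for `$ua`. It shows the header object being installed with `default_headers` and read back with `default_header`. Every request sent through that agent then starts from these headers.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Headers;

# Underscores in these field names are stored as dashes (User-Agent, Accept-Language).
my $headers = HTTP::Headers->new(
    User_Agent      => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050920 Firefox/1.0.7',
    Accept_Language => 'en-us,en;q=0.5',
);

my $ua = LWP::UserAgent->new;
$ua->default_headers($headers);    # replaces the agent's whole default header set

# Read a single default header back by its dashed name.
print $ua->default_header('User-Agent'), "\n";
print $headers->header('Accept-Language'), "\n";
```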

Also, here are some slides on using WWW::Mechanize, with tips at the end:

http://presentation.abez.ca/victoriaPMJan2003Slides.pdf

abram
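The question quoted below involves a JavaScript redirect page. You can split its `document.cookie` strings without assuming a fixed attribute order or a trailing `;`. The quoted regexes only match when `path=...` is followed by a semicolon, and the unescaped `.` in `document.cookie` matches any character. Below is a minimal core-Perl sketch; `parse_js_cookies` and the sample page are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: find every document.cookie="..." assignment in a page
# and split it into name, value and lower-cased attributes (path, expires, ...).
sub parse_js_cookies {
    my ($html) = @_;
    my @cookies;
    # \. makes the dot literal; (.*?) stops at the first closing double quote.
    while ($html =~ /document\.cookie\s*=\s*"(.*?)"/gs) {
        my ($pair, @attrs) = split /\s*;\s*/, $1;
        next unless defined $pair && $pair =~ /=/;
        my ($name, $value) = split /=/, $pair, 2;
        my %cookie = (name => $name, value => $value);
        for my $attr (@attrs) {
            next unless length $attr;              # skip empty ";;" fields
            my ($key, $val) = split /=/, $attr, 2;
            $cookie{lc $key} = $val;               # e.g. path, expires, domain
        }
        push @cookies, \%cookie;
    }
    return @cookies;
}

# Sample page: one cookie with a trailing ";", one without.
my $page = <<'HTML';
<body onload='document.cookie="A=1; path=/; expires=Fri, 01 Aug 2008 12:00:00 GMT;";
document.cookie="B=two; path=/app";
window.location="/results";'>
HTML

for my $c (parse_js_cookies($page)) {
    printf "%s=%s path=%s\n", $c->{name}, $c->{value}, $c->{path} // '/';
}
```

You can then put each hash into the jar, using the current page's host as the domain. For example: `$agent->cookie_jar->set_cookie(0, $c->{name}, $c->{value}, $c->{path} // '/', $agent->uri->host)`. An `expires` value would first need converting to a max-age in seconds, for instance with `HTTP::Date::str2time($c->{expires}) - time`.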


Zoffix Znet wrote:
> Yes, that won't work, because WWW::Mechanize doesn't actually set a
> {cookie_jar} element in its blessed hashref. Take a look at Mech's
> sub new {}.
> 
> Now why your code below doesn't error out (as you've said) with
> "Can't call method "set_cookie" on an undefined value at LINE", I don't
> really understand... but anyway, use the ->cookie_jar method to obtain
> the HTTP::Cookies object, and always check the "use base" or @ISA
> assignments when you can't find a documented method in the code ^_^
> 
> Cheers.
> 
> 
> On Thu, 2008-07-31 at 21:22 -0400, Madison Kelly wrote:
>> adam.prime at utoronto.ca wrote:
>>> Quoting Madison Kelly <linux at alteeve.com>:
>>>
>>>> Hi all,
>>>>
>>>>   I've run into the need to set some cookies for a WWW::Mechanize
>>>> object I am using. As I understand it, the default 'cookie_jar' is
>>>> supposed to be an instance of HTTP::Cookies, but I can't see where that
>>>> is implemented in the module. Despite that, I tried calling the
>>>> 'set_cookie' method but, as I expected, got an error saying that is not
>>>> a known method.
>>>>
>>>>   So dear TPM, can someone clue me in on how to set a bunch of cookies
>>>> using WWW::Mechanize?
>>> Looking at the documentation it looks like Mechanize is designed such 
>>> that it will keep track of cookies that get set through a series of 
>>> requests.  It looks to me like the only way to set it up to start with 
>>> cookies in the first place would be to create an instance of 
>>> HTTP::Cookies with the stuff you want in it, and use that when you 
>>> create your initial Mechanize object.
>> I've tried that, see below (to keep the message clean).
>>
>>>> Bonus round!
>>>>
>>>>   If this is an HTTP::Cookies object, what pray tell is '$version'
>>>> supposed to be when setting the cookie? Beyond setting it, there is no
>>>> mention of it in the docs and the code merely shows it being set to '0'
>>>> if undef.
>>>>
>>>> Thanks as always!
>>> Looking at the code, it seems to put "\$Version=$version" into your
>>> cookie if you set it to a value larger than 0.  I have no idea what
>>> that's about, but I'd probably be passing in 0s.
>>>
>>> The interface for HTTP::Cookies looks pretty horrid :x
>>>
>>> Adam
>> Indeed it is...
>>
>> At any rate, here is what I am doing. I connect to an HTTPS site that is 
>> made by a nameless "big company" which means the design is terribly 
>> inconsistent. For some reason, after doing a particular search, the site 
>> returns a redirect page that sets a pile of cookies using JS 
>> 'document.cookie="..."' calls, then a 'document.location' to follow the 
>> link, all of which is triggered by an 'onload' call. Now the problem is, 
>> all the 'document.cookie' values are needed to get the actual data I 
>> need. Seeing as WWW::Mechanize doesn't support JS, I need to find a way 
>> to set them manually.
>>
>> So here are the relevant bits:
>>
>> -=] Setting up my WWW::Mechanize object
>> use HTTP::Cookies;
>> my $agent = WWW::Mechanize->new(
>> 	autocheck	=>	1,
>> 	cookie_jar	=>	HTTP::Cookies->new(),
>> );
>> $agent->agent_alias("Linux Mozilla");
>>
>> # I do a pile of work, following links, submitting forms and such, until
>> # I get to the JS redirect page I described, where I try to follow the
>> # redirect after setting cookies. **This Fails**.
>>
>> -=] Process the JS redirect bastardization
>> # Process the results.
>> my $processing_page=$agent->content;
>> foreach my $cookie ($processing_page=~/document.cookie="(.*?)"/gs)
>> {
>> 	my ($variable, $value, $path, $expires)="";
>> 	if ( $cookie =~ /expires/ )
>> 	{
>> 		($variable, $value, $path, $expires)=$cookie=~/(.*?)=(.*?); path=(.*?); expires=(.*?);/;
>> 		print "Setting Cookie: [$variable]->[$value] \@ [$path] ($expires).\n";
>> 	}
>> 	else
>> 	{
>> 		($variable, $value, $path)=$cookie=~/(.*?)=(.*?); path=(.*?);/;
>> 		print "Setting Cookie: [$variable]->[$value] \@ [$path].\n";
>> 	}
>> 	$$agent{cookie_jar}->set_cookie(0, $variable, $value, $path);
>> }
>> my ($processing_link)=$processing_page=~/window.location="(.*?)"/;
>> print "Following results link: [$processing_link]\n";
>> $agent->get($processing_link);
>> -=-=-=-=-=-=-=-=-
>>
>> The closest I could figure to access the HTTP::Cookies methods was by 
>> calling it as I did, though I realize this is probably not smart as I am 
>> trying to access internal values, but it was as close as I could get. 
>> This doesn't error, but it also doesn't seem to work.
>>
>> Any further ideas?
>>
>> Madi
>> _______________________________________________
>> toronto-pm mailing list
>> toronto-pm at pm.org
>> http://mail.pm.org/mailman/listinfo/toronto-pm


