[Chicago-talk] My accomplishment for 2004 - WWW::Mechanize in the new Google Hacks

Leland Johnson easyasy2k at gmail.com
Tue Jan 11 10:35:16 PST 2005


Well, I think the best book in the world is now _Google Hacks 2nd
Edition_, despite Andy's claims to the contrary. You still should
check out _Spidering Hacks_ though - it's really good.

You can check out my new hack in the book, or download the hack from
O'Reilly's website:
http://www.oreilly.com/catalog/googlehks2/chapter/hack84.pdf

You can download the code from my website, since copying it from the
PDF is a real pain:
http://protoplasmic.org/code/adwords_worth.pl.txt

For those of you that are interested, a somewhat short development
story (I've been writing too much lately) and tips on using
WWW::Mechanize follow.

Andy passed[1] the job to me back in October of 2004. I started work
on it immediately, since I didn't want to do my schoolwork at the
time. Here are a few things I discovered about WWW::Mechanize while
fleshing out the hack:

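'autocheck => 1' works quite well in hacks like this: Mechanize simply
dies as soon as a request fails. It does not work well if your script
must try its hardest to succeed, for example by retrying or recovering
from an error.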

I used WWW::Mechanize::Shell to generate the initial code for the
hack. I should have used HTTP::Recorder, but I've been using
WWW::Mechanize since 2002 and I haven't figured out how to use the
newfangled stuff yet.

If the site you are interfacing with through WWW::Mechanize uses
JavaScript, keep an eye on what it does. Sometimes it does something
that your script doesn't, and if you don't look at the JavaScript, you
won't have a clue why your script is failing.

If the JavaScript does form validation, check what happens when you
send a blank/unchanged form or invalid input. If the site is smart,
you won't have any real problems that you can't handle with a line
like 'die "X failed!" if $mech->content() =~ m/failed/i'. In my hack,
Google handles the country page just fine without me having to touch
the form, even though the JavaScript would complain if you did the
same thing in a web browser. I still don't really know how I'd set a
country.


I ended up learning a lot about grep and map while filtering the table
on the final results page. If you want to be a Perl expert, you should
really learn when and how to use map and grep - they are very
powerful.
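To give a flavor of the grep-then-map style, here is a small sketch. It
is not the code from adwords_worth.pl: the table layout (keyword,
clicks per day, cost per click) and the sample rows are made up for
illustration, and it assumes the table has already been pulled off the
page into an array of array refs. grep throws away the rows you don't
want, and map reshapes what's left.

```perl
use strict;
use warnings;

# Hypothetical rows, as they might look after extracting a results table:
# each row is [keyword, estimated clicks per day, average cost per click].
my @rows = (
    [ 'Keyword',   'Clicks/Day', 'Avg CPC' ],    # header row
    [ 'perl',      '12.5',       '$0.45'   ],
    [ 'mechanize', '0',          '$0.00'   ],
    [ 'spidering', '3.1',        '$0.30'   ],
);

# grep: keep only data rows (the header fails the numeric test)
# that have a nonzero click estimate.
my @useful = grep { $_->[1] =~ /^[\d.]+$/ && $_->[1] > 0 } @rows;

# map: turn each surviving row into a hash ref, stripping the '$' from
# the cost. The leading '+' makes Perl read '{...}' as an anonymous hash
# rather than a bare block.
my @records = map {
    my ( $kw, $clicks, $cpc ) = @$_;
    $cpc =~ s/^\$//;
    +{ keyword => $kw, clicks => $clicks, cpc => $cpc };
} @useful;

for my $r (@records) {
    printf "%-10s %5.1f clicks/day at \$%.2f\n",
        $r->{keyword}, $r->{clicks}, $r->{cpc};
}
```

Chaining the two like this keeps the "which rows?" question separate
from the "what do I want out of each row?" question, which made the
results table much easier to deal with.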

I sent my copy of the script off to Rael after I had tested it on 3
different platforms and 2 versions of Perl. Both of our responses were
something like "holy crap... it works". I had a scare for a while when
my script stopped working - I thought Google might have decided to
block the script and that it might not be published.

If a script that uses WWW::Mechanize stops working, check a few things first:

Did the website you are interfacing with change?
Is the website you are interfacing with broken?

Check both of these with a /real/ web browser, not
WWW::Mechanize::Shell or lynx. In my case, Google's AdWords site was
returning zeros for all keywords, no matter who or where you were. The
site returned to normal in a day and my script still worked.

If you are still interested, I wrote the code and Rael wrote the
article. I suggested some ideas for "Hacking the Hack" and noted that
Crypt::SSLeay might be a problem for Windows users, but that's about
it for my contributions to the text of the hack.

Anyways, I'm glad that it's all over. I am now a hundrediare and I get
something actually computer related to put on my resumé. I want to
thank Andy publicly for referring the job to me and helping me out
with map and grep. I'd also like to note that you should try to resist
sending inane comments about your development thus far to a client
with instant messenger - I succumbed to that temptation. Luckily, my
client was understanding.

Thanks for reading the whole thing! I hope you got something out of it
other than me tooting my own horn.

1.
"A+++ WOULd take another referred job again!!!1!"
It just goes to show you that most jobs come from personal contacts -
just as Andy said in his "getting a job" presentation. I had even met
Rael in person at OSCON before this job.

-- 
Leland Johnson
http://protoplasmic.org

