LWP and DICE

Mark Widawer mark at markwild.com
Fri Aug 25 00:02:18 CDT 2000


Hello all.

Another Perl puzzler for us.

I want to write a script that automatically queries Dice.com. The right way to do that is with LWP and, of course, there is more than one way to do it. I think I've found an approach that is on the right path, but I haven't yet managed to query the site successfully.

Here's the nitty gritty.

If you look at the source for the Dice query form, you'll see that it makes a POST request. The method and variables are pretty clear. If you run the query from your browser, though, you end up at a URL that looks like a GET request and seems to have a session id in it. It looks like this:

http://jobsearch.dice.com/jobsearch/jobresults.cgi?sr=1&hp=25&cf=1e.32836500&brief=0&banner=1

If you monkey with some of the parameters, you can modify the way the listing appears.
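
The parameters in that URL can be pulled apart and rewritten with the URI module. This is only a sketch: it assumes nothing about what each parameter means, and the choice of hp as the one to change is purely illustrative.

```perl
use strict;
use warnings;
use URI;

# The results URL quoted above, as seen in the browser.
my $url = URI->new('http://jobsearch.dice.com/jobsearch/jobresults.cgi?sr=1&hp=25&cf=1e.32836500&brief=0&banner=1');

# query_form returns the parameters as a flat key/value list.
my %param = $url->query_form;
print "$_ = $param{$_}\n" for sort keys %param;

# Change one parameter and rebuild the query string.
# Which parameter controls what is a guess, not something Dice documents.
$param{hp} = 50;
$url->query_form(map { $_ => $param{$_} } sort keys %param);
print "$url\n";
```

Because the parameters pass through a hash, their original order is lost, so the rebuilt URL lists them alphabetically.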
 
If I send that particular GET request via my script, I get the correct search results HTML page. However, that URL becomes invalid by the next day (and probably sooner). So what (I think) I need to do is make the original POST and then allow LWP to be redirected to the GET URL.

The search form I am trying to emulate (simplified from the original at dice.com) is this:
--------------------
<html>
<head>
</head>
<body>
<form action="http://jobsearch.dice.com/jobsearch/jobsearch_simple.cgi" method="POST">

<input type=text name="query" size=36>
<input type="SUBMIT" value="Search">
<select name="method">
  <option  value="and" selected>Results must have all of the listed keywords above</option>
  <option  value="or">Results can have any of the keywords listed above </option>
  <option  value="bool">Results will use the Boolean expression listed above </option>
</select>
<input type="Hidden" name="num_per_page" value=100>
<input type="HIDDEN" name="banner" value=1>
<input type="hidden" name="num_to_retrieve" value=1250>
</form> 
</body>
</html>
--------------------
 
Save that form as an HTML file and load it into your browser and it works as expected. 

What I think is happening in the browser is that the script executed by the POST returns a redirect to a page that is then fetched with a GET. If anyone has an alternate idea of what is going on, and how to make it work, I'd like to hear it.

I've written code (adapted from the LWP docs) that does the POST:

-----------------
use strict;
use warnings;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;

my $ua = LWP::UserAgent->new();

# Build a form-encoded POST carrying the search fields.
my $req = POST 'http://jobsearch.dice.com/jobsearch/jobresults.cgi',
                [ query            => 'perl',
                  method           => 'and',
                  banner           => 1,
                  num_to_retrieve  => 250,
                  num_per_page     => 10,
                  submit           => 'Search'
                ];

# as_string returns the status line, headers, and body of the response.
my $content = $ua->request($req)->as_string;

print $content;
-----------------

The content that I get back from this request is:

----------------
HTTP/1.1 302 Found
Connection: close
Date: Fri, 25 Aug 2000 04:24:31 GMT
Location: http://www.dice.com/jobsearch/index.html
Server: Apache/1.3.11 (Unix)
Content-Type: text/html
Client-Date: Fri, 25 Aug 2000 04:21:06 GMT
Client-Peer: 208.128.117.192:80
Title: 302 Found

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved <A HREF="http://www.dice.com/jobsearch/index.html">here</A>.<P>
</BODY></HTML>
----------------

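I figured that the Location header would point at the results URL (the GET request with the session id), but no such luck: it just sends me back to the search index page. If it had pointed at the results, I could parse the header for the redirect URL and then make a second LWP request to it.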
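
The parse-and-refetch step could look roughly like this. It is a sketch only: redirect_target is a made-up helper name, and the response is built by hand from the 302 shown above so that the example runs without touching the network.

```perl
use strict;
use warnings;
use HTTP::Response;
use URI;

# Return the absolute redirect target of a response, or undef if there is none.
sub redirect_target {
    my ($res, $base) = @_;
    return undef unless $res->is_redirect;
    my $location = $res->header('Location');
    return undef unless defined $location;
    # Location may be relative, so resolve it against the request URL.
    return URI->new_abs($location, $base)->as_string;
}

# Stand-in for the 302 response shown above.
my $res = HTTP::Response->new(302, 'Found',
    [ Location => 'http://www.dice.com/jobsearch/index.html' ]);

my $next = redirect_target($res, 'http://jobsearch.dice.com/jobsearch/jobresults.cgi');
print "Would fetch: $next\n";
```

With a live response from $ua->request, the returned URL could then be passed to $ua->get for the second request.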

One last comment: when I substitute other URLs into this same Perl script, I get back the content I expect.
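
One difference worth ruling out: the simplified form posts to jobsearch_simple.cgi with num_per_page=100 and num_to_retrieve=1250 and sends no submit field (the submit button has no name), while the script posts to jobresults.cgi with different values. LWP also does not follow redirects on POST unless told to. Whether any of that explains the redirect to index.html is unconfirmed. The sketch below mirrors the form exactly and enables POST redirects; build_search_request and run_search are hypothetical names, and by default it only prints the request rather than sending it.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;

# Build the POST as the simplified form would send it: same action URL,
# same field names and hidden values, and no submit field.
sub build_search_request {
    my ($keywords) = @_;
    return POST 'http://jobsearch.dice.com/jobsearch/jobsearch_simple.cgi',
        [ query           => $keywords,
          method          => 'and',
          num_per_page    => 100,
          banner          => 1,
          num_to_retrieve => 1250,
        ];
}

# Send the request with a user agent that may follow redirects after a POST.
sub run_search {
    my ($keywords) = @_;
    my $ua = LWP::UserAgent->new;
    push @{ $ua->requests_redirectable }, 'POST';
    my $res = $ua->request(build_search_request($keywords));
    die 'Search failed: ' . $res->status_line . "\n" unless $res->is_success;
    return $res->content;
}

# Dry run: show the request without touching the network.
print build_search_request('perl')->as_string;
```

Calling print run_search('perl') would send it for real. If the site tracks the session with cookies (an assumption, not something the headers above show), constructing the agent with LWP::UserAgent->new(cookie_jar => {}) would be needed as well.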

Anyway, I've blabbed enough here. If anyone has any ideas about what it will take to make this work, I'm all ears. Thanks in advance for your help.

--Mark Widawer



 