Phoenix.pm: Job Hunting Saga

Doug Miles doug.miles at bpxinternet.com
Thu Mar 7 11:21:37 CST 2002


Scott Walters wrote:
> This is meant to amuse.
> 
> I'm lucky enough to have a first-page ranking on Google for
> search phrases such as "computer programmer resume". Hungry for job
> leads, I cobbled together a quick script (thanks to Bill & O'Reilly for
> the regex) to parse my Apache access_log for Google search referrals,
> on the logic that a lot of people looking for resumes would
> search Google while at work and then click on mine, and I would get
> their domain name to follow up. The script is at the end for anyone
> interested. It's partially commented out - one behavior was just to
> dump a count of words that I got hits on. Right now, it only prints
> out instances from the log of search hits from Google for "resume" +
> other words.

This is pretty interesting.

> What I came up with surprised me:
> 
> 1. Almost all of the hits are from dialups/dynip DNS/cable, and .edu's. 
> 2. Of the real companies, most hits were from former employers (3).

This is amusing!

> 3. "websensecache.tco.census.gov wants resume computer programmer" was one 
> of the log entries.
> 4. About 5/100 appear to be valid hits from inside companies.
> 5. In the last 6 months, a first-page Google ranking has generated 0
> non-spam emails.
> 
> For the curious, the output is at http://www.illogics.org/google.html.
> 
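> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> use CGI;
> use Socket;
> 
> # Referers that come from a Google search start with this prefix.
> my $ident = 'http://www.google.com/search?q=';
> my (%words, @words);
> 
> open my $f, '<', 'access_log' or die "Cannot open access_log: $!";
> WEBHIT: while(<$f>) {
> 
>   # Split a combined-format log line into fields; skip lines that don't match.
>   my ($host, $ident_user, $auth_user, $date, $time,
>       $time_zone, $method, $url, $protocol, $status, $bytes,
>       $referer, $agent) =
> /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+) "([^"]+)" "([^"]+)"$/
>     or next WEBHIT;
> 
>   next WEBHIT unless(substr($referer, 0, length($ident)) eq $ident);
> 
>   # Keep only the q= value, decode it, and strip punctuation.
>   my $qs = substr($referer, length($ident));
>   $qs =~ s/&.*//;
>   $qs = CGI::unescape($qs);
>   $qs =~ s/[^ a-zA-Z0-9]//g;
> 
>   # Count every search word seen.
>   foreach my $i (split / /, lc($qs)) {
>     $words{$i}++;
>   }
> 
>   # Report resume searches with the visitor's reverse-DNS name.
>   if($qs =~ m/resume/) {
>     my $addr = inet_aton($host) or next WEBHIT;
>     $host = gethostbyaddr($addr, AF_INET) or next WEBHIT;
>     print qq{<tr><td>$host</td><td>wants</td><td>$qs</td></tr>\n};
>   }
> }
> close $f;
> 
> # The word-count report below is disabled by this exit; remove it to
> # print the counts instead of stopping here.
> exit 0;
> 
> foreach my $i (keys %words) {
>   push @words, sprintf "%8d %s", $words{$i}, $i if $words{$i} > 1;
> }
> @words = sort @words;
> 
> foreach my $i (@words) {
>   print $i, "\n";
> }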
> 
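For anyone who wants to try the referer handling on its own, here is a minimal sketch of that one step from the script above. The helper name search_terms and the sample referer are made up for illustration and are not part of the original script. It also decodes the query by hand rather than calling CGI::unescape, on the assumption that CGI.pm may not be installed (it is no longer bundled with recent Perls).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same prefix the script above compares each referer against.
my $ident = 'http://www.google.com/search?q=';

# search_terms() is a hypothetical helper, not part of the original script.
# It returns the cleaned-up query string, or undef when the referer does
# not start with the Google search prefix.
sub search_terms {
    my ($referer) = @_;
    return unless defined $referer;
    return unless index($referer, $ident) == 0;

    my $qs = substr($referer, length $ident);
    $qs =~ s/&.*//;                               # drop every parameter after q=
    $qs =~ tr/+/ /;                               # '+' encodes a space
    $qs =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # decode %XX escapes by hand
    $qs =~ s/[^ a-zA-Z0-9]//g;                    # keep letters, digits, spaces
    return $qs;
}

# Made-up referer, for illustration only.
my $sample = 'http://www.google.com/search?q=computer+programmer+resume&hl=en';
my $terms  = search_terms($sample);
print "$terms\n" if defined $terms;
```

Like the original, this only recognizes referers where q= is the first parameter after /search?, so searches that put other parameters first would be skipped.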





