Phoenix.pm: Job Hunting Saga
Doug Miles
doug.miles at bpxinternet.com
Thu Mar 7 11:21:37 CST 2002
Scott Walters wrote:
> This is meant to amuse.
>
> I'm lucky enough to have a first-page ranking on Google for
> search phrases such as "computer programmer resume". Hungry for job leads,
> I cobbled together a quick script (thanks to Bill & O'Reilly for
> the regex) to parse my Apache access_log for Google search referrals,
> on the logic that a lot of people looking for resumes would
> search Google while at work, and then click on mine, and I would get
> their domain name to follow up. The script is at the end for anyone
> interested. It's partially commented out - one behavior was just to
> dump a count of words that I got hits on. Right now, it only prints
> out instances from the log of search hits from Google for "resume" +
> other words.
This is pretty interesting.
> What I came up with surprised me:
>
> 1. Almost all of the hits are from dialups/dynip DNS/cable, and .edu's.
> 2. Of the real companies, most hits were from former employers (3).
This is amusing!
> 3. "websensecache.tco.census.gov wants resume computer programmer" was one
> of the log entries.
> 4. About 5/100 appear to be valid hits from inside companies.
> 5. In the last 6 months, first page Google ranking has generated 0 non-spam
> emails.
>
> For the curious, the output is at http://www.illogics.org/google.html.
>
> #!/usr/bin/perl
>
> use CGI;
> use Socket;
>
> # Only referers that start with this prefix count as Google searches.
> $ident = 'http://www.google.com/search?q=';
>
> open $f, '<', 'access_log' or die "Cannot open access_log: $!";
> WEBHIT: while(<$f>) {
>
>     # Split a combined-format log line into its fields.
>     ($host, $ident_user, $auth_user, $date, $time,
>      $time_zone, $method, $url, $protocol, $status, $bytes,
>      $referer, $agent) =
>         /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+) "([^"]+)" "([^"]+)"$/;
>
>     next WEBHIT unless(substr($referer, 0, length($ident)) eq $ident);
>     # Keep only the search terms: drop later parameters, decode, strip punctuation.
>     $qs = substr($referer, length($ident));
>     $qs =~ s/&.*//;
>     $qs = CGI::unescape($qs);
>     $qs =~ s/[^ a-zA-Z0-9]//g;
>     # Tally every search word (used by the disabled report below).
>     foreach my $i (split / /, lc($qs)) {
>         $words{$i}++;
>     }
>     # Report resume searches along with the resolved client hostname.
>     if($qs =~ m/resume/) {
>         $host = gethostbyaddr(scalar inet_aton($host), AF_INET) or next WEBHIT;
>         print qq{<tr><td>$host</td><td>wants</td><td>$qs</td></tr>\n};
>     }
> }
> close $f;
>
> # Everything after this exit is disabled: it would print the word counts.
> exit 0;
>
> foreach my $i (keys %words) {
>     push @words, sprintf "%8d %s", $words{$i}, $i if $words{$i} > 1;
> }
> @words = sort @words;
>
> foreach my $i (@words) {
>     print $i, "\n";
> }
>
>
>
>
>