SPUG: Web Bugs

Mon Aug 20 19:27:37 CDT 2001

On Mon, Aug 20, 2001 at 01:02:31PM -0700, Jonathan Woodard wrote:

> I run a data warehouse to provide analytical data about a web service.
> We don't use transparent gifs in our web pages, but we are evaluating
> whether and how to use them.  If we do implement them it would make
> our work of parsing through IIS logs for page view data much easier.
> It would not be used to track individual users (we're a free service
> w/o cookies, so we have no reliable way to track single users).  I
> know of another web team that does use clear gifs in their pages for
> exactly the same thing.  This practice is essentially a more efficient
> web server logging mechanism.  I don't see how that is an invasion of
> privacy.

Hello Jonathan,

What is it about transparent gifs (whether they are static or generated
by a cgi) that makes it easier to log and retrieve page view data?  I am
trying to see the benefit, but I can't.  Can you explain a little more?

> This tool, like any other, can be used for less scrupulous ends,
> including spam.  Using clear gifs in html emails sounds distasteful to
> me, but I think it would be a clever/sneaky way to discover who is
> leaking information by forwarding confidential messages.
>
> Jonathan

The thing that upsets me about web bugs is that you can't turn them off.
At least you can turn off cookies.  Even if you're using a proxy which
strips your identifying headers, they can still track you since the
tracking info is encoded in the image name.  You can't get the content
for the image unless you ask for it by name.  Sounds like a beer slogan
or something.  :-)

At least for now, most web bugs are obvious, like:

<img src="image.cgi?tid=XH460LWOR802NVX04">

You can configure your proxy to look for img tags where the image looks
like a cgi call and strip those out, but what about autogenerated
tracking that looks innocent:

<img src="/images/364374/navbar.jpg">

Whether it is on a web page or in an email, obvious or not, web bugs
make me really nervous.
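
As a rough illustration of the proxy filter described above, here is a
minimal Python sketch.  The function names and the "looks like a cgi
call" rule (a query string, a .cgi component, or a /cgi-bin/ path) are
my own assumptions, not anything a real proxy ships with:

```python
import re

# Heuristic (an assumption, not a standard): treat an <img> as a likely
# web bug if its src has a query string or points at a CGI script.
IMG_TAG = re.compile(r'<img\b[^>]*>', re.IGNORECASE)
SRC_ATTR = re.compile(
    r'\bsrc\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))', re.IGNORECASE
)
CGI_HINT = re.compile(r'\?|\.cgi\b|/cgi-bin/', re.IGNORECASE)


def looks_like_cgi(src):
    """Return True if the image URL resembles a CGI call."""
    return bool(CGI_HINT.search(src))


def strip_cgi_images(html):
    """Remove <img> tags whose src looks like a CGI call; keep the rest."""
    def replace(match):
        tag = match.group(0)
        src = SRC_ATTR.search(tag)
        if src is None:
            return tag  # no src attribute, nothing to judge
        # Exactly one of the three alternatives captured the URL.
        url = next(g for g in src.groups() if g is not None)
        return "" if looks_like_cgi(url) else tag
    return IMG_TAG.sub(replace, html)
```

This strips the image.cgi?tid=... example, but it passes
/images/364374/navbar.jpg through untouched, which is exactly the
weakness: a tracking id hidden in an ordinary-looking path gives the
filter nothing to match on.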

You might be able to test for the existence of web bugs by using a proxy
and doing a HEAD request on each "image" referred to by <img> tags.
Here is a comparison of the headers for a generated image and a static
image:

dougb at towelie:/usr/local/www/cgi-bin
 % telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
HEAD /cgi-bin/image.cgi HTTP/1.0

HTTP/1.1 200 OK
Date: Tue, 21 Aug 2001 00:21:07 GMT
Server: Apache/1.3.20 (Unix)
Connection: close
Content-Type: image/gif

Connection closed by foreign host.
dougb at towelie:/usr/local/www/cgi-bin
 % telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
HEAD /randomimages/simpson004.gif HTTP/1.0

HTTP/1.1 200 OK
Date: Tue, 21 Aug 2001 00:21:45 GMT
Server: Apache/1.3.20 (Unix)
Last-Modified: Mon, 20 Aug 2001 22:42:08 GMT
ETag: "7e421-439-3b819240"
Accept-Ranges: bytes
Content-Length: 1081
Connection: close
Content-Type: image/gif

Connection closed by foreign host.

image.cgi takes a random image from /randomimages and displays it.  The
second request is for a static image inside /randomimages.  You can see
that the static response carries Last-Modified, ETag, Accept-Ranges, and
Content-Length headers that the cgi response lacks.  You could
make the cgi generate Content-Length and Last-Modified headers, but I
wonder how many web bug creators think of matching their headers to
static images served by their web server platform...
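
The HEAD comparison could be automated along these lines.  This is only
a sketch in Python: the function names are hypothetical, and the list of
"static" headers is taken from the one Apache 1.3.20 transcript above,
so other servers may behave differently:

```python
import http.client

# Headers the static file in the transcript had and the cgi response did
# not.  Treating their absence as suspicious is only a heuristic: a cgi
# can emit them, and some servers omit them for static files.
STATIC_HINTS = ("last-modified", "etag", "content-length")


def missing_static_headers(headers):
    """Return the static-file headers absent from a dict of headers."""
    present = {name.lower() for name in headers}
    return [h for h in STATIC_HINTS if h not in present]


def head_headers(host, path, port=80, timeout=10):
    """Issue a HEAD request and return the response headers as a dict."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return dict(response.getheaders())
    finally:
        conn.close()
```

For example, missing_static_headers(head_headers("localhost",
"/cgi-bin/image.cgi")) would list all three headers for the cgi response
shown above, and an empty list for the static gif.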

Doug

-----Original Message-----
From: Wallendahl, Michael/SEA [mailto:mwallend at ch2m.com] 
Sent: Friday, August 17, 2001 16:52
To: SPUG
Subject: SPUG: Web Bugs

I'm just curious what everyone's opinion is about "web bugs" -- 1x1
transparent gifs that some companies embed in their web pages and HTML
e-mails.  An overview article can be found here:
http://www.eff.org/Privacy/Marketing/web_bug.html

Some people say that these gifs are just used to track how popular a
web site is--if that's the case, why would they include identifying
information in the web bug URL?  I was pretty indifferent about the
practice until I realized that some junk mail I got from my student loan
company, SallieMae, included a little snippet of html code like this:

<IMG
SRC="http://salliemae.sfi0.com/image.cgi/slm008-c/myName@myDomain.com">

This means that they now know the exact second that I read my e-mail and
they also know if I forward this specific message on to someone else
(because the "hit" in the log file will come from a different source IP
address but contain the same e-mail address tag).  It's like a Read
Receipt that I can't get around.  And since it went to my hotmail
account, I can't force it to "plain text" format before I read it to get
around this problem.
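
To make the forwarding point concrete, here is a small Python sketch of
the kind of log analysis the sender could run.  The (source_ip, tag)
input format and the function names are hypothetical; I have no idea
what their real logs or tools look like:

```python
from collections import defaultdict


def addresses_per_tag(hits):
    """Map each tracking tag to the set of source IPs that requested it.

    `hits` is an iterable of (source_ip, tag) pairs, a hypothetical
    pre-parsed form of the sender's access log.
    """
    seen = defaultdict(set)
    for source_ip, tag in hits:
        seen[tag].add(source_ip)
    return dict(seen)


def possibly_forwarded(hits):
    """Return the sorted tags requested from more than one source IP."""
    by_tag = addresses_per_tag(hits)
    return sorted(tag for tag, ips in by_tag.items() if len(ips) > 1)
```

A tag seen from two addresses is only a hint, since the same reader can
show up from different IPs (dial-up, proxies), but it is enough to flag
a message for a closer look.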

I feel like writing a Perl script to "spam" the salliemae.sfi0.com web
server back with random e-mail addresses, but that wouldn't solve
anything (besides, it would be easy to filter out my "spam" from their
logs because all the hits would be from the same address).

Anyway, just wondering what you all think.  Do you use these "bugs" in
your own web projects?

-Mike
