SPUG: Web Bugs

Jonathan Woodard jonwood at microsoft.com
Mon Aug 20 20:34:28 CDT 2001


In response to Doug...  Ralph Kimball explains the utility of
transparent gifs better than I can in The_Data_Webhouse_Toolkit,
pp. 123-126.  He calls the technique null logging.  I'll give a
summary and my take on it.

You can specify the source of the transparent image to be a URI that
points to a server or set of servers dedicated to collecting page view
data by serving up only transparent images.  Let's call them logging
servers.  The SRC attribute of the IMG tag used to embed the
transparent gif can carry useful metadata - whatever you want to log
about the page/frame it's embedded in.  To borrow Kimball's example,
the tag could look like:

<IMG SRC="http://logserver.mega-merc.com/nullpic.gif?type=catalog&sku=bear089">

In this example, logserver.mega-merc.com points to the logging servers.
Then, instead of parsing through the front-end servers' logs, you just
go through the logging servers' logs.  They'll contain only HTTP GET
requests for the transparent image, along with whatever metadata you
embedded in the query string.  In this example, you'd see a catalog
page hit for sku bear089.

In our case, parsing through smaller logs would be very useful - we
collect over 20 GB of IIS logs daily, coming from several geographic
locations and many servers.  Of course Perl handles this data
effectively, but decreasing what gets parsed in the first place is very
attractive to me.  I'd like to provide faster turnaround time for
feedback on our sites - logging what we're interested in to a set of
dedicated logging servers by embedding their URI in transparent gifs is
one approach.

As a user you can tell your browser not to load images to avoid being
logged, but I don't see the big deal, as long as you trust the sites
and services you use.  For my group, knowing how customers (in
aggregate) use our sites is extremely valuable to designers, management,
marketing, operations, and content creators.  If we don't know how
effective our site is, we might as well close up shop and go home.  I
hope that services I use online are always looking for ways to improve.

I have yet to think of any utility for transparent gifs in email other
than spam and tracking the path of a message as it gets passed along by
html mail clients.  Both uses are obnoxious.

-----Original Message-----
From: Ken McGlothlen [mailto:mcglk at artlogix.com] 
Sent: Monday, August 20, 2001 18:05
To: Doug Beaver
Cc: Jonathan Woodard; Wallendahl, Michael/SEA; SPUG
Subject: Re: SPUG: Web Bugs


Doug Beaver <doug at beaver.net> writes:

| What is it about transparent gifs (whether they are static or
| generated by a cgi) that makes it easier to log and retrieve page
| view data?  I am trying to see the benefit, but I can't.  Can you
| explain a little more?

Specifically, when you visit a site (say, cnn.com), they have the option
of dropping in a webbug (or set of them) from various other firms.  The
cnn.com page might consist of:

        The HTML document
        An IBM ad
        A Compaq ad
        A doubleclick.com webbug

The doubleclick.com webbug almost always has a way of encoding more
information in the URL, so now doubleclick.com knows that you saw the
article, which ads you saw, and when you saw it.  They also work with
cnn.com to discover the referring URL.

Alone, this is no big deal, but you can see how, with enough webbugs on
enough sites (and it doesn't take a majority of them), doubleclick.com
can come up with a really good profile of individual users, and come up
with more effective (read "obnoxious") advertising tactics.

Even worse is email - it's like a read-receipt that mailreaders like
Outlook won't let you block.  This is one of the primary reasons why I
don't use a graphical mailreader.

| The thing that upsets me about web bugs is that you can't turn them
| off.  At least you can turn off cookies.  Even if you're using a proxy
| which strips your identifying headers, they can still track you since
| the tracking info is encoded in the image name.

Well, there are ways.  On the Macintosh, for example, a popular web
browser named OmniWeb allows you to do URL blocking (with regular
expressions, no less), and that one ability (along with superior cookie
management) has made it my favorite browser.  Mozilla is also going to
let you block images from selected sites, whenever it becomes ready for
prime time.  Your only other avenue is an HTML proxy like junkbuster,
which blocks image requests from sites you select.

| You might be able to test for the existence of web bugs by using a 
| proxy and doing a HEAD request on each "image" referred to by <img> 
| tags.

Actually, if you can just get a list of IMG URLs out of the page
efficiently, they're pretty easy to spot.  OmniWeb has the "Get Info"
command; it will list all the resources a page attempts to load.  But it
does take a pair of eyeballs to distinguish ads and webbugs from
legitimate spacers and the like.


 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
      Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
  Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
 For daily traffic, use spug-list for LIST ;  for weekly, spug-list-digest
     Seattle Perl Users Group (SPUG) Home Page: http://zipcon.net/spug/




