SPUG: Chip Salzenberg Defense Fund

Ken Meyer kmeyer at blarg.net
Fri Aug 5 13:20:13 PDT 2005

Again, please note that this is all education for me.

So, how do the misbehaving bot, and the possible responses to it, differ
from any old DoS attack and the ways it is countered?  Since we haven't had
any virtual Internet shut-downs recently, it suggests to me that effective
measures have been developed.  Also again, I don't understand whether a
robots.txt file has the same status as password protection in establishing
criminal activity.

Ken Meyer

-----Original Message-----

From: jlb [mailto:jlb at io.com]
Sent: Thursday, August 04, 2005 6:24 PM
To: Ken Meyer
Cc: SPUG Members

Subject: Re: SPUG: Chip Salzenberg Defense Fund

The methods IRC servers and Mail servers use aren't appropriate for web
sites because they introduce significant delay to each and every
connection.  Even if the information were cached, the initial connection
would be slow enough to drive many web users away.

Just speaking as someone who has run a website and encountered some of
these issues:  Frequently "web bots" are very poorly behaved, issuing
bursts of thousands of requests in a row, at frequent intervals.  This
can often impact other legitimate users of the site, as well as
potentially costing the site money in bandwidth and hosting.
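
As a rough illustration of how a site might notice the kind of burst
described above, here is a minimal Python sketch of a per-client
sliding-window counter.  The class name, the thresholds, and the idea of
keying on a client identifier such as an IP address are all assumptions
made for this example, not a description of what any particular site runs.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Flag clients that exceed max_requests within window_seconds.

    Hypothetical helper for illustration; thresholds are arbitrary.
    """

    def __init__(self, max_requests=100, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client id -> timestamps of that client's recent accepted requests
        self._hits = defaultdict(deque)

    def allow(self, client, now):
        """Record a request at time `now`; return False if over the limit."""
        hits = self._hits[client]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A check like this only slows a client that keeps the same identifier.  A
bot that moves between open proxies shows up as many different clients,
which is why bans alone can be evaded.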

There is a big difference between some person running a recursive wget on
your site once to mirror it for their own personal use, and someone
frequently and aggressively running screen scraping bots against it in an
automated fashion.

If a web site has a robots.txt indicating it doesn't wish to be spidered,
and has gone so far as to ban a single misbehaving bot multiple times,
only to have those bans evaded...well, at what point does it become "bad?"
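
For anyone unfamiliar with how robots.txt works in practice, here is a
small Python sketch using the standard library's urllib.robotparser.  The
robots.txt contents, the "BadBot" user agent, and the example.com URLs are
made up for illustration.  Note that the check runs on the crawler's side:
the file only states the site's wishes, and nothing in it technically
prevents a bot from ignoring the answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named bot entirely, and keep
# everyone else out of /private/.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A well-behaved crawler would call something like
may_fetch("GoodBot", "http://example.com/index.html") before each request
and skip any URL for which the answer is False.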

On Thu, 4 Aug 2005, Ken Meyer wrote:

> My desire to understand this has trumped my desire to conceal my lack of
> geeky sophistication.
> Is an "open proxy" used simply to evade an attempt by sites to
> block this company's bots and not others?
> What about robots.txt; seems to me that this file implements no more than
> a "gentlemen's agreement", rather than a legal barrier, such as a password
> access a computer on a network that is not intended for public access,
> as a web server?
> Here is an excerpt from Wikipedia:
> "Because proxies are implicated in abuse, system administrators have
> developed a number of ways to refuse service to open proxies. IRC networks
> such as the Blitzed network automatically test client systems for known
> types of open proxy. [1] Likewise, a mail server may be configured to
> automatically test mail senders for open proxies, using software such as
> Michael Tokarev's proxycheck. [2]"
> So why have these techniques not been effective against the subject
> "scraping" in point (by the way, I thought that "scraping" referred to
> getting text off a screen shot that is in raster format, i.e. OCR, not
> actually snarfing the ASCII)?
> So, when is one hacking into a system and when is one simply accessing
> material that is exposed and fair game, whether that is desirable or not?
> What sort of material was this company harvesting?  Does it bear on
> which is a very tight subject in the case of medical information -- HIPAA
> philosophy is highly prevalent.
> Where are Mr. Salzenberg's computers now?  Are their contents intact?  Who
> has control of any files copied from them?
> It is unwise to address this problem via an organization called
> "geeksunite", which is certainly off-putting to the majority of the
> population, who, if they are not actually repelled by the geek image,
> presume that the subject will be beyond their comprehension.  If there are
> truly illegal acts going on, isn't a counterattack possible?  If civil
> liberties have been violated, certainly the usual organizations will be
> alarmed and will provide support.  What about the ACLU and the EFF to
> Mr. Salzenberg?  I would rather support a well-known champion of the
> individual than directly to an individual who has not defined the problem
> or his approach to addressing it in other than vague terms -- or is the
> vagueness simply a product of my lack of understanding of the technical
> details of what is going on here?
> By the way, I don't consider this to be "OT" at all, as subjects that bear
> on the livelihoods of the computing technical community are subsumed by
> and all more specific technical discussions -- IMHO.
> Ken Meyer
