SPUG: Chip Salzenberg Defense Fund

jlb jlb at io.com
Thu Aug 4 18:23:35 PDT 2005

The open-proxy tests that IRC servers and mail servers use aren't 
appropriate for web sites because they add a significant delay to each 
and every connection.  Even if the results were cached, the initial 
connection would be slow enough to drive many web users away.

Just speaking as someone who has run a website and encountered some of 
these issues:  Frequently "web bots" are very poorly behaved, issuing 
bursts of thousands of requests in a row, at frequent intervals.  This 
can often impact other legitimate users of the site, as well as 
potentially costing the site money in bandwidth and hosting.
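
A minimal sketch of how a site might flag the kind of burst described 
above, assuming a simple sliding-window count per client.  The class 
name, the limits, and the client address are hypothetical and chosen 
only for illustration; this is not taken from any particular site's 
setup.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Sliding-window request counter, one window per client (illustrative only)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests      # hypothetical limit per window
        self.window_seconds = window_seconds  # hypothetical window length
        self._hits = defaultdict(deque)       # client -> timestamps of allowed requests

    def allow(self, client, now):
        """Return True if this request fits within the client's window."""
        hits = self._hits[client]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


detector = BurstDetector(max_requests=3, window_seconds=10.0)
# Four requests within three seconds from one (made-up) address.
results = [detector.allow("192.0.2.1", t) for t in (0.0, 1.0, 2.0, 3.0)]
print(results)
```

A real server would more likely rely on its web server's or load 
balancer's own rate limiting; the sketch only shows the bookkeeping 
involved.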

There is a big difference between some person running a recursive wget on 
your site once to mirror it for their own personal use, and someone 
frequently and aggressively running screen scraping bots against it in an 
automated fashion.

If a web site has a robots.txt indicating that it doesn't wish to be 
spidered, and has gone so far as to ban a single misbehaving bot multiple 
times, only to have those bans evaded... well, at what point does it 
become "bad"?
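
For reference, a well-behaved bot can honor robots.txt with very little 
effort.  The sketch below uses Python's standard urllib.robotparser; the 
robots.txt contents, the bot name, and the URL are assumptions made up 
for the example, not anything from the sites discussed in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that does not want to be spidered at all.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" and the URL are placeholders.
print(may_fetch(ROBOTS_TXT, "ExampleBot", "http://example.com/page.html"))
```

Nothing enforces the result; the check only tells a bot what the site 
has asked for.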

On Thu, 4 Aug 2005, Ken Meyer wrote:

> My desire to understand this has trumped my desire to conceal my lack of
> geeky sophistication.
> Is an "open proxy" used simply to evade an attempt by sites to specifically
> block this company's bots and not others?
> What about robots.txt; seems to me that this file implements no more than a
> "gentlemen's agreement", rather than a legal barrier, such as a password to
> access a computer on a network that is not intended for public access, such
> as a web server?
> Here is an excerpt from Wikipedia:
> "Because proxies are implicated in abuse, system administrators have
> developed a number of ways to refuse service to open proxies. IRC networks
> such as the Blitzed network automatically test client systems for known
> types of open proxy. [1] Likewise, a mail server may be configured to
> automatically test mail senders for open proxies, using software such as
> Michael Tokarev's proxycheck. [2]"
> So why have these techniques not been effective against the subject
> "scraping" in point (by the way, I thought that "scraping" referred to
> getting text off a screen shot that is in raster format, i.e. OCR, not
> actually snarfing the ASCII)?
> So, when is one hacking into a system and when is one simply accessing
> material that is exposed and fair game, whether that is desirable or not?
> What sort of material was this company harvesting?  Does it bear on privacy,
> which is a very tight subject in the case of medical information -- HIPAA
> philosophy is highly prevalent.
> Where are Mr. Salzenberg's computers now?  Are their contents intact?  Who
> has control of any files copied from them?
> It is unwise to address this problem via an organization called
> "geeksunite", which is certainly off-putting to the majority of the
> population, which if they are not actually repelled by the geek image, will
> presume that the subject will be beyond their comprehension.  If there are
> truly illegal acts going on, isn't a counterattack possible?  If civil
> liberties have been violated, certainly  the usual organizations will be
> alarmed and will provide support.  What about the ACLU and the EFF to defend
> Mr. Salzenberg?  I would rather support a well-known champion of the
> individual than directly to an individual who has not defined the problem or
> his approach to addressing it in other than vague terms -- or is the
> vagueness simply a product of my lack of understanding of the technical
> details of what is going on here.
> By the way, I don't consider this to be "OT" at all, as subjects that bear
> on the livelihoods of the computing technical community are subsumed by any
> and all more specific technical discussions -- IMHO.
> Ken Meyer
> -----Original Message-----
> From: spug-list-bounces at pm.org [mailto:spug-list-bounces at pm.org]
> On Behalf Of Bill Campbell
> Sent: Thursday, August 04, 2005 11:15 AM
> To: SPUG Members
> Subject: Re: SPUG: Chip Salzenberg Defense Fund
> On Thu, Aug 04, 2005, Ken Meyer wrote:
>> Thanks, Bill.  I sort of got the drift of that, but the question remains in
>> my mind: what are these "dubious practices".  Scraping what?  Can it be
>> downloading pages?  Highly unlikely to be a problem, as that is what the
>> web is about and magnitude of the process, while perhaps questionable, doesn't
>> appear to me to be illegal.  Burrowing into the web server to retrieve
>> information that is on the server but not displayed?  Well, kind of the
>> inverse of spyware, but spyware is endemic, albeit loathsome, and as far as
>> I can tell, not really illegal at this point.  It's still not clear whether
>> Chip has not been excessively sanctimonious about this practice, whatever
>> it is, not to mention naive about corporate prerogatives.
> I read the PDF of the letter that Chip wrote which claimed that the company
> was doing things like ignoring the ROBOTS.TXT files (and documentation in
> the code the company wrote referenced the appropriate documents about
> ROBOTS.TXT, so they can't claim ignorance), ignored complaints from
> webmasters about their activities -- including the Washington State
> Government sites, took steps to circumvent attempts to block their scraping,
> including use of open proxies, etc.  In the letter, Chip said that he had
> met with the corporate management, explaining the legal and ethical issues.
> Bill
> --
> INTERNET:   bill at Celestial.COM  Bill Campbell; Celestial Software LLC
> UUCP:               camco!bill  PO Box 820; 6641 E. Mercer Way
> FAX:            (206) 232-9186  Mercer Island, WA 98040-0820; (206) 236-1676
> URL: http://www.celestial.com/
> ``My reading of history convinces me that most bad government results
> from too much government.'' --Thomas Jefferson.
> _____________________________________________________________
> Seattle Perl Users Group Mailing List
>     POST TO: spug-list at pm.org
> SUBSCRIPTION: http://mail.pm.org/mailman/listinfo/spug-list
>    MEETINGS: 3rd Tuesdays, Location: Amazon.com Pac-Med
>    WEB PAGE: http://seattleperl.org/
