[JaxPM] wget, etc...

Nate Campi nate at campi.cc
Fri Aug 3 00:42:16 CDT 2001


On the jacksonville-pm-list; Jax.PM'er Nate Campi <nate at campi.cc> wrote -

Bill,

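mod_rewrite your way around this:
(granted, this is a painful way to do it, but I don't need the pics, and I
could easily script it in sh to follow the links and fetch them with netcat,
not to mention *gasp* Perl. netcat sends only what I type, and I typed no
User-Agent header, so none of your RewriteCond patterns can match.)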

[nate at monkey:nate]$ nc -v -v ora.sunhelp.org 80
ohno.mrbill.net [207.200.6.75] 80 (www) open
GET /index.html HTTP/1.1
Host: ora.sunhelp.org

HTTP/1.1 200 OK
Date: Fri, 03 Aug 2001 05:34:17 GMT
Server: Apache/1.3.20 (Unix) PHP/4.0.5
Last-Modified: Mon, 23 Jul 2001 04:10:53 GMT
ETag: "f3382-b9f-3b5ba3cd"
Accept-Ranges: bytes
Content-Length: 2975
Content-Type: text/html

<html>
<head><title>O'Reilly CD Bookshelf Library</title></head>
<body bgcolor="black" text="white" link="white" vlink="white">
<center>
<h3>This online reference is for private use only.</h3>
<hr><p>
<table width="75%" cellpadding="5" cellspacing="5">
<tr>
<td><a href="unix/upt/index.htm"><img
src="images/unixpowertools.jpg"></a></td>
<td><a href="unix/unixnut/index.htm"><img
src="images/unixnut.gif"></a></td>
<td><a href="unix/vi/index.htm"><img src="images/learnvi.jpg"></a></td>
<td><a href="unix/sedawk/index.htm"><img
src="images/sedawk.jpg"></a></td>
<td><a href="unix/ksh/index.htm"><img
src="images/learnkorn.jpg"></a></td>
<td><a href="unix/lrnunix/index.htm"><img
src="images/learnunix.jpg"></a></td>
</tr>
<tr>
<td><a href="networking/dnsbind/index.htm"><img
src="images/dnsbind.jpg"></a>
</td>
<td><a href="networking/tcpip/index.htm"><img
src="images/tcpip.jpg"></a></td>
<td><a href="networking/sendmail/index.htm"><img
src="images/sendmail.jpg"></a>
</td>
<td><a href="networking/smdref/index.htm"><img
src="images/senddesk.jpg"></a>
</td>
<td><a href="networking/firewall/index.htm"><img
src="images/firewalls.jpg">
</a></td>
<td><a href="networking/puis/index.htm"><img
src="images/security.jpg"></a>
</td>
</tr>
<tr>
<td><a href="perl/perlnut/index.htm"><img
src="images/perlnut.jpg"></a></td>
<td><a href="perl/learn/index.htm"><img
src="images/learnperl.jpg"></a></td>
<td><a href="perl/learn32/index.htm"><img
src="images/perlwin.gif"></a></td>
<td><a href="perl/prog/index.htm"><img
src="images/progperl.jpg"></a></td>
<td><a href="perl/advprog/index.htm"><img
src="images/advperl.gif"></a></td>
<td><a href="perl/cookbook/index.htm"><img
src="images/perlcook.jpg"></a></td>
</tr>
<tr>
<td><a href="webref/html/index.htm"><img
src="images/htmlguide.jpg"></a></td>
<td><a href="webref/cgi/index.htm"><img
src="images/cgiprog.jpg"></a></td>
<td><a href="webref/jscript/index.htm"><img
src="images/javascript.jpg"></a>
</td>
<td><a href="webref/perl/index.htm"><img
src="images/progperl.jpg"></a></td>
<td><a href="webref/webnut/index.htm"><img
src="images/webmaster.jpg"></a></td>
<td><a href="javaref/javanut/index.htm"><img
src="images/javanut.jpg"></a></td>
</tr>
<tr>
<td><a href="javaref/langref/index.htm"><img
src="images/javalang.jpg"></a></td>
<td><a href="javaref/awt/index.htm"><img
src="images/javaawt.jpg"></a></td>
<td><a href="javaref/fclass/index.htm"><img
src="images/javafund.jpg"></a></td>
<td><a href="javaref/exp/index.htm"><img
src="images/explorjava.jpg"></a></td>
<td><a href="oracle/prog2/index.htm"><img
src="images/oraplsql.gif"></a></td>
<td><a href="oracle/guide8i/index.htm"><img
src="images/ora8i.jpg"></a></td>
</tr>
<tr>
<td><a href="oracle/bipack/index.htm"><img
src="images/orabuilt.jpg"></a></td>
<td><a href="oracle/advprog/index.htm"><img
src="images/advora.jpg"></a></td>
<td><a href="oracle/webapp/index.htm"><img
src="images/oraweb.jpg"></a></td>
<td></td> 
<td></td>
</tr>
</table>
</center>
</body>
</html>
sent 48, rcvd 3214
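
A rough Python equivalent of the session above, plus the link-following
mentioned before it, might look like this. It is a sketch assuming Python 3.6+
and only the standard library; `build_request`, `fetch`, `LinkCollector`,
`extract_links` and `page_links` are illustrative names, not anything from
this thread. Like the hand-typed request, it sends no User-Agent header.

```python
import socket
from html.parser import HTMLParser
from typing import List


def build_request(host: str, path: str = "/") -> bytes:
    """Build a bare HTTP/1.1 GET like the one typed into nc above.

    Only Host (mandatory in HTTP/1.1) is sent, so the server sees no
    User-Agent at all. "Connection: close" is an addition that makes the
    server hang up after one response, so we can simply read to EOF.
    """
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send the request over a plain TCP socket and return the raw reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag; <img> tags ("pics") are ignored."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def page_links(host: str, path: str = "/") -> List[str]:
    """Fetch one page and return its links (no chunked-encoding support)."""
    raw = fetch(host, path)
    _headers, _sep, body = raw.partition(b"\r\n\r\n")
    return extract_links(body.decode("latin-1"))
```

Against the page in the transcript, `page_links("ora.sunhelp.org", "/index.html")`
should yield relative paths such as `unix/upt/index.htm`, assuming the host
still answers. A real crawler would also resolve them with
`urllib.parse.urljoin` and handle chunked responses, which an HTTP/1.1 server
is free to send.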


On Fri, Aug 03, 2001 at 12:21:49AM -0400, JONES, WILLIAM C wrote:
> On the jacksonville-pm-list; Jax.PM'er "JONES, WILLIAM C" <wcjones at exchange.fccj.org> wrote -
> 
> Thx for reminding me about wget.
> 
> I've set up mod_rewrite to block that bot... I know, I know, there are SO
> many others...
> 
> (Plus you could change wget's fingerprint by recompiling it...)
> 
> 
> But what I've done will stop a LOT of script kiddies...
> 
> Sx  :]
> 
> 
> PS: The code, if you're interested:
> 
> <IfModule mod_rewrite.c>
>   RewriteEngine on
>   RewriteLog /var/log/mod_rewrite.log
>   RewriteLogLevel 0
>  
>   RewriteCond %{REQUEST_FILENAME} ^.+$
>   RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon            [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^EmailWolf              [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro           [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT          [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^Crescent               [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^CherryPicker           [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^[Ww]get                [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit        [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.*       [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO              [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^Telesoft               [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster          [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL          [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.Mozilla/2.01 [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
>   RewriteRule ^.*$ http://insecurity.org/nospam.html
> </IfModule>
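
To see why the netcat request slips through, here is a rough Python model of
the condition chain above. It is only a sketch: it assumes Python's `re`
behaves like Apache's regex engine for these simple patterns, and the name
`would_redirect` is invented. In mod_rewrite, consecutive `RewriteCond` lines
are ANDed unless flagged `[OR]`, so the block reads as "filename non-empty AND
(agent matches any listed pattern)". A missing User-Agent header leaves
`%{HTTP_USER_AGENT}` empty, which matches none of the `^...` patterns.

```python
import re
from typing import Optional

# User-Agent patterns copied from the RewriteCond lines above, in order.
BLOCKED_AGENTS = [
    r"^EmailSiphon",
    r"^EmailWolf",
    r"^ExtractorPro",
    r"^Mozilla.*NEWT",
    r"^Crescent",
    r"^CherryPicker",
    r"^[Ww]get",
    r"^[Ww]eb[Bb]andit",
    r"^WebEMailExtrac.*",
    r"^NICErsPRO",
    r"^Telesoft",
    r"^Zeus.*Webster",
    r"^Microsoft.URL",
    r"^Mozilla/3.Mozilla/2.01",
    r"^EmailCollector",
]
_COMPILED = [re.compile(p) for p in BLOCKED_AGENTS]


def would_redirect(request_filename: str, user_agent: Optional[str]) -> bool:
    """Model of the rule set: cond 1 AND (cond 2 OR ... OR cond 16).

    A missing User-Agent header is modelled as the empty string, which is
    what %{HTTP_USER_AGENT} expands to; it matches none of the ^... patterns.
    """
    # RewriteCond %{REQUEST_FILENAME} ^.+$  (ANDed with the group below)
    if not re.search(r"^.+$", request_filename):
        return False
    agent = user_agent or ""
    # The [OR]-chained conditions: any single match is enough.
    return any(p.search(agent) for p in _COMPILED)
```

So `would_redirect("/index.html", "Wget/1.7")` is `True`, while the netcat
request (`None`) and any browser-like agent string come out `False`. Because
the substitution is a full URL on another host, a match makes mod_rewrite send
an external redirect to the nospam page. wget's `-U`/`--user-agent` option is
another way to change the fingerprint the quoted message mentions.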

-- 
	Nate

Jax.PM Moderator's Note:
This message was posted to the Jacksonville Perl Monger's Group listserv.
The group manager can be reached at -- owner-jacksonville-pm-list at pm.org
to whom send all praises, complaints, or comments...



