LPM: Regexps for email.

Joe Hourcle oneiros at dcr.net
Wed Jan 26 20:40:05 CST 2000



On Wed, 26 Jan 2000, Mike Andrews wrote:

> Not that this really answers the question, but I'm a little wary of using
> mailto: URL's *anywhere* anymore.  The instant you put an email address on
> a web page as-is is the same instant you get added to a pile of spammer's
> lists.  Most of them use web-crawling bots to harvest addresses.  We've
> got an amusing Apache mod_rewrite + Perl script combo that defeats most of
> them here...
> 
> Just something to think about.
> 
> http://www.turnstep.com/Spambot/avoidance.html has some interesting
> suggestions for getting around it -- keeping the email address out of the
> HTML source but still have the page look and work the same.  Javascript,
> creative use of tables, putting the address into a .gif, and so on...

One interesting approach I've seen is URI-encoding e-mail addresses: web
browsers will decode them, but the encoded form won't (unless the spambots
get smarter) match the patterns for an e-mail address.

Eg:
	<A HREF="mailto:oneiros@dcr.net">oneiros@dcr.net</A>

won't match if it's encoded as:
	<A HREF="mailto:oneiros%40dcr.net">oneiros&#64;dcr.net</A>


(after all, if the spammers can send us crap that's URI encoded to make it
harder to track down, we might as well do the same to them)
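
For anyone who wants to generate such links rather than hand-edit them,
here is a minimal sketch in Python. The function name encode_mailto, the
placeholder address user@example.com, and the harvester pattern are all
assumptions made for illustration; real spambots may well use different
(or smarter) patterns. It percent-encodes the '@' in the HREF and uses a
numeric character reference for the '@' in the visible text, as in the
example above.

```python
import re
from html import escape

# A naive address pattern of the kind a harvester might use (an assumption).
HARVESTER_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def encode_mailto(address):
    """Build an <A> tag whose source contains no literal '@'."""
    # Percent-encode '@' for the URL part; the browser decodes it on click.
    href = address.replace("@", "%40")
    # Escape HTML specials, then write '@' as a numeric character reference.
    text = escape(address).replace("@", "&#64;")
    return '<A HREF="mailto:{}">{}</A>'.format(href, text)

plain = '<A HREF="mailto:user@example.com">user@example.com</A>'
encoded = encode_mailto("user@example.com")

print(encoded)
print(HARVESTER_RE.findall(plain))    # the plain link is harvested twice
print(HARVESTER_RE.findall(encoded))  # the encoded link yields no matches
```

This only defeats bots that scan the raw HTML source without decoding it
first, which is the same caveat as above.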

-----
Joe Hourcle



