[Melbourne-pm] OT: Re: FW: Bamboozled by perl

Daniel Pittman daniel at rimspace.net
Sun Oct 4 22:01:13 PDT 2009


Sam Watkins <sam at nipl.net> writes:
> On Mon, Oct 05, 2009 at 11:52:25AM +1100, Toby Corkindale wrote:
>> Sam Watkins wrote:
>> >>text processing is where it really shines.
>> >
>> ># perl invocation to extract email addresses from text, 4 all ur spamming needs
>> >perl -ne 'print "$1\n" while /(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)/ig'
>> 
>> Which fails to match some email addresses.
>> You may want to use these CPAN modules, which follow the appropriate RFC:
>
> It doesn't fail to match any email addresses that are actually used by
> anyone.

Sure it does.  Perhaps you meant to say:

   It doesn't fail to match any email addresses that I have seen in use,
   or that I have cared about when extracting this sort of information.
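
For instance, here is a rough check of the one-liner quoted above.  The sample
addresses are made up for illustration (none come from this thread), and as far
as I can tell all of them are syntactically valid under RFC 822 and its
descendants.  The function names extract_addresses and samples are hypothetical
wrappers, not anything from the original post:

```shell
# Hypothetical wrapper around the one-liner quoted above; the pattern is unchanged.
extract_addresses() {
    perl -ne 'print "$1\n" while /(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)/ig'
}

# Made-up sample addresses, one per line (assumed valid per the RFC grammar):
# a plain address, a TLD longer than four letters, an apostrophe in the
# local part, a quoted local part, and a comment before the "@".
samples() {
    printf '%s\n' \
        'sam@example.com' \
        'curator@example.museum' \
        "o'brien@example.com" \
        '"john doe"@example.com' \
        'sam(no spam)@example.com'
}

# Print whatever the pattern manages to extract from the samples.
samples | extract_addresses
```

Only the first sample comes out intact.  The apostrophe case is silently
truncated to brien@example.com, which is arguably worse than a miss, and the
other three are not matched at all.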


> The RFC-based regexps on email addresses are brain-damaged in the extreme,

...yeah, probably.  A regexp is a dreadful tool for trying to parse the syntax
specified in RFC-822 and its descendants.

> no one uses comments inside emails and all that crap.

Sure they do.  I know several people who have done so precisely *because* it
reduces the number of tools that can recognise their address.  It turns out
that spammers, like you, make all sorts of assumptions about what actually
happens...


> One should follow what is actually done, not the RFC.

Not a bad general policy, but (A) you have been given proof that this does
actually happen, and (B) the RFC-specified syntax is a superset of the
"practice" you are talking about.

> and he is not actually trying to match email addresses, it was just an
> example.

Ah.  The same problem I ran into earlier, with my faulty pseudo-code in Perl.

Incorrect advice isn't a good thing; people tend to follow it, then run into
trouble, and wonder why things are harder than they should be...

        Daniel
-- 
✣ Daniel Pittman            ✉ daniel at rimspace.net            ☎ +61 401 155 707
               ♽ made with 100 percent post-consumer electrons
   Looking for work?  Love Perl?  In Melbourne, Australia?  We are hiring.

