[Melbourne-pm] OT: Re: FW: Bamboozled by perl
daniel at rimspace.net
Sun Oct 4 22:01:13 PDT 2009
Sam Watkins <sam at nipl.net> writes:
> On Mon, Oct 05, 2009 at 11:52:25AM +1100, Toby Corkindale wrote:
>> Sam Watkins wrote:
>> >>text processing is where it really shines.
>> ># perl invocation to extract email addresses from text, 4 all ur spamming
>> >perl -ne 'print "$1\n" while
>> Which fails to match some email addresses.
>> You may want to use these CPAN modules, which follow the appropriate RFC:
> It doesn't fail to match any email addresses that are actually used by
Sure it does. Perhaps you meant to say:
It doesn't fail to match any email addresses that I have seen in use,
or that I have cared about when extracting this sort of information.
> The RFC-based regexps on email addresses are brain-damaged in the extreme,
...yeah, probably. A regexp is a dreadful tool for trying to parse the syntax
specified in RFC 822 and its descendants.
> no one uses comments inside emails and all that crap.
Sure they do. I know several people who have done so precisely *because* it
reduces the number of tools that can recognise their address — it turns out
that spammers, like you, make all sorts of assumptions about what actually
> One should follow what is actually done, not the RFC.
Not a bad general policy, but (A) you have been given proof that this does
actually happen, and (B) the RFC-specified syntax is a superset of the
"practice" you are talking about.
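To make point (B) concrete, here is a minimal Perl sketch. The pattern is a
hypothetical quick-and-dirty one, *not* the truncated one-liner quoted above,
and the sample addresses are made up. A quoted local part and a comment before
the "@" are both legal under RFC 822, yet the naive pattern finds nothing in
either.

```perl
use strict;
use warnings;

# Hypothetical naive pattern in the style of a quick one-liner.
# It is NOT the (truncated) one-liner quoted earlier in this thread.
my $naive = qr/([\w.+-]+@[\w-]+(?:\.[\w-]+)+)/;

# Return the first address the naive pattern finds, or undef if none.
sub extract_naive {
    my ($text) = @_;
    return $text =~ $naive ? $1 : undef;
}

my @samples = (
    'bob@example.com',           # everyday form
    '"bob smith"@example.com',   # quoted local part: legal per RFC 822
    'bob(work)@example.com',     # comment before the "@": legal per RFC 822
);

for my $s (@samples) {
    my $got = extract_naive($s);
    printf "%-25s -> %s\n", $s, defined $got ? $got : 'no match';
}
```

Run as-is, it extracts the plain address and reports "no match" for the other
two: the RFC permits them, the "practice" regexp does not see them.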
> and he is not actually trying to match email addresses, it was just an
Ah. The same problem I ran into earlier, with my faulty pseudo-code in Perl.
Incorrect advice isn't a good thing; people tend to follow it, then run into
trouble, and wonder why things are harder than they should be...
✣ Daniel Pittman ✉ daniel at rimspace.net ☎ +61 401 155 707
♽ made with 100 percent post-consumer electrons
Looking for work? Love Perl? In Melbourne, Australia? We are hiring.