[Boulder.pm] meeting/topic?

Walter Pienciak walter at frii.com
Tue Nov 12 11:37:29 CST 2002


Hi,

I enjoyed Rob's presentation on Extreme Programming last time.

I've been spending too much time thinking about spam and how to
deal with it.  Once upon a time, I was happy to have everything
in one mailbox, and I'd just look at the subject lines.

Then the volume of mail rose as everyone started to use it, and
I fired up procmail to sort into folders and to tag some of the
remaining subject lines to make visual ID easier.

Then the spam started.

procmail started getting unwieldy, so I rolled my own basic
pattern-matching filtering program using Mail::Audit.  But when
spammers can cost-effectively purchase a new domain for each spam
run, pattern-matching based on domain names becomes, uh, "less
effective."

So I moved to a heuristics-based approach, with SpamAssassin.
Which works really well.  But there's still some spam that sneaks
in under the radar, so I look at that, add custom rules . . .
And still some sneaks in, more than I want, and I realize that if
*I* were a smart spammer, I'd have a copy of SpamAssassin myself,
and would tweak the wording on my e-mails so that it didn't rise
above the default spam threshold with the default settings.

Huh.

So I've been checking out the Bayesian approaches lately.
Interesting, and it made me get out the encyclopedia, since I
never did take none o' them statistic classes in school.

A LOT of these programs are written in Perl.  And so there's
the hook I need to make this on topic for the group.

Who would be interested in getting together for a meeting
centered around spam and the programs used to detect it?
Pros/cons of each, and if someone was feeling particularly academic
or informed about a program, they could give an intro to the
"interesting stuff" behind the mechanism -- e.g., Bayesian filtering.

Walter



