APM: spamassassin learning

Fri May 19 08:11:27 PDT 2006

Keith Howanitz wrote:
> I have a couple of questions I have been meaning to ask since the lecture 
> on Spamassassin (It was great btw - thanks again.)
> 
> If I understood you correctly, you said you keep _all_ of your spam for 
> teaching the Bayesian filter....

Not just for Bayes; for training and scoring as well, which requires a
large corpus of messages.

> 
> 1) Is there a benefit to teaching a filter on items it marked as spam and 
> autolearned from?

SA will not re-learn a message, so once it has been marked and
autolearned there is no benefit to trying to learn it again.

> 
> 2) I both rewrite the header, and use the "report_safe" feature - would I 
> need to prefilter those things out of email before giving it to sa-learn? 
> If so - is there a tool to do so?
> 

As long as SA did the filtering and marking, it should strip out that
marking for you when you attempt to learn the message.  One thing to
watch out for is any additional headers your system might add; look at
the bayes_ignore_header config option for details on not learning from
those.
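
To illustrate the header point above, here is a minimal config sketch.
bayes_ignore_header is a real SpamAssassin option that takes a header
name; the header name shown is made up for illustration, so substitute
whatever your own system adds.

```
# local.cf (sketch): keep headers added by other filters out of Bayes.
# X-Upstream-Spamfilter is a hypothetical header name, not one SA adds.
bayes_ignore_header X-Upstream-Spamfilter
```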

> 3) On one mail system, I currently use spamassassin as a proxy filter 
> before mail goes to its final destination on another server. I have set 
> up a mail box for users to forward false negatives (and another for the 
> few false positives) and run sa-learn on those items once a week. Should I 
> be stripping headers that are added by their email clients before using 
> sa-learn on these messages?

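You should have them forward as an attachment and then strip out the
attached message and learn from that.  Learning the raw forward can
cause issues, since it carries the forwarding user's headers and any
text they added rather than just the original message.  Check out the
SA wiki for more information in this area.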
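
One way to pull the attached message out of a forward is sketched below
in Python, using only the standard email package.  It assumes the mail
client attached the original as a message/rfc822 part; the function name
extract_attached_messages is my own, not something SA provides.

```python
import email
from email import policy


def extract_attached_messages(raw_forward: bytes) -> list[bytes]:
    """Return each message/rfc822 attachment in a forward as raw bytes."""
    wrapper = email.message_from_bytes(raw_forward, policy=policy.default)
    found = []

    def visit(part):
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the attached message.  Do not descend further, so
            # attachments nested inside the original stay inside it.
            found.append(part.get_payload(0).as_bytes())
            return
        if part.is_multipart():
            for child in part.get_payload():
                visit(child)

    visit(wrapper)
    return found
```

Each returned message could then be written to its own file and handed
to sa-learn with --spam or --ham as appropriate.  Note that the message
is re-serialized by the email package, so it may not be byte-for-byte
identical to what the user originally received.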

Michael

> 
> TIA,
> -Keith