Apache SpamAssassin

Apache SpamAssassin is an extensible email filter that is used to identify spam. Once identified, the mail can then be optionally tagged as spam for later filtering. It provides a command line tool to perform filtering, a client-server system to filter large volumes of mail, and Mail::SpamAssassin, a set of Perl modules allowing Apache SpamAssassin to be used in a wide variety of email systems.

Tags Software Development Libraries Perl Modules Communications Email Filters
Licenses Apache 2.0
Operating Systems OS Independent
Implementation Perl


Recent releases

  •  12 Jun 2008 14:03

Changes: Newer gpg versions require keys to be cross-certified, so the sa-update public key was fixed accordingly. A perl version string was added to the storage area for compiled rulesets, to avoid crashes when perl is upgraded between major versions (e.g. perl 5.8.x to 5.10.0) and the ABI breaks. Some FORGED_MUA_OUTLOOK false positives were cleared on the new-format Message-ID generated by the Outlook Express version used in Windows XP service pack 3. Compatibility with Postgres 8.1.0 and later was fixed. Other miscellaneous fixes were done.

  •  07 Jan 2008 11:59

Changes: Major sa-compile fixes. Minor fixes in other departments. 'score set for a non-existent rule' has been made a debug message, instead of a lint warning, since it's a very frequent FAQ.

  •  09 Aug 2007 13:48

Changes: The new setuid code has been fixed to work with Perl 5.6.1 and to support DCC and Pyzor in all releases of Perl. The default 'user_scores_ldap_username' is now the null string, allowing anonymous binding. A 'schema' syntax error in LDAP config support has been fixed, along with an error where zeroing an 'eval' rule's score did not stop it from running. The new message ID format seen from Vista or Windows 2003 Server MAPI is now allowed to avoid false positives, and several issues with RDNS_DYNAMIC have been fixed.

  •  25 Jul 2007 06:54

Changes: The "make test" rule was fixed when running as root; this is needed for CPAN. Certain mail input can take a long time to scan with 100% CPU utilization, due to backtracking in a rule's regexp. Sending a HUP signal to the spamd process causes the ps name to change from spamd to perl. "make test" errors in Windows caused by nonportable use of getpwuid were fixed. Multiple DNS records for a host name should allow use of spamd -H for load balancing installs to work. Network lookup timeouts were fixed, where lookups were being lost once a timeout was hit.

  •  12 Jun 2007 23:34

Changes: A local user symlink-attack DoS vulnerability (CVE-2007-2873) was fixed. It only affects systems where spamd is run as root, is used with vpopmail or virtual users via the "-v"/"--vpopmail" or "--virtual-config-dir" switch, with the "-x"/"--no-user-config" switch, without the "-u"/"--username" switch, and with the "-l"/"--allow-tell" switch. This is not the default in any distribution package, and is not a common configuration. Other miscellaneous bugs were also fixed.

Recent comments

21 Jan 2004 03:07 crippler

some of this discussion is outdated
SpamAssassin has come a long way since this discussion started. The concept of whitelisting & blacklisting messages has gotten a whole lot easier.

Now that SA has Bayesian filters, training your SA with a large corpus of mail can be pretty easy (though necessarily tedious). I've set up two folders; one called "Ham" and the other called "Spam". Go through and move messages in your mailbox to one of those two folders. Good mail goes to Ham, junkmail goes to Spam. The larger the corpus of mail you pull from the better.

Next I set up a couple of cron jobs that look like:

20 2 * * * /usr/bin/sa-learn --ham --mbox ~/Ham
20 4 * * * /usr/bin/sa-learn --spam --mbox ~/Spam

Once a day, SpamAssassin goes through my Ham & Spam folders and learns what good & bad mail tend to look like. The more mail I feed it, the better it gets at catching spam.
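To give a rough feel for what "learning" from the Ham and Spam folders means, here is a toy token-counting classifier in Python. It is only a sketch of the general naive-Bayes idea and is not SpamAssassin's actual Bayes implementation; the class name ToyBayes, the tokenizer and the smoothing constants are all made up for illustration.

```python
import math
import re
from collections import Counter


class ToyBayes:
    """Toy token-frequency classifier (illustrative, not SpamAssassin's code)."""

    def __init__(self):
        # Number of learned messages containing each token, per class.
        self.tokens = {"ham": Counter(), "spam": Counter()}
        # Number of learned messages per class.
        self.messages = {"ham": 0, "spam": 0}

    @staticmethod
    def tokenize(text):
        # Lower-cased word tokens; each token counts once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def learn(self, text, label):
        self.messages[label] += 1
        self.tokens[label].update(self.tokenize(text))

    def spam_probability(self, text):
        # Sum log-odds over tokens, with add-one smoothing so that unseen
        # tokens never produce a division by zero or log(0).
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in self.tokenize(text):
            p_spam = (self.tokens["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    bayes = ToyBayes()
    bayes.learn("meeting tomorrow at noon", "ham")
    bayes.learn("buy cheap pills now", "spam")
    print(bayes.spam_probability("cheap pills"))
```

The point is the same as with the cron jobs: the bigger and more balanced the two folders, the more tokens have meaningful counts, and the less any single word can swing the result.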

Some types of spam were still getting through despite this filtering. The scores were significant but below the minimum score I had set to mark a message as spam. Many of these were Nigerian scam mails. Here are some lines I added to my global SpamAssassin config to take a big chunk out of incoming Spam:

# New blacklist not included in
# default configuration
header RCVD_IN_BNBL eval:check_rbl('bl', 'bl.blueshore.net.')
describe RCVD_IN_BNBL Listed by BNBL
tflags RCVD_IN_BNBL net
score RCVD_IN_BNBL 2
# Higher scoring for Nigerian scams
score NIGERIAN_BODY1 3
score NIGERIAN_BODY2 2
# Known high-volume spammers that
# I have no interest in hearing from.
body PHARMAWHAREHOUSE /pharmawharehouse\.biz/
describe PHARMAWHAREHOUSE Link to pharmawharehouse.biz
body PHARMACOURT /pharmacourt\.biz/
describe PHARMACOURT Link to pharmacourt.biz
body VALUEPOINTMEDS /valuepointmeds\.biz/
describe VALUEPOINTMEDS Link to valuepointmeds.biz
score PHARMAWHAREHOUSE 10
score PHARMACOURT 10
score VALUEPOINTMEDS 10
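
For anyone wondering how those score lines interact: each rule that hits adds its score, and the message is marked as spam once the total reaches the required score. The Python below is a simplified sketch of that additive model, not SpamAssassin's real rule engine. The patterns are modelled on the three body rules above, and the threshold of 5.0 is an assumption (the stock required score); use whatever minimum you have configured.

```python
import re

# Rule name -> (pattern, score). Scores are the ones from the config above.
RULES = {
    "PHARMAWHAREHOUSE": (re.compile(r"pharmawharehouse\.biz"), 10.0),
    "PHARMACOURT": (re.compile(r"pharmacourt\.biz"), 10.0),
    "VALUEPOINTMEDS": (re.compile(r"valuepointmeds\.biz"), 10.0),
}

# Assumption: the stock threshold; replace with your own minimum score.
REQUIRED_SCORE = 5.0


def score_body(body):
    """Return (total score, names of the rules that matched the body text)."""
    hits = [name for name, (pattern, _) in RULES.items() if pattern.search(body)]
    total = sum(RULES[name][1] for name in hits)
    return total, hits


def is_spam(body):
    """A message is tagged once its total reaches the required score."""
    total, _ = score_body(body)
    return total >= REQUIRED_SCORE


if __name__ == "__main__":
    print(score_body("Order now at pharmacourt.biz"))
    print(is_spam("See you at lunch"))
```

With a score of 10 each, any one of these three rules is enough on its own to push a message over the threshold, which is the intent for senders you never want to hear from.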

08 May 2003 03:22 weissel

Re: Almost Amazing!

> > Won't work well --- if at all --- for
> >
> > * Mailing lists
> > * automated mailings (freshmeat's new version mailings, most
> > buying over the internet stuff, Bounces, etc.)
%
> Legitimate mailing lists and automated mailings are usually
> easy to differentiate from spam;

I got a 'please go to this website' (where you have to enter a
20 char long string to let the message pass through) ... which
looked so much like the spam I usually get that the spam filter
treated it as spam.

In the end I had to re-write the message, before it passed
through, as it timed out the first time before I looked through
the spam heap. I would not have done this if the email had not
been important _for me_ to arrive. Helping others is _not_ that
important, as I do this on my free time.

If I had countered with a confirmation request instead of
throwing it on the spam heap, I'd never have known that my mail never
made it. Instead I would have grumbled over the recipient's
silence.

Easy to differentiate, indeed.

> also, if you know ahead of time that you are subscribing to
> something, you can add it to a whitelist.

So I just got a mail from a guy 'noreply@freshmeat.net' which
notified me of your answer. Never got that mail before. So how
can I whitelist that in advance? How is noreply@freshmeat.net
gonna read, much less respond to a confirmation request?

How is _that_ low maintenance?

(The same goes, as I said, for many online shopping cases.)

> > * people who don't like jumping through hoops to get mail
> > through (unfortunately these are usually the people who
> > give answers).
%
> First, you can safely whitelist everybody you send to, so as
> not to inconvenience them.

i.e. even more work for me to integrate that into my mail client.

And if they answer me from a different (e.g. preferred or new)
address, they'll be inconvenienced again --- when all they try to
do is help me reach them better/faster.

This can be real fun if you use sneakemail.com (I do).
If you send me a mail to my sneakemail address (say
xxxxx@sneakemail.com), I get a temporary yyyyy@sneakemail.com
(which will expire in a few days).

You send me another mail in a week ... and I'll get a
yzyzyz@sneakemail.com. A new confirmation is clearly necessary,
right? So you'll have to parse the X-Sneakemail-From: header
instead of just the From header, where it applies.
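
A whitelist lookup that copes with this could be sketched as follows in Python. This is a hypothetical helper, not part of TMDA, ASK or SpamAssassin, and it assumes that X-Sneakemail-From carries a plain address in the same syntax as a From header.

```python
from email import message_from_string
from email.utils import parseaddr


def effective_sender(raw_message):
    """Return the address a whitelist should be keyed on.

    Prefers X-Sneakemail-From (assumed to hold the real sender) and falls
    back to the ordinary From header when it is absent.
    """
    msg = message_from_string(raw_message)
    header = msg.get("X-Sneakemail-From") or msg.get("From", "")
    # parseaddr returns (display name, address); keep the address only.
    return parseaddr(header)[1].lower()


if __name__ == "__main__":
    raw = (
        "From: yyyyy@sneakemail.com\n"
        "X-Sneakemail-From: alice@example.org\n"
        "Subject: hi\n"
        "\n"
        "body\n"
    )
    print(effective_sender(raw))
```

Every forwarding service with its own header needs its own special case like this, which is exactly the extra maintenance I am talking about.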

> Also, if you apply this, say, only to messages tagged by
> spamassassin as 'probable spam', only your friends trying to
> sell you penis enlargements will be asked to confirm :-)

So we are still stuck on the case --- which I, personally,
experienced --- where a confirm mail will be asked to confirm
itself. At best, you'll never ever see that mail. Really a good
thing if the mail was somewhat important.

> > * Senders where the anti-spam system fires such a message
> > right back to you --- you can get a nice mail flood if that
> > goes over a mailing list. For 3 parties you'll get a very
> > very impressive snowball effect! (Can you say 'complete
> > meltdown'?)
%
> Oh, come on now. Sending one message per address is a simple
> thing to do.

You are implying a world where nobody's 'out of office' mails
will be sent as answer to their own 'out of office' mails.

Welcome to reality.

I have seen that at 100 mails/hour on a mailing list. More than
once. So much that the mailing list finally stopped Reply-To
munging. It won't help, either, if the sender address keeps
changing. Like some people who regularly change their mail
addresses to avoid spam.

> To see two systems that are successful with the confirmation
> technique, read up on these: TMDA and Active Spam Killer.
> Remember that you can combine this with a spam identifier like
> spamassassin to only request confirmation from messages that
> look like spam.

So you'll be part of a DDoS on some poor schmuck whose address
was faked into the mail.

If but 0.5% of the recipients of a modest 5 million spam use such
a thing, you'll have 25k mails on you on the day your address
appears in the From of a spam. And often enough it is somebody's
spam. Ask the owners of test.com. With luck, you'll fire off
another 25k mails if the confirmation request includes the
original spam "for your convenience".

And now imagine 1% and 20 million recipients. 200k mails is fun
and a half.

Again, it's your choice, I believe that these things can
harm others, badly, and thus should not be used without deep
understanding. But go right ahead, time will show if DDoSsing
innocent bystanders will help the fight against spam.
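
The arithmetic behind those two figures, as a small Python sketch. The function name is made up, and it assumes every adopter sends exactly one confirmation request to the forged address.

```python
def confirmation_requests(recipients, adoption_rate):
    """Confirmation mails hitting the forged From address for one spam run.

    Assumes each recipient running a challenge system sends one request.
    """
    return round(recipients * adoption_rate)


if __name__ == "__main__":
    # 0.5% of 5 million recipients
    print(confirmation_requests(5_000_000, 0.005))
    # 1% of 20 million recipients
    print(confirmation_requests(20_000_000, 0.01))
```

That gives the 25,000 and 200,000 mails mentioned above, all landing on one address within about a day.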

07 May 2003 12:15 markthomas

Re: Almost Amazing!

> % How about this: set up an autoresponder
> % that says, "I'm sorry, your message has
> % been trapped by my spam filter. If this
> % is a legitimate email message, please
> % put the word PASSWORD in the subject.
> % [...]
> %
> % I guarantee that spammers are not going
> % to bother putting your password in the
> % subject.
>
>
> Won't work well --- if at all --- for
> * Mailing lists
> * automated mailings (freshmeat's new
> version mailings, most buying over the
> internet stuff, Bounces, etc.)

Legitimate mailing lists and automated mailings are usually easy to differentiate from spam; also, if you know ahead of time that you are subscribing to something, you can add it to a whitelist.

> * people who don't like jumping through
> hoops to get mail through (unfortunately
> these are usually the people who give
> answers).

First, you can safely whitelist everybody you send to, so as not to inconvenience them.
Also, if you apply this, say, only to messages tagged by spamassassin as 'probable spam', only your friends trying to sell you penis enlargements will be asked to confirm :-)

> * Senders where the anti-spam system
> fires such a message right back to you
> --- you can get a nice mail flood if
> that goes over a mailing list. For 3
> parties you'll get a very very
> impressive snowball effect! (Can you
> say 'complete meltdown'?)

Oh, come on now. Sending one message per address is a simple thing to do.
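
A rough Python sketch of what I mean. The class name ChallengeLog is hypothetical and this is not how TMDA or ASK actually do it. The extra check for the Auto-Submitted and Precedence headers is my own assumption about how to stay out of autoresponder loops, not something taken from either tool.

```python
class ChallengeLog:
    """Remembers which addresses have already been sent a confirmation request."""

    def __init__(self):
        self._challenged = set()

    def should_challenge(self, sender, headers=None):
        """Return True at most once per sender address.

        headers is an optional dict of header name -> value for the message.
        """
        normalized = {k.lower(): v.strip().lower() for k, v in (headers or {}).items()}
        # Assumption: never answer mail that declares itself automatic.
        if normalized.get("auto-submitted", "no") != "no":
            return False
        if normalized.get("precedence") in {"bulk", "list", "junk"}:
            return False
        address = sender.strip().lower()
        if not address or address in self._challenged:
            return False
        self._challenged.add(address)
        return True


if __name__ == "__main__":
    log = ChallengeLog()
    print(log.should_challenge("someone@example.org"))
    print(log.should_challenge("someone@example.org"))
```

This only limits the flood to one request per address; it does nothing for senders whose address changes with every message.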

> * If a mailing list rewrites the header
> enough (reply-to munging comes to mind)
> you could even start answering your own
> "put PASSWORD in subject line" for all
> the mailing list to see. Fun (and that
> has happened with vacation mails before,
> at 100 mails/h)!
>
> You will have to decide yourself if
> these restrictions and dangers are
> acceptable to you, your mailing list
> reputation and your environment; you
> also have to think about how to avoid
> vicious circles as outlined above.
> Dropping mails you'll always risk
> dropping information, if that risk is
> acceptable to you, go ahead.

To see two systems that are successful with the confirmation technique, read up on these: TMDA (http://www.tmda.net/) and Active Spam Killer (http://sourceforge.net/projects/a-s-k). Remember that you can combine this with a spam identifier like spamassassin to only request confirmation from messages that look like spam.

20 Mar 2003 06:49 weissel

Re: Almost Amazing!

%
> % % How about this: set up an autoresponder
> % % that says, "I'm sorry, your message has
> % % been trapped by my spam filter. If this
> % % is a legitimate email message, please
> % % put the word PASSWORD in the subject.
> % % [...]
%
> % % I guarantee that spammers are not going
> % % to bother putting your password in the
> % % subject.
%
%
[shortened]
> % Won't work well --- if at all --- for
> % * Mailing lists
> % * automated mailings
> % * people who don't like jumping through
> % hoops
> % * Senders where the anti-spam system
> % fires such a message right back to you
> % * If a mailing list rewrites the header
> % enough
[leading to endless mail loops and other fun things]
%
> % You will have to decide yourself if
> % these restrictions and dangers are
> % acceptable to you, your mailing list
> % reputation and your environment;
[...]
%
> Most of what you are asking for can be resolved
> using the user_prefs file. You can find
> a free Windows utility for creating and
> editing user_prefs files here:
%
> http://www.CleanMyMailbox.com/sa

As a non-Windows-User I cannot use that program (not that I'd need it).

Also, there is no way the user_prefs file can prevent the problems outlined above if you use an autoresponder telling people to put something specific into the subject.

19 Mar 2003 22:21 jhalbrook

Re: Almost Amazing!

>
> % How about this: set up an autoresponder
> % that says, "I'm sorry, your message has
> % been trapped by my spam filter. If this
> % is a legitimate email message, please
> % put the word PASSWORD in the subject.
> % [...]
> %
> % I guarantee that spammers are not going
> % to bother putting your password in the
> % subject.
>
>
> Won't work well --- if at all --- for
> * Mailing lists
> * automated mailings (freshmeat's new
> version mailings, most buying over the
> internet stuff, Bounces, etc.)
> * people who don't like jumping through
> hoops to get mail through (unfortunately
> these are usually the people who give
> answers).
> * Senders where the anti-spam system
> fires such a message right back to you
> --- you can get a nice mail flood if
> that goes over a mailing list. For 3
> parties you'll get a very very
> impressive snowball effect! (Can you
> say 'complete meltdown'?)
> * If a mailing list rewrites the header
> enough (reply-to munging comes to mind)
> you could even start answering your own
> "put PASSWORD in subject line" for all
> the mailing list to see. Fun (and that
> has happened with vacation mails before,
> at 100 mails/h)!
>
> You will have to decide yourself if
> these restrictions and dangers are
> acceptable to you, your mailing list
> reputation and your environment; you
> also have to think about how to avoid
> vicious circles as outlined above.
> Dropping mails you'll always risk
> dropping information, if that risk is
> acceptable to you, go ahead.

Most of what you are asking for can be resolved using the user_prefs file. You can find a free Windows utility for creating and editing user_prefs files here:

http://www.CleanMyMailbox.com/sa
