SpamAssassin vs. Spastic
by Keith Winston, in Editorials - Sat, May 24th 2003 00:00 PDT
SpamAssassin has emerged as the most popular antispam tool in the
Open Source world. It has gained such momentum that it has even
crossed over into the commercial world as SpamKiller by Network
Associates, and other commercial products are also based on it. This
article is a short comparison of real world results between two
antispam tools, SpamAssassin and Spastic.
Copyright notice: All reader-contributed material on freshmeat.net
is the property and responsibility of its author; for reprint rights, please contact the author
directly.
Disclaimer: I am the current project leader and main developer for
Spastic.
Types of antispam programs
Without getting into all the intricacies of email RFCs, I should
mention that spam can be fought in many places throughout the
system. Most mail servers, or Mail Transfer Agents (MTAs), have some
antispam capabilities, but most users don't have the ability or desire
to run their own mail servers. The Mail Delivery Agents (MDAs) are
programs that take mail from an MTA and deliver it to local
mailboxes. procmail is a very popular MDA and is the means by which
both SpamAssassin and Spastic are usually invoked. Finally, many mail
clients, or Mail User Agents
(MUAs), have some antispam capabilities. One promising new trend is
Bayesian filtering, which is built into the latest version of the Mozilla mail client
(among others). However, this article is focused on two tools which
filter at the MDA level using procmail.
Overview of SpamAssassin
SpamAssassin is a collection of Perl modules which test elements of an
email message and assign a numeric ranking to it. The higher the
ranking, the more likely that the message is spam. The default
settings define a spam message as anything with a score of 5.0 or
higher. SpamAssassin also checks Realtime Blackhole Lists and has many
other advanced features. It is usually called through procmail,
although newer versions come with a powerful spamd/spamc client-server
interface as well.
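The ranking idea can be sketched in a few lines of Python. This is only an illustration of additive scoring against a threshold: the rule names, tests, and weights below are invented for the example and are not SpamAssassin's real rules or scores. Only the 5.0 cutoff comes from the description above.

```python
import re

# Hypothetical rules and weights, invented for illustration only;
# they are not SpamAssassin's real rule names or scores.
RULES = [
    ("SUBJECT_ALL_CAPS", 2.0, lambda m: m["subject"].isupper()),
    ("MENTIONS_FREE_MONEY", 2.5,
     lambda m: re.search(r"free money", m["body"], re.I) is not None),
    ("MANY_EXCLAMATIONS", 1.5, lambda m: m["body"].count("!") >= 3),
]

THRESHOLD = 5.0  # the default cutoff described in the article


def score_message(message):
    """Return (total score, names of the rules that fired)."""
    fired = [(name, weight) for name, weight, test in RULES if test(message)]
    total = sum(weight for _, weight in fired)
    return total, [name for name, _ in fired]


def is_spam(message):
    # A message is spam only when the summed weights reach the threshold.
    total, _ = score_message(message)
    return total >= THRESHOLD


spam = {"subject": "MAKE CASH NOW", "body": "Free money!!! Act today!"}
ham = {"subject": "Meeting notes", "body": "Minutes from Tuesday attached."}
print(score_message(spam), is_spam(spam))
print(score_message(ham), is_spam(ham))
```

No single rule in this sketch is enough to flag a message on its own; it takes several weak signals together to cross the threshold.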
Overview of Spastic
About two years ago, the level of spam I was receiving crossed my
pain threshold, and I was motivated to take control of the problem. I
tried several Open Source spam solutions, including SpamAssassin. At
the time, SpamAssassin's numeric ranking method of determining spam
seemed counterintuitive. How do you know how to weight each setting
effectively? In time, I stumbled across SPAST,
which was a relatively simple-to-understand procmail script which used
word lists to match against elements of an incoming message. It was
simple to set up, understand, and customize. The problem was that
SPAST was no longer supported by its author, Chrissie LeMaire. I
tracked Chrissie down and asked her permission to take over the SPAST
project and develop it. Thus, Spastic was born.
Spastic uses procmail and common system utilities like formail, dig,
and egrep to scan elements of an email message for patterns, check for
valid domains and address formats, etc. One big difference between
Spastic and SpamAssassin is that Spastic rules are binary. When a
Spastic rule fires, the message is flagged as spam. If a message
passes all the tests, it is not flagged. There is no ranking
system. The Spastic distribution also includes bash scripts for
reporting statistics and rotating spam archives.
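For contrast with a ranking system, here is a minimal Python sketch of binary rules. Spastic itself is a set of procmail recipes, so this is not its code; the word list, the address pattern, and the function names are assumptions made up for the illustration.

```python
import re

# Hypothetical word list and address check, invented for illustration;
# Spastic's real lists live in its own text files.
BLOCKED_SUBJECT_WORDS = ["viagra", "mortgage", "enlargement"]
ADDRESS_FORMAT = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def first_failed_check(sender, subject):
    """Return the reason for the first check that fails, or None if all pass."""
    if not ADDRESS_FORMAT.match(sender):
        return "malformed sender address"
    lowered = subject.lower()
    for word in BLOCKED_SUBJECT_WORDS:
        if word in lowered:
            return "subject contains blocked word: " + word
    return None


def is_spam(sender, subject):
    # Binary: any single failed check flags the message; nothing is weighted.
    return first_failed_check(sender, subject) is not None


print(is_spam("bob@example.com", "Lower your mortgage today"))
print(is_spam("alice@example.org", "Lunch on Friday?"))
```

The trade-off is visible even at this size: the logic is easy to read and tune, but one overly broad word in a list is enough to flag a legitimate message.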
Testing Method
I tested each program by setting it up to filter all incoming
email for a seven-day period and logging its success rate. I made
no configuration changes or tweaks to either program during the
test. The main configuration I did for SpamAssassin was setting up my
whitelist and a couple of cosmetic settings. Since I am on several
mailing lists, I receive about 300 messages a day. In this mix is
usually a small number of spam messages which come from a variety of
sources. I usually receive about 10-20 spam messages a week, which I
consider low by most standards today.
I tested SpamAssassin from April 14-20, 2003 and Spastic from April
21-27, 2003.
While my test results are accurate for the email I typically receive,
I can't generalize my results to other email users. Please keep in
mind that your results may vary.
Test Results
| SpamAssassin | Spastic |
| Correctly stopped 16 spam messages. | Correctly stopped 10 spam messages. |
| 1 false positive. | 0 false positives. |
| 1 missed spam message. | 2 missed spam messages. |
| Total messages processed outside of whitelists: 51 | Total messages processed outside of whitelists: 49 |
| 2 out of 69 incorrect = 67/69 = 97.10% correct | 2 out of 65 incorrect = 63/65 = 96.92% correct |
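The percentages in the results can be reproduced with a short calculation. The sketch below uses the totals that the reported fractions are based on (69 and 65). Note that the Spastic rows as listed add up to 61 rather than 65, so the 65 is taken from the "63/65" figure and is an assumption about which number is right.

```python
# Figures as reported in the results above. The Spastic total of 65 is
# taken from the "63/65" fraction; the listed rows themselves sum to 61.
results = {
    "SpamAssassin": {"total": 69, "false_positives": 1, "missed_spam": 1},
    "Spastic": {"total": 65, "false_positives": 0, "missed_spam": 2},
}


def percent_correct(r):
    """Share of messages handled correctly, as a percentage."""
    errors = r["false_positives"] + r["missed_spam"]
    return 100.0 * (r["total"] - errors) / r["total"]


for name, r in results.items():
    print(f"{name}: {percent_correct(r):.2f}% correct")
```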
Unfortunately, I realized too late that I should have saved the
messages with which each program made an error and cross-tested them
against the other one to see if it would have done better. I made a
note of it for the next time I run a comparison test.
The results were very close, with SpamAssassin ending with a slightly
higher percentage for correctly processing messages. If you are more
concerned with false positives, Spastic came out slightly ahead, since
many people would rather see a spam message slip through their filter
than take a chance on losing an important message. Keep in mind that
these sample sets are very small, so drawing firm conclusions is
difficult.
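One way to see how little such small samples can say is to put an approximate confidence interval around each percentage. The sketch below uses the Wilson score interval, a standard textbook method chosen here purely for illustration; it is not part of the original test, and the counts are the 67/69 and 63/65 figures reported above.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total
                         + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


sa_low, sa_high = wilson_interval(67, 69)  # SpamAssassin
sp_low, sp_high = wilson_interval(63, 65)  # Spastic
print(f"SpamAssassin: {sa_low:.3f} to {sa_high:.3f}")
print(f"Spastic:      {sp_low:.3f} to {sp_high:.3f}")
print("intervals overlap:", sa_low <= sp_high and sp_low <= sa_high)
```

With samples this size, each interval spans several percentage points and the two overlap almost entirely, so the 0.18-point gap between the programs cannot be told apart from chance.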
Strengths and Weaknesses
After using the programs back-to-back, I have some observations about the
strengths and weaknesses of each.
SpamAssassin strengths
- A nice spamd/spamc interface, efficient, easy to use.
- This feature is intended to make the program easier to use and improve
performance. It is one of my favorite features.
- It's easy to customize the whitelist and add other rules in the
~/.spamassassin/user_prefs file.
- By adding or modifying rules in your
personal user_prefs, you can customize the behavior and weightings if you
don't like the defaults.
- More tests, more generalized, more accurate.
- It is much more sophisticated in testing elements for spam qualities than
Spastic, and a better generalized solution for filtering an entire site.
- Very easy to implement under Red Hat 9 by selecting it during
installation.
- This makes installation in Red Hat 9 drop dead easy.
- A large community supporting and testing it.
- A more detailed report of spam triggers.
- SpamAssassin provides a detailed report of each spam rule that adds up to
the final ranking for the message.
- Support for other antispam tools like Vipul's
Razor and RBLs.
SpamAssassin weaknesses
- Depends on Perl.
- Since SpamAssassin is written in Perl, it requires a recent version of Perl
to be installed on the local machine. It depends on many modules and
Perl packages, and may be affected if Perl is upgraded on the machine.
- You may not be able to use it if you do not have rights to install
Perl.
- If you don't have rights to install Perl on the target machine, you can't
use SpamAssassin. In most cases, this is not an issue, since Perl is
installed on the majority of *nix machines.
- The default setting mangles messages flagged as spam (by
changing MIME types).
- I hesitate to mention this as a drawback because the default setting is
this way to protect users from Web bugs and malicious HTML content like
JavaScript. When SpamAssassin flags a message as spam, it changes the MIME
type of all attachments to text so they are no longer executable. However, if
the mail was a false positive, it may be difficult to recover the original
message format if it was base64 encoded or was a multi-part MIME message.
This default can be changed by setting the "defang_mime" option to 0.
Spastic strengths
- Very easy to implement on any Linux distribution, easy on most
*nixes.
- In most cases, you can download the 60k tar.gz file, unzip it,
run the setup script, and be ready to filter spam in about 5-10
minutes.
- Depends on common system utilities (procmail, grep, and dig).
- Since Spastic uses procmail and common system utilities, it is unlikely that
additional software installation or configuration will be required to run
it. Unless it is used as a site filter, root access is not required. It may
be the best choice to use on a hosted server if Perl/SpamAssassin is not
available.
- It's easy to customize the whitelist and change rules and filter
lists.
- Customizing the whitelists and rules is a simple matter of editing a few
text files.
- A rotate-spam script to archive spam folders and produce
statistical reports.
- Spastic includes an optional bash script which can be run from cron to rotate
the spam mailbox and keep up to nine archives. It also summarizes the reasons
that messages were flagged and provides totals so you can see who is sending
you the most spam. Note: with a few small tweaks, I was able to use the
rotate-spam script with SpamAssassin to provide similar functions.
- Basic antivirus recipes.
- Spastic can flag any message carrying executable content to prevent it from
reaching a vulnerable Windows box and causing damage.
Spastic weaknesses
- Not as accurate as SpamAssassin.
- SpamAssassin does more tests and is more thorough. The default weights
(determined by a genetic algorithm, no less) in SpamAssassin are very good
and proved to be slightly more accurate in my testing. For a sitewide
antispam solution, I have no doubt that SpamAssassin is more accurate than
Spastic. For individuals who tune their filter files to the email they
receive, Spastic and SpamAssassin are about equally effective.
- Small community supporting and testing it.
- Since SpamAssassin has a much larger community, it is better tested and
supported.
Conclusion
SpamAssassin is the king of spam filtering for a reason. It is very
sophisticated, well designed, and effective. For a sitewide filtering
solution, I would strongly recommend SpamAssassin over Spastic. If you
can't use SpamAssassin on a particular box (like a hosted box), or if
you want a simpler solution for a small number of users, Spastic will
also serve you well.
If you want to explore further, here are two other interesting
antispam tools:
Editor's Note
This is just the tip of the growing iceberg of antispam tools in
circulation today. I've been very happy with SpamAssassin for the
last year or so. What are you using? What's your experience been
with it? What's still slipping through? Where do you think the spam
war is headed?
Author's bio:
Keith Winston would like
to hear about all the latest Nigerian breast enlargement techniques at
slippery@users.sourceforge.net.
T-Shirts and Fame!
We're eager to find people interested in writing articles on
software-related topics. We're flexible on length, style, and
topic, so long as you know what you're talking about and back up
your opinions with facts. Anyone who writes an article gets a
t-shirt from ThinkGeek
in addition to 15 minutes of fame. If you think you'd like to try
your hand at it, let jeff.covey@freshmeat.net
know what you'd like to write about.
Comments
17 spam in 6 days?
by Serge Knystautas - Aug 25th 2003 21:48:34
The sample size is so minimal, the tests are pretty much meaningless. But
more importantly, you're getting 3 spam a day and you care about spam
email?
SpamAssassin vs. SPASTIC vs. Bayesian in another posting
by era - Aug 24th 2003 23:30:38
You'll notice that a more proper test is now at
http://freshmeat.net/articles/view/964/
Why is Perl a drawback but Procmail isn't?
by dozer - Jul 11th 2003 21:44:33
Pretty much every Unix system on the planet now has Perl installed. This
is certainly not true of Procmail. So explain, please, why you consider
that being implemented in Perl is a drawback? Performance? No.
Availability? No. Compatibility? Maybe, but not with SpamAssassin. I
don't understand.
Now, in my experience, a Procmail implementation is certainly a drawback.
Procmail is a tool that peaked in 1998. Now we have easier to use, more
capable and, most importantly, more secure solutions (Sieve + Amavis is
one notable one). It's time to put Procmail's security holes and awful
syntax to pasture.
A four letter word
by David Collantes - Jun 27th 2003 05:21:28
TMDA, actually, the best Spam reducer tool. Clean,
professional, accurate.
Re: A four letter word
by Macdaddy - Aug 11th 2003 14:23:02
> TMDA, actually, the best Spam reducer
> tool. Clean, professional, accurate.
I too am thinking of a 4-letter word that describes TMDA. Unfortunately
it's not "TMDA."
You don't need root to install Perl
by era - Jun 16th 2003 23:51:40
In addition to the comments about the lack of statistical validity for such
a small sample (come on, it's not hard to get samples of spam, thousands
of'em!) I'd like to remark that it is by no means impossible (though also
not necessarily very straightforward) to install Perl for your own use,
without any administrator privileges. I believe the Perl installer offers
this as an option (but of course I haven't compiled Perl myself in eons
... apt-get rules :-).
Plain bogofilter a simple effective alternative
by Mat Farrington - Jun 13th 2003 01:57:50
Rather than upgrade spamassassin again, I replaced it with bogofilter
alone.
After a few thousand training emails it now outperforms that version of
spamassassin (which admittedly was ageing). I expect performance to
improve further with ongoing training.
Local email aliases allow users on my system to maintain personal
bogofilter databases.
I appreciate that recent versions of spamassassin have bayesian learning
and that bogofilter can be trained using spamassassin output, but see
little reason to complicate an already-effective and elegant solution.
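For readers unfamiliar with trained filters, the sketch below shows the general idea of learning from labelled mail. It is a generic word-count naive Bayes classifier with add-one smoothing, written for illustration; it is not bogofilter's actual algorithm, and the class name and training messages are made up.

```python
import math
from collections import Counter


class TinyBayes:
    """Word-count naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.words[label].update(text.lower().split())
        self.messages[label] += 1

    def log_score(self, text, label):
        # Requires at least one trained message of each label.
        counts = self.words[label]
        vocabulary = len(set(self.words["spam"]) | set(self.words["ham"]))
        total_words = sum(counts.values())
        total_messages = sum(self.messages.values())
        score = math.log(self.messages[label] / total_messages)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total_words + vocabulary))
        return score

    def is_spam(self, text):
        return self.log_score(text, "spam") > self.log_score(text, "ham")


classifier = TinyBayes()
classifier.train("cheap pills online", "spam")
classifier.train("win cash now", "spam")
classifier.train("meeting agenda attached", "ham")
classifier.train("lunch on friday", "ham")
print(classifier.is_spam("cheap cash"))
print(classifier.is_spam("friday meeting"))
```

Because the word counts come from each user's own mail, per-user databases like the ones described above adapt to what that user considers spam.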
Ask for another solution
by I. B. Turner - Jun 11th 2003 23:13:06
I've used ask (Active Spam
Killer) for exactly a year. It has let through 2 spams in that time,
while I get 15-30 per day.
Advantages: people that you care about can easily get through. Very small
percentage of spams get through. White list is easy to set up.
Disadvantages: takes too much disk space in its queue: it saves the spam
for too long by default and should probably compress/decompress it.
Seeming disadvantage that I haven't encountered: it relies on replies
which should make spammers think your address is real and increase the
amount of spam you get. I haven't found this to be the case.
All that said I'm happy with it.
Scientific method anyone?
by Marcelo E. Magallon - Jun 7th 2003 10:57:09
Catchy title. I mean, basically "SpamAssassin vs Anything" is catchy
:-)
The following made some alarms trigger:
I tested SpamAssassin from April 14-20, 2003 and Spastic from April 21-27,
2003.
bzzzzzt! Wrong! You don't compare spam filtering tools like that. You
compare them with the same corpus, otherwise the comparison is worthless.
Since SpamAssassin is quoted as producing one false positive, it would
have been nice to see the message that did that and the reason why
SpamAssassin thought it was spam. The version of SpamAssassin is also
missing, which complicates things further in the reproducibility
department. The later Perl-bashing is also not welcome. If you don't
like Perl for whatever reason, please write a disclaimer at the top of
your article ("I have a bias against Perl, nevertheless I'm going to write
a comparison that involves a Perl program") so that readers can know what
to expect. In particular, SpamAssassin is not the kind of thing people
would want to install on their own. Ask your system administrator to set
it up on the machine you use to receive email. Any competent system
administrator will set up spamd, which makes it unnecessary to run N
copies on the machine in question, which I'm sure has better things to do
with the CPU (like, uhm, receiving email). Regarding the perceived
advantages of Spastic, rotating email in a folder is a no-brainer given
that you have grep-mail handy. The same goes for summarizing the reasons
why mail got flagged as spam or non-spam. The bit about executables can
be done with a system wide procmail setup and it really doesn't have
anything to do with classifying spam.
SpamAssassin
by hnoesekabel - Jun 4th 2003 04:26:00
My email address at work is protected by SpamAssassin, and it actually does
a pretty good job on cutting down spam. As for the false positives:
SpamAssassin only detects and tags spam, whatever you do with these
messages is up to you. You can delete messages with a score of, say, 20+
automatically and store the rest in a folder. That way, you cut your
'losses'.
As for the installation of SpamAssassin: install it with CPAN. Quick and
simple. I could do it ;)
Right now, I keep the highest scoring spam mails for fun. Current top
score is 60 (with the default scores).
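The tag-then-decide approach described in this comment could look like the following Python sketch. It assumes the tagged message carries an X-Spam-Status header beginning with "Yes" or "No" and containing the score as "hits=..." (older SpamAssassin releases) or "score=..." (later ones); the function name and the three destinations are made up for the example.

```python
import re
from email import message_from_string

# Assumption: the score appears in X-Spam-Status as "hits=..." or "score=...".
SCORE_PATTERN = re.compile(r"\b(?:hits|score)=(-?\d+(?:\.\d+)?)")

DELETE_AT = 20.0  # the "say, 20+" figure from the comment


def route(raw_message):
    """Return 'inbox', 'spam-folder' or 'delete' for one tagged message."""
    status = message_from_string(raw_message).get("X-Spam-Status", "")
    if not status.lower().startswith("yes"):
        return "inbox"
    match = SCORE_PATTERN.search(status)
    # If the score cannot be parsed, keep the message rather than delete it.
    score = float(match.group(1)) if match else 0.0
    return "delete" if score >= DELETE_AT else "spam-folder"


example = "X-Spam-Status: Yes, hits=7.1 required=5.0\nSubject: hi\n\nbody\n"
print(route(example))
```

Only high-scoring mail is discarded unseen; borderline mail stays in a folder where a false positive can still be found.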
complexity is good
by Florin Andrei - Jun 2nd 2003 16:45:16
Spam is a complex thing. Sometimes even humans have issues identifying it
as such - i've heard people saying they actually got good deals from spam.
So no wonder it's extremely hard for a computer to tell spam from ham.
Complex problems require complex solutions. Therefore, don't expect a
simple one-off solution to be good at catching spam.
Take a look at this article:
Fairly-Secure Anti-SPAM
Gateway Using OpenBSD, Postfix, Amavisd-new, SpamAssassin, Razor and
DCC
It describes a method to combine SpamAssassin with other anti-spam
techniques (Vipul's Razor, DCC) and with anti-virus stuff to better handle
bad e-mail. Worth reading!
some other approach
by karellen - May 24th 2003 23:18:19
I rather like the other approach. Using blacklists to waste spammer's time
on a phony mail transport agent and drive the cost of spam skyhigh. I
think the OpenBSD community did something in this direction. This combined
with some kind of bayesian filtering that kills the spam *before* it
reaches the MTA (or built into the MTA via some kind of hook that calls an
external program). Nobody likes to queue spam. My system is not a spammer
trash can.
Re: some other approach
by cloudmaster - May 27th 2003 14:54:56
> I rather like the other approach. Using
> blacklists to waste spammer's time on a
> phony mail transport agent and drive the
> cost of spam skyhigh. I think the
> OpenBSD community did something in this
> direction. This combined with some kind
> of bayesian filtering that kills the
> spam *before* it reaches the MTA (or
> built into the MTA via some kind of hook
> that calls an external program). Nobody
> likes to queue spam. My system is not a
> spammer trash can.
Here's a couple of commands that might help you:
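# Hand all incoming mail to procmail (this overwrites any existing ~/.forward)
echo "|$(which procmail)" > ~/.forward
# Append a recipe that discards anything already tagged as spam
printf ':0\n* ^X-Spam-Status: Yes\n/dev/null\n\n' >> ~/.procmailrc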
Then, assuming your mail's going through spamassassin, the message is
instantly deleted. You don't waste any space on it.
Seriously, a lot of spam makes it past the checks that an MTA can make on
the connection info (RBLs, validity checks), so you need *something* to
check the body and client-supplied headers for bad stuff. That has to be
done after the message is received because of the way SMTP works. If you
don't want to waste the space, you can use a maximum message size limit in
combination with a spam checker (I use SpamAssassin) and something that
throws away messages marked as spam.
I've been using SA for a few months now - it tags around 100 messages
daily (between the few accounts that I use). I've had 0 false positives.
I've got some basic system-wide rules set up, and per-user whitelists
stored in a database, managed with a simple PHP form that even our dullest
users can handle. It's received only praise for letting users filter out
their spam. My only advice is to look over the config settings, and
change some of the defaults - the default setup is not ideal for the
average user, IMHO.
-- -----------------------------
Light in the absence of eyes
illuminates nothing
Re: some other approach
by dystopia - Jun 18th 2003 12:31:19
> I rather like the other approach. Using
> blacklists to waste spammer's time on a
> phony mail transport agent and drive the
> cost of spam skyhigh. I think the
> OpenBSD community did something in this
> direction. This combined with some kind
> of bayesian filtering that kills the
> spam *before* it reaches the MTA (or
> built into the MTA via some kind of hook
> that calls an external program). Nobody
> likes to queue spam. My system is not a
> spammer trash can.
OpenBSD uses 'spamd', which uses a combination of
'spews' and a 'fake MTA which uses high tarpitting
settings' in conjunction with its PF (Packet Filter).
Read (not Reed) more about it at:
http://www.benzedrine.cx/relaydb.html
SpamAssassin weakness IMHO
by Gilgongo - May 24th 2003 16:02:56
I've been using SA for about 18 months on a mail server with 15 people on
it getting personal mail, and have trialled it on a server for users
getting business mail.
With any anti-spam system, false positives are a problem. This is
compounded by the fact that very often, one man's spam is another man's
legitimate communication. When I trialled SA with 10 users in our company,
with a default score of 10 (which I thought was quite high) I spent about 4
hours in the first three weeks having to tweak SA scores and populate
whitelists for these users. I was completely unprepared for the amount of
stuff they were getting that they regarded as legit, which I myself would
simply have binned if it arrived in my inbox. This meant that if I'd
rolled it out to the remaining 90 users in the company, I'd be spending a
hell of a lot of time maintaining SA rules.
Now, I know that SA has Bayesian filters that can be trained per user, but
this isn't practical when all our users are non-technical and their access
to the mail server is purely via POP3 with Outlook 2000.
I would therefore say that a weakness of SA is that it relies too heavily on
system-wide rules that in turn produce too many false positives.
I have since looked at DSPAM, which is a purely Bayesian filter with no
system-wide rules at all. The trouble is that it won't work with our new
mail server config, which is running Mailscanner as a proxy to MS Exchange
(don't ask, don't ask).
-- Gone are the days when you could say "Those were the days."
Re: SpamAssassin weakness IMHO
by Macdaddy - May 29th 2003 16:39:46
> I've been using SA for about 18 months
> on a mail server with 15 people on it
> getting personal mail, and have trialled
> it on a server for users getting
> business mail.
>
> With any anti-spam system, false
> positives are a problem. This is
> compounded by the fact that very often,
> one man's spam is another man's
> legitimate communication. When I
> trialled SA with 10 users in our
> company, with a default score of 10
> (which I thought was quite high) I spent
> about 4 hours in the first three weeks
> having to tweek SA scores and populate
> whitelists for these users. I was
> completely unprepared for the amount of
> stuff they were getting that they
> regarded as legit, which I myself would
> simply have binned if it arrived in my
> inbox. This meant that if I'd rolled it
> out to the remaining 90 users in the
> company, I'd be spending a hell of a lot
> of time maintaining SA rules.
I rolled out SpamAssassin on a 3000 user production system. I'm still
waiting to hear a single valid complaint on its accuracy. I'm no longer
maintaining that system however. It's now running a dated copy of SA. My
primary mail account is still there though. The amount of spam getting
through that old copy of SA is steadily increasing. What people don't
realize is that SA has to be kept up to date. No ifs, ands, or buts about
it. YOU HAVE TO KEEP IT UP TO DATE. You can't be a lazy admin
that only works on something when it's broken. A 2 year old FTP daemon
will still work fine as long as no security holes have been found. A 3
month old copy of SA is out of date and must be upgraded. No excuses.
Spammers are specifically targeting the negative scoring rules in older
copies of SA to lower the overall score of their spam. It's a no brainer.
Update your copy of SA or don't bitch and moan when it stops working as
expected.
Re: SpamAssassin weakness IMHO
by A'rpi/ESP-team - Jun 7th 2003 05:34:15
I'm running SA 2.53 on a production server in a school (including teachers,
administration, students), for ~1500 users. The flag limit is left at score
5.0, but it deletes mails with 10.0+ points. In the first weeks I got a few
complaints (I had actually asked users to send them) about false positives and
negatives. By manually tuning some SA rules and the Bayesian filter I got rid
of them. Since then, I haven't had a single complaint.
Looking at the statistics, there are around 1600 spam messages with a score of
10+ and around 150 with a score of 5..10 each week.
So this filtering reduced the level of delivered spam to about 10%, and the
rest is still flagged so that users can separate it easily.
A'rpi
I can only read this article if I'm not logged in to freshmeat.
by riddley - May 24th 2003 12:04:08
subject sez all
Re: I can only read this article if I'm not logged in to freshmeat.
by jeff covey - May 25th 2003 12:33:12
If you believe you've found a bug in freshmeat's code, you should
report it at http://freshmeat.net/contact/.
Thanks.
-- vs lbh pna ernq guvf, lbh'er n trrx.
Re: I can only read this article if I'm not logged in to freshmeat.
by riddley - Jun 5th 2003 09:54:08
>
>
> If you believe you've found a bug in
> freshmeat's code, you should
> report it at
> http://freshmeat.net/contact/.
>
>
>
> Thanks.
>
I usually only submit bug reports to Open Source projects...
Pros and cons
by Gustavo Muslera - May 24th 2003 10:11:01
It is good to put two fairly good spam detectors on the table, but...
- Extremely small sample (not putting the mailing lists on the whitelist
could give a better hint on false positives, even with such a small
sample). The accuracy results could be very different in the long run.
- AFAIK SpamAssassin includes Bayesian filtering by now, and it can use
bogofilter anyway. Filtering by keywords or searching for duplicate messages
a la Razor is mostly useless by now for most spam, as spammers include random
text, insert random HTML comments inside keywords, or even replace
letters with symbols (e.g. w0rd instead of word).
- The article references POPFile (a Perl POP3 proxy with Bayesian
classification) but not bogofilter, which works in the same way as
SpamAssassin and Spastic. I'm not saying that POPFile is bad; in fact, it is
what I'm using right now to filter spam, but I don't think that it
should be used at server level like the other two.
Moderate View...
by antrik - May 24th 2003 09:25:43
Seeing all this flaming here, I think the author deserves to hear my somewhat
more positive opinion.
To someone like me, who has heard about SpamAssassin (actually, even read some
article -- which wasn't terribly useful...), but knows hardly anything about
its *practical* application, this article actually *is* very informative. Only
the title is somewhat misleading in this regard...
On the other hand, I agree that the "comparison test" is silly. Ever heard of
a thing called "statistical relevance"?...
To the other flamers: No, it's *not* necessary to use an identical test set. If
the test sets are large enough to give meaningful results, it's statistically
completely irrelevant whether they are the same. On the other hand, if the test
sets are too small (as they definitely are here), it doesn't help to use the
same test set -- the probability of one of the programs being handicapped by an
accumulation of messages it doesn't like is just the same.
Geez...
by Chris Carlin - May 24th 2003 03:18:38
I mean seriously, no offense but this article completely failed to live up
to its potential. I mean, objective (if not significant) comparison of
server side spam filtering implementations? Great!
But not only did this one not cover many systems, the ones it did cover
weren't explored in a meaningful way even in terms of this guy's specific
case.
Next time use procmail (or whatever) to feed the same messages to each of
the filtering systems and let the whole thing run for two weeks. THEN
there will be something approaching worthy of the tshirt.
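The kind of same-message comparison suggested here can be sketched as a small Python harness. The two filter functions are toy stand-ins invented for the example; in a real test each would pipe the message through the tool under test and read back its verdict, and the corpus would be hand-labelled mail.

```python
def compare_on_same_corpus(corpus, filters):
    """Run every filter over the same labelled messages and tally errors.

    corpus  -- list of (text, is_spam) pairs labelled by hand
    filters -- dict mapping a name to a function text -> bool
    """
    tallies = {}
    for name, classify in filters.items():
        tally = {"caught": 0, "missed": 0, "false_positive": 0, "passed": 0}
        for text, is_spam in corpus:
            flagged = classify(text)
            if is_spam:
                tally["caught" if flagged else "missed"] += 1
            else:
                tally["false_positive" if flagged else "passed"] += 1
        tallies[name] = tally
    return tallies


# Toy stand-ins for the real tools (hypothetical, for illustration only).
def keyword_filter(text):
    return "free money" in text.lower()


def shouting_filter(text):
    return text.count("!") >= 3


corpus = [
    ("Free money inside", True),
    ("WIN NOW!!! Click!!!", True),
    ("Lunch on Friday?", False),
    ("Congratulations on the release!!!", False),
]
tallies = compare_on_same_corpus(
    corpus, {"keyword": keyword_filter, "shouting": shouting_filter})
for name, tally in tallies.items():
    print(name, tally)
```

Because both filters see identical messages, every difference in the tallies comes from the filters themselves and not from what happened to arrive that week.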
More SpamAssassin features
by Bastian Kleineidam - May 24th 2003 03:13:27
SpamAssassin also supports pyzor (a free razor
clone). And with the spamd/spamc feature and the ifspamh script I can
use it with qmail which does not use procmail filtering but its own .qmail
configuration.
worthless?
by tooar - May 24th 2003 01:15:32
oh boy, fm should really care more about their articles. how intelligent
does one have to be to know that a spam filter test with different emails
is completely use- and worthless?
as it seems, in the end, the author realized about half of the truth:
"Unfortunately, I realized too late that I should have saved the
messages with which each program made an error and cross-tested them
against the other one to see if it would have done better. I made a note
of it for the next time I run a comparison test."
hey, not only the error messages, ALL messages.
Re: worthless?
by Grant K Rauscher - May 24th 2003 02:23:06
Active Spam Killer is nice -
you manage your queue by e-mail... an HTML interface on the web for
processing your queue would be easy, so it could integrate with webmail
well. have meant to try SpamAssassin, though.
ASK fm project page
Re: worthless?
by David Necas (Yeti) - May 24th 2003 04:53:59
I agree the article is bogus.
To author:
First, It's not so hard to set up procmail so that both filters
can be tested simultaneously (by duplicating the queue) and thus in fair
conditions. Anyone taking the testing seriously would do it.
Second, how accurate do you expect the results from 16 spams to be (leaving
aside that they are different)? I get about 20 spams a day -- and I've never seen
any false positive from SpamAssassin, while I see approximately one false
negative per week. While I didn't perform any exact measurement, my claims
are based on experiences with a sample of 10k+ e-mails.
Then, spamassassin can remove its markup from the messages: just run it
on the message again, with -d. If the markup really annoys you, it's not
hard to automate this action.
Then, you can always install perl to home. Well, not always: only if you
have a reasonable disk quota. But who wants an account on a machine without
perl and with low disk quota? ;-)
Then, ... OK, I read ``don't flame and insult others'' above, so I stop
here ;-)
Re: worthless?
by slippery - May 24th 2003 11:36:07
> First, It's not so hard to set up
> procmail so that both filters can be
> tested simultaneously (by duplicating
> the queue) and thus in fair conditions.
> Anyone taking the testing seriously
> would do it.
It was not meant to be an exhaustive test, but more anecdotal. I mention
several times not to draw any firm conclusions from the test results.
However, if I run any future comparisons, I will be much more careful and
thorough.
Spam is more topical today than two years ago since the problem has become
so much worse recently. What I hoped to do was share my experience, and
pass on a few things I learned about spamassassin and the issues
surrounding spam in general.
> Then, ... OK, I read ``don't flame and insult others'' above, so I stop
> here ;-)
Flame on! The negative comments mean I failed to communicate my goal, my
message, or both. I will learn from all the feedback, positive or
otherwise.
Best Regards,
Keith
Re: worthless?
by David Necas (Yeti) - May 24th 2003 14:32:26
% It was not meant to be an exhaustive
> test, but more anecdotal.
If it was anecdotical, you probably shouldn't list the efficiencies
with four significant digits ;-) (have you any idea how many e-mails you
have to test to achieve this precision?) OK, normal people don't care...
they also don't mind deducing conclusions from graphs w/o units and w/o a
zero axis on TV... :-)
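To put a rough number on that question, here is a sketch using the normal approximation to the binomial. The 95% confidence level (z = 1.96), the 95% accuracy figure, and the function names are assumptions chosen for illustration, not anything taken from the article.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the approximate confidence interval for a proportion p
    measured on n messages (normal approximation to the binomial)."""
    return z * math.sqrt(p * (1.0 - p) / n)


def messages_needed(p: float, margin: float, z: float = 1.96) -> int:
    """Approximate number of messages needed to get within +/- margin."""
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))


# A filter that is right about 95% of the time, judged on 20 messages,
# is only pinned down to within roughly +/- 10 percentage points.
print(round(margin_of_error(0.95, 20), 3))

# Pinning a proportion like 0.9500 down to its fourth significant digit
# (+/- 0.00005) takes tens of millions of messages.
print(messages_needed(0.95, 0.00005))
```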
An important thing is being up to date (you don't even mention the
versions!). Spammers adapt and old spam filter versions give considerably
worse results than recent ones (I have experience mainly with
SpamAssassin, but unless the filter is quite stupid and inefficient, this
rule should be quite general). I would even suggest upgrading the filters
during the test if/when a new version is released -- reality doesn't
wait.
> Flame on!
So, at least one more SpamAssassin note: A weakness that is definitely
worth mentioning is its speed -- or rather, its slowness. As someone
aptly put it: I don't want to compute the Universe, I just
want to check for spam. Spamd slows down an SMTP server a lot, not to mention
spamassassin run by individual users via procmail, which is even
worse.
Re: worthless?
by slippery - May 24th 2003 15:43:49
>If it was anecdotical, you probably
shouldn't list the efficiencies with
four significant digits ;-)
Remember, I stated the results could not be generalized. What I reported
was what actually happened to 4 significant digits ;)
> An important thing is being up to date
(you don't even mention the versions!).
The versions were in the original text (SpamAssassin 2.44 and Spastic
3.0), but removed by the editor.
>So, at least one more SpamAssassin note:
A weakness that is definitely worth
mentioning is its speed -- or better
slowness.
You have hit on what I think is one of the real harms of spam. It wastes
resources, both computing and human. The more spam there is, the more
resources are wasted dealing with it.
Although there are some U.S. state laws prohibiting spam, legislation
can't be effective unless it can be enforced globally. A redesign of SMTP
that ensures e-mail headers can't be forged would be very hard or
impossible to implement and would take years or decades to roll out. This
is why I believe spam will be with us for a long time.
The only way I can see to deal with it is to make it cost the spammer
money. If it costs something to send a million spams, the spammers will
be much more selective and targeted. It would not eliminate spam, but it
would make it more like the junk mail you get in snail mail. The levels
would drop to something more sane. How do you meter e-mail? I have no
idea.
Re: worthless?
by Sam - May 29th 2003 20:45:05
> >If it was anecdotical, you probably
> shouldn't list the efficiencies with
> four significant digits ;-)
>
> Remember, I stated the results could not
> be generalized. What I reported was
> what actually happened to 4 significant
> digits ;)
You said it couldn't be generalised to other users, but that is a whole
different thing from not generalising to a higher volume of your own
email.
It shouldn't be too hard to run each on the same corpus of at least a few
hundred emails, if not a few thousand.
If you do so in the future, then I would also suggest measuring recall and
precision separately and not just reporting percent "correct" (and not
reporting more sig figs than you have). Information retrieval is a
reasonably old field with lots of previous work; recall and precision have
served the field well and provide useful metrics.
You may know all of this, but a ranting I shall go...
Recall is the spam found divided by the spam that was present, 16/17
and 10/12 in your sample. Precision is the number of spams found divided
by the items found, 16/17 and 10/10 in your sample.
Measuring accuracy (which you did) and error (1-accuracy) is useful for
doing a comparison, however there needs to be some weighting of false
positives to false negatives. A weighting of 1 is simply not useful for
real world email filtering. After all I suspect everyone would gladly
trade a recall:100%, precision:95% for recall:95%, precision:100% without
any hesitation.
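The definitions above are short enough to write down as code. This is only a sketch: the function names and the `fp_weight` parameter are invented here, and the counts in the usage lines are the ones quoted in this comment.

```python
from fractions import Fraction


def recall(spam_caught: int, spam_present: int) -> Fraction:
    """Spam found divided by the spam that was present."""
    return Fraction(spam_caught, spam_present)


def precision(spam_caught: int, flagged: int) -> Fraction:
    """Spam found divided by everything the filter flagged."""
    return Fraction(spam_caught, flagged)


def weighted_error(false_pos: int, false_neg: int, total: int,
                   fp_weight: float = 1.0) -> float:
    """Error score in which each false positive counts fp_weight times as
    much as a false negative; fp_weight=1 gives plain 1 - accuracy."""
    return (fp_weight * false_pos + false_neg) / total


# Counts quoted above: recall 16/17 and 10/12, precision 16/17 and 10/10.
print(float(recall(16, 17)), float(precision(16, 17)))
print(float(recall(10, 12)), float(precision(10, 10)))
```

Raising `fp_weight` above 1 is one simple way to say that losing a real message hurts more than letting one spam through; what the weight should be is a judgment call for each user.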
Searching for spam filtering on CiteSeer (http://citeseer.nj.nec.com/cs)
will provide a few interesting papers which show some good ways of doing
comparisons. Obviously a freshmeat article doesn't need to be anywhere
near as scientifically written :)
Since you're the project leader of Spastic, I'll mention that it would be
nice if spam filter projects provided a simple testing interface. A script
(or C code if that's what they like) that takes a mail spool file as input
and outputs two mail spools, spam and non-spam, would be useful. Even
just a mode where the program reads an email from stdin and
produces appropriate exit codes for spam and non-spam would do.
Of course Spastic might already do that, I haven't checked, I'm just
raving :)
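A testing interface of the kind suggested here might be sketched as follows, using Python's standard `mailbox` module for the mbox spools. The `is_spam` callable is a placeholder for whatever filter is under test, and the function names and the exit code convention are assumptions, not anything Spastic or SpamAssassin is known to provide.

```python
import mailbox
from typing import Callable


def split_spool(spool_path: str, spam_path: str, ham_path: str,
                is_spam: Callable[[mailbox.mboxMessage], bool]) -> tuple:
    """Copy every message in an mbox spool into a spam or a non-spam mbox.

    `is_spam` is a placeholder for the filter under test.
    Returns (spam_count, ham_count).
    """
    spool = mailbox.mbox(spool_path, create=False)
    spam_box = mailbox.mbox(spam_path)
    ham_box = mailbox.mbox(ham_path)
    counts = [0, 0]
    try:
        for message in spool:
            if is_spam(message):
                spam_box.add(message)
                counts[0] += 1
            else:
                ham_box.add(message)
                counts[1] += 1
    finally:
        spam_box.close()   # close() flushes pending writes to disk
        ham_box.close()
        spool.close()
    return tuple(counts)


def exit_code_for(raw_message: bytes,
                  is_spam: Callable[[mailbox.mboxMessage], bool]) -> int:
    """Single-message mode: 1 for spam, 0 for non-spam (an arbitrary choice)."""
    return 1 if is_spam(mailbox.mboxMessage(raw_message)) else 0
```

With two hand-sorted spools as input, the counts returned by `split_spool` give the false positives and false negatives needed for a comparison on the same corpus.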
Re: worthless?
by jeff covey - May 24th 2003 10:37:21
I do realize this is short on scientific method; Keith says as much.
I didn't see good statistical analysis as its purpose; it just
introduces some spam fighting tools and methods to people who may be
looking for them, and I hoped others would chime in with information
about how they deal with spam and the problems and solutions they've
had to evolve to stay ahead of the ever-changing spam wave. I didn't
see the article as offering many conclusions itself, but as a
springboard to discussion.
I'd be interested in a more extensive overview of antispam tools in a
category review of Topic ::
Communications :: Email :: Filters, if anyone feels up to it.
-- vs lbh pna ernq guvf, lbh'er n trrx.
Re: worthless?
by Eric Kilfoil - Jun 1st 2003 19:29:02
> I'd be interested in a more extensive
> overview of antispam tools in a
> category review of Topic ::
> Communications :: Email :: Filters, if
> anyone feels up to it.
I would like to see this as well. Most people are concerned about things
such as false positives and accuracy. Personally, my biggest concern is
per-user configurability. I think that these tools can provide me with
the level of flexibility that I need to help users fight spam. My problem
then becomes scalability. In a large production environment, say upwards
of 50k users, I can't afford for a spam filter solution to drop my CPU
resources to zero. I can't (literally) afford to add 10 more MX servers
because my spam solution hogs all of the resources.
PERL is nice. It has great flexibility and amazing text processing power.
Unfortunately, it is slow. I would really love to see an open source spam
fighting solution written in a compiled language to help improve
scalability. Perhaps spastic can provide that to me.
The nicest feature that I can see about SpamAssassin is that I can provide
a web interface to my users to let them choose how aggressively they want
their spam to be filtered. Then if they complain about a false positive,
I can just tell them to decrease the aggressiveness of the filter.
I liked the article quite a bit. I would really have liked to see
information about MANY spam solutions rather than just these two.
Brightmail is a decent commercial offering. Fortinet makes anti-spam
hardware based firewalls, and there are tons of others.
Re: worthless?
by slippery - Jun 4th 2003 04:54:09
> Personally, my biggest concern is
> per-user configurability. I think that
> these tools can provide me with the
> level of flexibility that I need to help
> users fight spam. My problem then
> becomes scalability. In a large
> production environment, say upwards of
> 50k users, i can't afford for a spam
> filter solution to drop my CPU resources
> to zero. I can't (literally) afford to
> add 10 more MX servers because my spam
> solution hogs all of the resources.
Wow, 50,000 users puts you in a category far above most environments. Any
global solution would have to be very fast and likely span many incoming mail
servers.
> PERL is nice. It has great flexibility
> and amazing text processing power.
> Unfortunately, it is slow. I would
> really love to see an open source spam
> fighting solution written in a compiled
> language to help improve scalability.
> Perhaps spastic can provide that to me.
Spastic is not compiled, per se. It uses native procmail commands and
shells out to grep for regexps, so I don't think it would scale to the
level you need.
> I liked the article quite a bit. I
> would really have liked to see
> information about MANY spam solutions
> rather than just these two. Brightmail
> is a decent commercial offering.
> Fortinet makes anti-spam hardware based
> firewalls, and there are tons of others.
A complete examination of ALL the spam programs, commercial, open source,
and hardware solutions would be daunting. There are probably 50-100 open
source solutions alone. The most recent version of Imail Server 8.0 from
Ipswitch (which is used at one of my clients with 500 users) includes a
decent anti-spam filter. Even dedicated testing labs like at ZDnet/Cnet
usually limit their testing to 8-10 products at a time.
I still think that current anti-spam solutions are more bandaids than
cures. Until e-mail is metered like snail mail, the economics of spam
will keep spammers in business. And I'm not sure I want e-mail metered.
Best Regards,
Keith
Re: worthless?
by A'rpi/ESP-team - Jun 7th 2003 05:44:13
>
> PERL is nice. It has great flexibility
> and amazing text processing power.
> Unfortunately, it is slow. I would
> really love to see an open source spam
> fighting solution written in a compiled
> language to help improve scalability.
I started a project, spamassassin-c, a rewrite of the spamassassin
engine in plain hand-optimized C/asm code, using libpcre for regexp
matching, with a precompiled regexp ruleset compiled into the binary.
It was around 20 times faster than SA, with a somewhat limited feature set (I had
to leave some complicated regexps out, as libpcre couldn't handle them, and SA
also has some rules implemented as perl code). In the end it turned out that
perl SA is slow because it has to re-compile the regexps and do the whole perl
startup for every mail. They noticed this too, and created the client-server
approach, i.e. spamd+spamc. So the rule matching code runs in spamd,
with precompiled regexps and hashed searches initialized at startup,
resulting in 10 times faster performance. So my stripped-down C version
was only 2 times faster than spamd. I guess that if I implemented all the SA
features, it wouldn't be more than 20-30% faster, so it simply isn't
worth it. I've stopped the project.
A'rpi
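The compile-once idea behind spamd can be shown with a toy sketch. The rules, scores, and names here are invented for illustration and are not SpamAssassin's; only the 5.0 threshold comes from the article's description of the default settings.

```python
import re

# Hypothetical rules: (pattern, score). Not taken from SpamAssassin.
RULES = [
    (r"(?i)\bviagra\b", 3.0),
    (r"(?i)100% free", 2.5),
    (r"(?i)click here", 1.0),
]


class RuleEngine:
    """Compile the rule set once at startup, then reuse it for every mail,
    the way a long-running daemon can and a per-message process cannot."""

    def __init__(self, rules, threshold: float = 5.0):
        self.compiled = [(re.compile(pattern), score)
                         for pattern, score in rules]
        self.threshold = threshold

    def score(self, text: str) -> float:
        """Sum the scores of all rules that match somewhere in the text."""
        return sum((score for regex, score in self.compiled
                    if regex.search(text)), 0.0)

    def is_spam(self, text: str) -> bool:
        return self.score(text) >= self.threshold


engine = RuleEngine(RULES)          # paid once, at "daemon" startup
for mail in ("Lunch at noon?", "VIAGRA, 100% free, click here"):
    print(engine.is_spam(mail), engine.score(mail))
```

Python's `re` module keeps its own cache of compiled patterns, so this shows the structure of the approach rather than the size of the speedup described above.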
Re: worthless?
by rmemmons - Aug 10th 2003 10:12:59
> My problem then
> becomes scalability. In a large
> production environment, say upwards of
> 50k users, i can't afford for a spam
> filter solution to drop my CPU resources
> to zero. I can't (literally) afford to
> add 10 more MX servers because my spam
> solution hogs all of the resources.
I understand the need for scalability and faster code would be great. I
would however challenge the notion that you can't afford 10 more MX
servers. I think that it's more like "management does not want to pay for
10 more MX servers."
I don't know your costs, but if you just take some simple numbers
regarding the true man-hour costs of spam on your end users you'll get
into the millions, or tens of millions, of dollars a year in lost man
hours. For example, at 10 emails a day and 10 seconds an email, 50000
users each lose 10.1 hours a year, which translates to 75 million.
This is just an example--and it is huge.
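The arithmetic behind that example can be checked with a few lines. The 365-day year is an assumption (it is what reproduces the 10.1 hours figure), and since no hourly cost is stated above, the sketch only works out what rate the 75 million total would imply.

```python
def hours_lost_per_user(emails_per_day: float, seconds_per_email: float,
                        days_per_year: int = 365) -> float:
    """Hours per year one user spends dealing with spam."""
    return emails_per_day * seconds_per_email * days_per_year / 3600.0


per_user = hours_lost_per_user(10, 10)   # about 10.1 hours a year
total_hours = per_user * 50_000          # across all 50000 users

# The hourly cost is not given above; this is the rate at which the
# total would come to 75 million.
implied_rate = 75_000_000 / total_hours

print(round(per_user, 1), round(total_hours), round(implied_rate))
```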
If this is true, you have the money, your organization just lacks the
will.
I bring this up because I work for a company of similar size, and I
constantly see IT saying "too expensive"... but at the same time being
happy to push much larger costs on the budget of others due to inaction or
stupid policies. I don't know if your org is like that, but mine
certainly is.
Rob
Re: worthless?
by Eric Kilfoil - Aug 28th 2003 16:58:04
> I understand the need for scalability
> and faster code would be great. I would
> however challange the notition that you
> can't afford 10 more MX servers. I
> think that it's more like "management
> does not want to pay for 10 more MX
> servers."
I suppose we're getting off topic, but here goes anyway.
Yes it's management, but even I agree with them (for a change). We're
talking about serving 50k customers, not 50k employees. The cost of
providing the service outweighs the benefits gained from the cost. As
we've all seen in the telecom bust, providing services at a lower cost
than what you pay is a bad idea :). Funny that the engineers realized it
and marketing didn't... oh well. Anyway, it's basically an ROI
decision.