Apache SpamAssassin

Developer(s)	Apache Software Foundation
Initial release	April 20, 2001
Stable release	4.0.1 / 29 March 2024
License	Apache License 2.0
Website	spamassassin.apache.org

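Apache SpamAssassin is a computer program used for e-mail spam filtering. It has been an Apache Software Foundation project since 2004.

The program can be integrated with a mail server to filter all mail for a site, or run by individual users with their own mail programs. Apache SpamAssassin is highly configurable; if used as a system-wide filter it can still be configured to support per-user preferences.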

History

Apache SpamAssassin was created by Justin Mason, who had maintained a number of patches against an earlier program named filter.plx by Mark Jeftovic, which in turn was begun in August 1997. Mason rewrote all of Jeftovic's code from scratch and uploaded the resulting codebase to SourceForge on April 20, 2001.^[3]

In summer 2004 the project became an Apache Software Foundation project and was later officially renamed Apache SpamAssassin.^[4]

The SpamAssassin 3.4.2 release in September 2018 was the first in over three years, but the developers said that "The project has picked up a new set of developers and is moving forward again."^[5]

In December 2019, version 3.4.3 of SpamAssassin was released.

In April 2021, version 3.4.6 of SpamAssassin was released. It was announced that development of version 4.0.0 would become the project's focus.^[6]

Methods of usage

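Apache SpamAssassin is a Perl-based application. It can be run as a standalone application, embedded in another application, or as a client (spamc) that communicates with a daemon (spamd). The client/server or embedded mode of operation has performance benefits, but under certain circumstances may introduce additional security risks.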

Typically either variant of the application is set up in a generic mail filter program. A mail filter such as procmail can be made to pipe all incoming mail through Apache SpamAssassin with an adjustment to a user's procmailrc file.

Operation

Apache SpamAssassin comes with a large set of rules which are applied to determine whether an email is spam or not. Most rules are based on regular expressions that are matched against the body or header fields of the message, but Apache SpamAssassin also employs a number of other spam-fighting techniques. The rules are called "tests" in the SpamAssassin documentation.

Each test has a score value that will be assigned to a message if it matches the test's criteria. The scores can be positive or negative, with positive values indicating "spam" and negative "ham" (non-spam messages). A message is matched against all tests and Apache SpamAssassin combines the results into a global score which is assigned to the message. The higher the score, the higher the probability that the message is spam.

Apache SpamAssassin has an internal (configurable) score threshold to classify a message as spam. Usually a message will only be considered as spam if it matches multiple criteria; matching just a single test will not usually be enough to reach the threshold.
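The scoring model described above can be sketched in a few lines of Python. This is an illustrative toy, not SpamAssassin's implementation: the rule names, patterns, and scores are invented for the example, and the threshold of 5.0 is an assumed value (the text only states that the threshold is configurable).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str            # hypothetical rule name
    pattern: re.Pattern  # regular expression matched against the message text
    score: float         # positive leans towards spam, negative towards ham


# Invented rules, for illustration only.
RULES = [
    Rule("DEMO_FREE_MONEY", re.compile(r"free money", re.I), 3.0),
    Rule("DEMO_CLICK_HERE", re.compile(r"click here", re.I), 2.5),
    Rule("DEMO_MEETING_NOTES", re.compile(r"meeting notes", re.I), -1.0),
]

THRESHOLD = 5.0  # assumed value; the real threshold is configurable


def score_message(text, rules=RULES):
    """Run every rule and return (total score, names of the rules that matched)."""
    hits = [rule for rule in rules if rule.pattern.search(text)]
    return sum(rule.score for rule in hits), [rule.name for rule in hits]


def is_spam(text, threshold=THRESHOLD):
    """Classify as spam when the combined score reaches the threshold."""
    total, _ = score_message(text)
    return total >= threshold
```

With these made-up values, a message matching only DEMO_CLICK_HERE scores 2.5 and stays below the threshold, while one matching both positive rules scores 5.5 and is classified as spam. This mirrors the point that a single matching test is usually not enough.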

If Apache SpamAssassin considers a message to be spam, it can rewrite the message. In the default configuration, the content of the mail is appended as a MIME attachment, with a brief excerpt in the message body and a description of the tests which resulted in the mail being classified as spam. If the score is below the configured threshold, information about the tests matched and the total score is by default still added to the email headers, where it can be used in post-processing for less severe actions, such as tagging the mail as suspicious.
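The two outcomes described above (header tagging below the threshold, a report with the original attached above it) can be approximated with Python's standard email package. This is a sketch under assumptions: the function name annotate, the header names and their value format, the report wording, and the default threshold are all chosen for illustration and are not taken from the text.

```python
from email import message_from_string, policy
from email.message import EmailMessage


def annotate(original, score, hits, threshold=5.0):
    """Tag a message with its score, or wrap it in a report if it looks like spam."""
    status = f"score={score:.1f} required={threshold:.1f} tests={','.join(hits)}"

    if score < threshold:
        # Below the threshold: only record the result in a header.
        original["X-Spam-Status"] = "No, " + status
        return original

    # At or above the threshold: build a new report message.
    report = EmailMessage()
    for name in ("From", "To", "Subject"):
        if original[name] is not None:
            report[name] = original[name]
    report["X-Spam-Flag"] = "YES"
    report["X-Spam-Status"] = "Yes, " + status

    body = original.get_body(preferencelist=("plain",))
    excerpt = body.get_content()[:200] if body is not None else ""
    report.set_content(
        "This message was classified as spam.\n"
        f"Tests matched: {', '.join(hits)}\n\n"
        f"Excerpt:\n{excerpt}\n"
    )
    # The untouched original travels along as a message/rfc822 attachment.
    report.add_attachment(original)
    return report
```

The original must be parsed with policy.default (for example via message_from_string(raw, policy=policy.default)) so that it is an EmailMessage and supports get_body and add_attachment.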

Apache SpamAssassin allows for a per-user configuration of its behavior, even if installed as system-wide service; the configuration can be read from a file or a database. In their configuration users can specify individuals whose emails are never considered spam, or change the scores for certain rules. The user can also define a list of languages which they want to receive mail in, and Apache SpamAssassin then assigns a higher score to all mails that appear to be written in another language.
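One way to picture per-user preferences layered over a system-wide setup is the following Python sketch. Everything in it is hypothetical: the UserPrefs structure, the rule names and scores, and the large negative score given to trusted senders are assumptions made for the example, not SpamAssassin's configuration format.

```python
from dataclasses import dataclass, field

# Site-wide scores for some invented rules.
SITE_SCORES = {"DEMO_FREE_MONEY": 3.0, "DEMO_CLICK_HERE": 2.5}

# Assumed bonus that keeps mail from trusted senders far below any threshold.
TRUSTED_SENDER_SCORE = -100.0


@dataclass
class UserPrefs:
    trusted_senders: set = field(default_factory=set)    # never treated as spam
    score_overrides: dict = field(default_factory=dict)  # per-user rule scores


def total_score(hit_names, sender, prefs, site_scores=SITE_SCORES):
    """Combine site-wide scores with one user's overrides and trusted senders."""
    scores = {**site_scores, **prefs.score_overrides}  # user values win
    total = sum(scores.get(name, 0.0) for name in hit_names)
    if sender.lower() in prefs.trusted_senders:
        total += TRUSTED_SENDER_SCORE
    return total
```

The same set of matched tests can therefore lead to different totals for different users, which is the behaviour the paragraph above describes.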

Apache SpamAssassin is based on heuristics (pattern recognition), and such software exhibits false positives and false negatives.

Network-based filtering methods

Apache SpamAssassin also supports:

DNS-based blacklists and whitelists
Fuzzy-checksum-based spam detection filters such as the Distributed Checksum Clearinghouse, Vipul's Razor and the Cloudmark Authority plugins (commercial)
Hashcash proof-of-work
Sender Policy Framework and DomainKeys Identified Mail
URI blacklists such as SURBL or URIBL, which track spam websites
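DNS-based lists of this kind are conventionally queried by turning the item to check into a hostname under the list's zone and looking that name up; an answer means the item is listed. The Python sketch below only builds the query names and performs no network lookups. The zone names dnsbl.example and uribl.example are placeholders, and the helper names are invented for this example.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the lookup name for an IPv4 address: reversed octets plus the zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def uribl_query_name(domain, zone):
    """Build the lookup name for a domain found in a message URI."""
    return f"{domain.rstrip('.').lower()}.{zone}"
```

For instance, checking the address 192.0.2.99 against the placeholder zone yields the name 99.2.0.192.dnsbl.example, which a resolver would then be asked to look up.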

More methods can be added reasonably easily by writing a Perl plug-in for Apache SpamAssassin.

Bayesian filtering

Apache SpamAssassin reinforces its rules through Bayesian filtering, where a user or administrator "feeds" examples of good (ham) and bad (spam) mail into the filter so that it learns the difference between the two. For this purpose, Apache SpamAssassin provides the command-line tool sa-learn, which can be instructed to learn a single mail or an entire mailbox as either ham or spam.

Typically, the user moves unrecognized spam to a separate folder and then runs sa-learn separately on the folder of non-spam and on the folder of spam. Alternatively, if the mail user agent supports it, sa-learn can be called for individual emails. Whichever method is used, SpamAssassin's Bayesian test then draws on what it has learned when scoring future emails, which improves accuracy.
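The idea of learning from ham and spam examples can be illustrated with a very small naive Bayes classifier in Python. This is a generic textbook-style sketch, not the algorithm SpamAssassin uses internally; the class name TinyBayes, the tokenizer, and the add-one smoothing are assumptions made for the example.

```python
import math
import re
from collections import Counter


class TinyBayes:
    def __init__(self):
        # Per label: in how many learned messages each token appeared.
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def learn(self, text, label):
        """Record one example message as 'spam' or 'ham'."""
        self.tokens[label].update(set(self.tokenize(text)))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | tokens present) using smoothed per-token ratios."""
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for token in set(self.tokenize(text)):
            p_spam = (self.tokens["spam"][token] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][token] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))
```

Here learn plays the role that sa-learn plays for the real filter: each call adds one labelled example, and later estimates shift towards whichever class a message's words were seen in more often.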

Licensing

Apache SpamAssassin is open source software, licensed under the Apache License 2.0. Versions prior to 3.0 are dual-licensed under the Artistic License and the GNU General Public License.

sa-compile

sa-compile is a utility distributed with Apache SpamAssassin that compiles a SpamAssassin ruleset into a deterministic finite automaton that allows Apache SpamAssassin to use processor power more efficiently.

Testing Apache SpamAssassin

Apache SpamAssassin is designed to trigger on the GTUBE, a 68-byte string similar to the antivirus EICAR test file. If this string is inserted in an RFC 5322 formatted message and passed through the Apache SpamAssassin engine, Apache SpamAssassin will trigger with a weight of 1000.
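A test message containing the GTUBE string can be assembled with Python's standard email package, as in the sketch below. The addresses, subject, surrounding body text, and the helper name build_gtube_message are placeholders chosen for this example.

```python
from email.message import EmailMessage

# The 68-byte GTUBE test string.
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"


def build_gtube_message(sender="sender@example.org", recipient="recipient@example.org"):
    """Return a minimal RFC 5322 style message whose body contains GTUBE."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "GTUBE test"
    # The string sits on a line of its own, unmodified.
    msg.set_content("This is a filter test message.\n\n" + GTUBE + "\n")
    return msg
```

The text returned by build_gtube_message().as_string() can be saved to a file or passed to the filter to confirm that it is active.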

Notes

[1] "Project Management Committee". The Apache Software Foundation. 2022. Retrieved 23 August 2023.
[2] Sidney Markowitz (29 March 2024). "[ANNOUNCE] Apache SpamAssassin 4.0.1 available". Retrieved 30 March 2024.
[3] "SpamAssassin Prehistory". Apache Foundation. Retrieved 19 December 2018.
[4] "SpamAssassin Project Incubation Status". Apache Foundation. Retrieved 19 December 2018.
[5] "SpamAssassin is back". LWN.net. Retrieved 19 December 2018.
[6] "SpamAssassin: News and Announcements". spamassassin.apache.org. Retrieved 12 April 2021.

References

McDonald, Alistair (27 September 2004). SpamAssassin: A Practical Guide to Integration and Configuration (1st ed.). ISBN 978-1-904811-12-1.

Schwartz, Alan (July 2004). SpamAssassin (1st ed.). ISBN 978-0-596-00707-2.

External links

Official website
Apache SpamAssassin Wiki
Apache SpamAssassin Rule Updates Wiki: automatically updating Apache SpamAssassin
KAM.cf: KAM Ruleset for Apache SpamAssassin


