The first step towards eliminating spam in your organization is to choose an anti-spam filtering package. With hundreds of anti-spam solutions available, it can be a challenging task to find the one that’s right for your site. Since your decision will directly affect the email of every user in your organization, it’s worthwhile to carefully evaluate possible solutions. This whitepaper will provide you with what you need to determine the best solution for stopping spam at your site: several sets of criteria to look for in an anti-spam filter, objective testing procedures, and even a sample user feedback form.
Shopping for an anti-spam filter is like shopping for a new car: you want to have a basic idea of what you’re looking for and what your needs are. If a particular anti-spam filter doesn’t have the features you need, it’s not worth taking the time to evaluate further. Some criteria you might want to look for up front are:
Criteria | Explanation |
---|---|
Supported platforms | An anti-spam filter needs to support your site’s operating system and email server combination. Many vendors provide email proxy products that can be used with even the most esoteric messaging architectures. If your domain is served by multiple email servers, the filter should support synchronizing data between them. |
User authentication | User interfaces that allow users to control filtering settings for their accounts are a must-have feature for almost every site. Most of these user interfaces require that detailed user information, especially passwords, be contained in an LDAP directory. If not all of your site’s user information is contained in an LDAP directory, you need to choose a solution that supports multiple authentication methods. |
Site-specific needs | If you have special site-specific needs, make sure any anti-spam filter you are considering will support those needs. For example, you may need a solution that allows you to completely customize the user interface so it is consistent with an existing email portal. |
Cost | If a product costs more than you have available in your budget, then it’s not worth further consideration. Note that some vendors are willing to negotiate price in return for other considerations, in addition to offering large discounts for certain customers such as educational institutions. |
Technical Support | Make sure that the vendors of anti-spam filtering packages you are considering provide around-the-clock telephone support. Support by email is convenient, but by itself it’s insufficient if your messaging system is inoperative. |
Geographical Location | Computer software is a global market, so it’s not unusual to purchase software from a vendor located thousands of miles away. If your site is located several time zones away from the vendor, or your staff and the vendor’s staff speak different native languages, serious communication problems can result. Make sure the vendor maintains offices or has a distributor in your region of the world. |
Corporate stability | The demand for anti-spam filters has been increasing dramatically for the last several years. This makes it an attractive market for startup companies and individuals, who may not necessarily have the financial resources or experience to effectively support their products for the long term. To play it safe, you should choose an anti-spam filter provided by a company with at least several years of experience in the email security market. |
Once you have determined which anti-spam filtering solutions fit your basic criteria, it’s time to install them and see how they perform for your site. You should approach this the same way you would approach test driving a car you’re interested in buying: you want to put the product through its paces and see how well it fits your particular needs.
The table below contains the major criteria you should use to evaluate each anti-spam filter. Each criterion is accompanied by detailed points to consider while evaluating the filters.
Category | Points to consider |
---|---|
Accuracy |
|
Configurability |
|
Information |
|
Methodology |
|
Performance |
|
Security |
|
Time Cost |
|
End users are constantly becoming more proficient in computer skills. As their skills increase so does their desire and ability to have more control over their personal data, including email. At the same time, rapid improvement in technology usability has created greater expectations of all user interfaces ranging from automobile dashboards and personal music players to enterprise software.
The user interface provided by spam filters should be completely customizable to fit any site’s needs. It should also do well when judged by the criteria below.
Criteria | Description |
---|---|
Simple and Natural Dialog | The instructions and labels that appear in the interface should be written in a conversational tone. Usage of jargon or acronyms that are unfamiliar to end-users should be avoided if possible. |
Natural Language Support | If the end-users of an interface speak a different language than the interface uses for instructions and labels, the interface will be virtually useless. Since most user interfaces use abbreviated language in labels, even a foreign speaker with basic fluency in the interface’s language may have issues. While it is unrealistic to expect a user interface to support every conceivable language out of the box, it should provide the ability for the system administrator or a translator to rewrite all instructions and labels in the users’ native language. |
Minimize User Memory Load | End-users should have to remember little (if any) information specific to a given interface between usage sessions. The interface should be clear and intuitive, with easily obtainable help on each part of its functionality. |
Consistency | A user interface should have a consistent appearance and layout with an intuitive navigation system. Changes in appearance from one area to the next can disorient and confuse users. A consistent layout also reduces the time required by new users to become comfortable with the interface. |
Feedback | An interface should provide clear feedback about actions it is taking on the user’s behalf. For example, if a quarantined message is released the interface should inform the user that it has been released. Simply returning the user to their home page without an informational status update can leave them in doubt as to whether the requested actions were performed or not. |
Clearly Marked Exits | The user should be able to exit the interface (i.e., log out) from any place it makes sense to do so. The exit should be conveniently placed so the user can quickly log out. The user should also be able to conveniently return to their main page from any part of the interface. |
Good Error Messages | When an error occurs, the interface should display an informative error message. First tier help desk staff should be able to quickly determine if a serious error has occurred based on the text of the error message. If the error message requires action from the system administrator, that action should be obvious from the text of the error. |
Help and Documentation | All help and documentation required by the user interface should be self-contained. It’s unreasonable to expect end-users to refer to a separate manual while inside the interface. The help should clearly describe the functionality of each feature in the interface to keep the system administrator from being bombarded with user questions. |
Before turning an anti-spam filter loose on your end users’ mail, you might want to perform one or more of these non-production evaluation procedures. They’re categorized as non-production since they don’t affect end users’ email in any way. (In fact, end users shouldn’t even notice that you’re running them.) Because they don’t impact your site’s actual mail, it’s possible to experiment with lots of different configuration options to determine what would work best for your site.
Procedure | Description |
---|---|
Corpus Testing | Corpus testing is usually the first method used by sites that wish to compare the accuracy of various anti-spam filters. Large collections of messages, each of which is called a corpus, are sent to a test system running the filter. Counts are kept of false positives and false negatives. (See Sources of Spam Messages and Sources of Non-Spam Messages in this whitepaper for information about where to obtain messages for a corpus.) A typical corpus testing setup requires two systems, one of which is running the anti-spam filter. The other system is used to send the messages to the filter via SMTP (a simple Perl script works well for this). At the end of each test run, the anti-spam filter should be able to generate a statistical report that will show the number of messages identified as spam. |
Forking User Mail | This testing method sends a copy of all messages sent to certain users to a non-production system running the anti-spam filter. This allows you to see how a filtering product performs on actual mail sent to your site, without affecting mail delivery to your user base in any way. The first step in performing this test is to select a group of volunteer users whose mail will be “forked” to the non-production system. These users shouldn’t receive any mail that the IT staff isn’t allowed to read (for example, the human resources director probably isn’t a good test user). On the non-production system, install the anti-spam filter and set up an account for each test user. On your production MTA, create an alias for each of the test users that sends one copy of any incoming message to their “real” account, and one copy to the corresponding account on the test system. For example, if PMDF is your production MTA you would create an entry in the aliases file for each test user that looks something like: `john.doe: jdoe@example.com, jdoe@test.example.com`. IT staff can log into the various user accounts on the test system to determine how the filtering solution will perform for actual users at your site. |
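
The counting step described under Corpus Testing above can be sketched in a few lines of Python. This is only an illustration, not part of any particular filter: it assumes you can produce, for each test message, a pair of values (whether the message really is spam, and whether the filter flagged it). The names `CorpusTally` and `tally_results` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CorpusTally:
    """Counts from one corpus test run (hypothetical structure)."""
    spam_total: int = 0
    ham_total: int = 0
    false_negatives: int = 0  # spam the filter let through
    false_positives: int = 0  # legitimate mail the filter flagged as spam

    @property
    def false_negative_rate(self) -> float:
        return self.false_negatives / self.spam_total if self.spam_total else 0.0

    @property
    def false_positive_rate(self) -> float:
        return self.false_positives / self.ham_total if self.ham_total else 0.0


def tally_results(results):
    """Tally (is_spam, flagged_as_spam) pairs, one pair per test message."""
    tally = CorpusTally()
    for is_spam, flagged in results:
        if is_spam:
            tally.spam_total += 1
            if not flagged:
                tally.false_negatives += 1
        else:
            tally.ham_total += 1
            if flagged:
                tally.false_positives += 1
    return tally


if __name__ == "__main__":
    # Made-up verdicts for illustration only: 3 spam and 2 non-spam messages.
    sample = [(True, True), (True, False), (True, True),
              (False, False), (False, True)]
    t = tally_results(sample)
    print(f"false negatives: {t.false_negatives}/{t.spam_total}")
    print(f"false positives: {t.false_positives}/{t.ham_total}")
```

Keeping the two error counts separate matters when comparing filters, since a filter that lets some spam through is usually far less costly than one that flags legitimate mail.
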
Production evaluation procedures demonstrate how a product deals with your site’s mailstream on your production email servers. Once initial testing has been conducted, these procedures can be used to determine the user population’s reaction to the product.
Some basic guidelines you should consider following regardless of which production procedure you choose to implement are:
Procedure | Description |
---|---|
Quarantining | This testing method takes full advantage of the PreciseMail Anti-Spam Gateway quarantine functionality. Spam messages are placed in a quarantine area as they are received instead of being delivered to the user. Users may access their quarantined messages at any time through the web interface. Quarantine notification messages may be sent to users at administrator-determined or user-determined intervals. |
Header Insertion | This testing method inserts special headers into each scanned message. The headers record which rules the message triggered and whether the filtering solution considers it spam. Ideally, one header line should be inserted for each data point the anti-spam filter used to make its decision, as well as a summary line giving the message’s overall score. Either the end user who received the message or the system administrator can look at any message to determine why it was classified as spam or non-spam. Users who are not part of the test group will not notice anything different about their mail messages. Test users can set up rules inside their mail client to filter spam messages into a special folder based on the presence of certain headers inserted by the anti-spam filter. |
Subject Tagging | This testing method places a short text string in the subject line of messages identified as spam. This allows every user to see which messages the filtering solution considers spam, without any messages being quarantined or discarded. In addition, users can set up rules inside their mail clients to filter messages with this string in the subject to a spam folder. |
Log Monitoring | In this testing method, every incoming mail message is scanned but no message is altered. The system administrator can monitor the anti-spam filter’s log file and statistical reports to gauge the effectiveness of the filter. Users cannot see which messages would have been filtered as spam. |
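
The mail client rules mentioned under Header Insertion and Subject Tagging amount to a simple check on each delivered message. The Python sketch below shows that check under stated assumptions: the header name `X-Spam-Flag`, its value `YES`, and the subject tag `[SPAM]` are placeholders chosen for illustration, and the function name `marked_as_spam` is hypothetical. The actual header names and tag text depend on the filter you are evaluating.

```python
from email import message_from_string

# Hypothetical values: check your filter's documentation for the real ones.
SPAM_HEADER = "X-Spam-Flag"
SPAM_HEADER_VALUE = "yes"
SUBJECT_TAG = "[SPAM]"


def marked_as_spam(raw_message: str) -> bool:
    """Return True if the filter's header or subject tag marks the message as spam."""
    msg = message_from_string(raw_message)
    header_value = (msg.get(SPAM_HEADER) or "").strip().lower()
    subject = msg.get("Subject") or ""
    # Either signal is enough: the inserted header or the tagged subject line.
    return (header_value == SPAM_HEADER_VALUE
            or subject.lstrip().startswith(SUBJECT_TAG))
```

A script like this can also be run by IT staff over the test users' mailboxes to count how many delivered messages the filter marked.
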
One of the most popular methods for testing an anti-spam filter’s accuracy is to send it a large number of spam messages and count how many get through. There are several sources of spam messages that are suitable for testing, but by far the best place to start is your own user base. Encouraging your users to save spam messages they receive to a public IMAP folder will quickly give you a corpus of spam that represents the sort of messages your users want to be filtered. (If IMAP isn’t available at your site, you can have users forward their spam messages as attachments to a special account.)
If you wish to test against a broader variety of spam than is received by your user base, there are several online spam repositories. By far the largest and most popular is http://www.spamarchive.org, which averages 5,000 new spam messages a day. The spam messages are organized into compressed archives, and are freely downloadable.
Important: Spam messages from SpamArchive.org and from other online corpora usually have incomplete header information. Make sure you carefully check the messages before using them for testing, especially the Date:, To:, and From: headers. Many spam messages have incomplete or invalid headers, and anti-spam filters take that into account when deciding if a message is spam. If any of these headers are missing or invalid, simply replace them with a valid header of the same type.
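
The header check described above can be automated. The following Python sketch is one possible approach, not a definitive tool: it treats a Date: header as valid if it parses as a date and a To: or From: header as valid if it contains something shaped like `user@domain`, which is a deliberately rough test. The replacement addresses and the function name `repair_headers` are assumptions made for illustration.

```python
from email import message_from_string
from email.utils import formatdate, parseaddr, parsedate_to_datetime

# Placeholder addresses (assumptions); substitute accounts on your test system.
DEFAULT_FROM = "sender@example.com"
DEFAULT_TO = "testuser@test.example.com"


def _valid_address(value):
    """True if the header value contains something shaped like user@domain."""
    if not value:
        return False
    _, addr = parseaddr(value)
    return "@" in addr


def _valid_date(value):
    """True if the header value parses as a date."""
    if not value:
        return False
    try:
        return parsedate_to_datetime(value) is not None
    except (TypeError, ValueError):
        return False


def repair_headers(raw_message):
    """Replace missing or invalid Date:, To:, and From: headers.

    Returns the repaired message text and the list of headers replaced.
    """
    msg = message_from_string(raw_message)
    fixed = []
    checks = (
        ("Date", _valid_date, formatdate(localtime=True)),
        ("To", _valid_address, DEFAULT_TO),
        ("From", _valid_address, DEFAULT_FROM),
    )
    for name, is_valid, replacement in checks:
        if not is_valid(msg.get(name)):
            del msg[name]  # removes every occurrence; no error if absent
            msg[name] = replacement
            fixed.append(name)
    return msg.as_string(), fixed
```

Logging the returned list of replaced headers for each message gives a quick picture of how much of a downloaded corpus needed repair.
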
Just like spam messages, the best source of non-spam messages to use for testing is your user base. Unfortunately, while most users are very happy to hand over a copy of every spam message they receive, they’re much less likely to make even a small subset of their non-spam messages available for testing. Unlike spam messages, there are no large online repositories of non-spam messages.
One possible source of non-spam messages is NNTP newsgroups. The newsgroups provide a very large number of non-spam messages on thousands of topics. Because of the wide variety of topics, messages from newsgroups can be used to closely approximate the wide variety of email that your users receive.
Many user agents (especially Pine) are capable of pulling every message from an NNTP newsgroup and placing it in a BSD-formatted mailbox file. A simple Perl script can be used to send the contents of the BSD-formatted mailbox to a system running the anti-spam filtering solution. Just like spam messages that are made publicly available for testing, you should carefully check the basic headers of messages obtained from newsgroups and replace them as needed.
Note: Spam sometimes appears in NNTP newsgroups, so an IT staff member should check the messages obtained from them and remove any spam.
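
The sending script mentioned above is described as a Perl script; an equivalent can be sketched in Python using the standard `mailbox` and `smtplib` modules. This is a minimal sketch under stated assumptions: the function names `load_mbox` and `send_corpus`, the envelope sender `corpus@example.com`, and the default port are illustrative choices, and the host and recipient must be a test system and test account at your own site.

```python
import mailbox
import smtplib


def load_mbox(path):
    """Return the messages stored in a BSD-formatted (mbox) mailbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        return list(box)
    finally:
        box.close()


def send_corpus(path, host, rcpt, port=25, mail_from="corpus@example.com"):
    """Send each message in the mailbox to the test filter over SMTP.

    Returns the number of messages sent.
    """
    messages = load_mbox(path)
    with smtplib.SMTP(host, port) as smtp:
        for msg in messages:
            # Envelope addresses are given explicitly so the message's own
            # headers reach the filter unchanged.
            smtp.sendmail(mail_from, [rcpt], msg.as_bytes())
    return len(messages)
```

For example, `send_corpus("newsgroup.mbox", "test.example.com", "jdoe@test.example.com")` would deliver every message in the (hypothetical) file `newsgroup.mbox` to the test account.
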
The following activities will produce inaccurate results while evaluating most anti-spam filtering solutions: