Filtering spam by black words in message subjects

When messages are filtered on the server by their headers, most spam is deleted with the help of DNSBL: the plug-in looks up the sender's IP address in online blacklists. This method works well, but obvious spam often comes from senders whose IP addresses are not on any blacklist.
There is an additional method for such cases: detecting obvious spam by keywords in message subjects. The plug-in has a database of stop words taken from message subjects. Each word has a spam coefficient, defined by the number of spam messages in which the word occurs. A word that has occurred at least once in a normal message has a spam coefficient of 0, and such words are automatically excluded from the list.
If filtering by stop words is enabled, the keyword database is updated automatically when you train the plug-in. It works like this:
• Words from the subjects of messages marked as spam that are not in the database are added to it.
• If some word from a spam message is already in the database, its spam coefficient increases.
• If the plug-in comes across a word in a normal message and this word already has some spam coefficient, it is removed from the database.
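The three rules above can be sketched in a few lines of Python. This is only an illustration, not the plug-in's actual code: the function names, the dictionary that maps each word to its coefficient, the tokenization into lowercase words, and the step of 1 per spam message are all assumptions. The sketch also does not remember removed words, so a later spam message could add such a word again; whether the plug-in behaves this way is not stated here.

```python
import re

def subject_words(subject):
    """Split a subject into distinct lowercase words (assumed tokenization)."""
    return set(re.findall(r"\w+", subject.lower()))

def train(database, subject, is_spam):
    """Update the stop-word database (word -> spam coefficient) from one message."""
    for word in subject_words(subject):
        if is_spam:
            # A new word is added; a known word gets a higher coefficient.
            database[word] = database.get(word, 0) + 1
        else:
            # A word seen in a normal message is removed from the database.
            database.pop(word, None)
    return database

db = {}
train(db, "Cheap pills online", is_spam=True)
train(db, "Cheap pills today", is_spam=True)
train(db, "Meeting online tomorrow", is_spam=False)
print(db)
```

After these three messages, "cheap" and "pills" have a coefficient of 2, "today" has 1, and "online" is gone because it appeared in a normal subject.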

By default, filtering by stop words is disabled and the database is empty. To enable it, open the dialog box with the stop-word filtering properties by clicking the "Black words..." button on the "Filtering" tab of the plug-in configuration window.



The minimum spam coefficient for stop words sets the lowest coefficient a word must have to be used in filtering. Words whose spam coefficient is below this value are ignored during filtering.

The minimum number of stop words for blocking a message defines how many stop words there should be in the subject of a message for it to be filtered out as spam.
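How the two settings interact can be shown with a short Python sketch. It assumes that the words are taken from the message subject, as in the rest of this article, and that each distinct word is counted once. The function and parameter names are hypothetical and do not come from the plug-in.

```python
import re

def is_spam_subject(subject, database, min_coefficient, min_stop_words):
    """Return True if the subject contains enough qualifying stop words."""
    words = set(re.findall(r"\w+", subject.lower()))
    # Words below the minimum spam coefficient are ignored.
    hits = [w for w in words if database.get(w, 0) >= min_coefficient]
    # The message is blocked only if enough stop words remain.
    return len(hits) >= min_stop_words

db = {"cheap": 5, "pills": 3, "today": 1}
blocked = is_spam_subject("Cheap pills today", db, min_coefficient=3, min_stop_words=2)
passed = is_spam_subject("Cheap flights", db, min_coefficient=3, min_stop_words=2)
```

In this example the first subject is blocked because two of its words reach the minimum coefficient of 3. The second subject passes because only one word does.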

Filtering by stop words has an effect only when the database of stop words is filled. There are several ways to fill it:
• Fill the database manually in the corresponding window.
• Import stop words from a file.
• Train the plug-in using your mail and allow it to fill the list automatically.



You can download a list of stop words for importing from here:
http://antispamsniper.com/misc/black_words.txt

Note that stop words are used to delete messages on the server! Be very careful when creating the list, and avoid adding words that may occur in normal mail. After you import the above-mentioned list, it is recommended to train the plug-in using your normal mail in order to exclude words occurring in the subjects of normal messages from the list.
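The effect of training on normal mail after an import can be pictured with the following Python sketch. It is a simplified model only: the function name, the sample words and the tokenization are assumptions, and the plug-in performs this step itself during training.

```python
import re

def prune_with_normal_mail(stop_words, normal_subjects):
    """Return the stop words that never occur in any normal subject."""
    seen = set()
    for subject in normal_subjects:
        seen.update(re.findall(r"\w+", subject.lower()))
    return {w for w in stop_words if w.lower() not in seen}

imported = {"viagra", "free", "invoice"}  # hypothetical imported list
normal = ["Your invoice for March", "Free seats at the workshop"]
remaining = prune_with_normal_mail(imported, normal)
```

Here "free" and "invoice" are dropped because they occur in normal subjects, and only "viagra" stays in the list.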