
How does SpamAssassin training work in Plesk?


Applicable to:

  • Plesk Obsidian for Linux
  • Plesk Onyx for Linux
  • Plesk for Linux

Question

How does SpamAssassin training work in Plesk?

Answer

SpamAssassin training relies on the Bayes system, which keeps its own internal database. The database is updated each time a message is marked as spam, moved into the Spam folder, or moved back to the Inbox folder.

The Bayes database contains words and patterns from processed messages, known as tokens, which Bayes uses to classify a message as spam or non-spam (ham). For example, if you mark a message with content or a subject like "Buy! This is magical offer" as spam, the tokens from that message are added to the Bayes database. However, this does not mean that every subsequent message containing the words "buy" or "offer" will be marked as spam.

Standalone SpamAssassin and SpamAssassin inside the Plesk Email Security extension work slightly differently:

Standalone SpamAssassin

The Bayes system is enabled only after a certain number of spam and non-spam (ham) messages have been learned into its database.

By default, this number is 200 for both spam and non-spam messages. To change it:

  1. Connect to the server via SSH.

  2. Add the following SpamAssassin options with the new values to /etc/mail/spamassassin/local.cf (the lowest possible value is 10):

    bayes_min_ham_num 100
    bayes_min_spam_num 100

  3. Restart the SpamAssassin daemon:

    # systemctl restart spamassassin.service
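Steps 2 and 3 can also be scripted so that re-running them does not add duplicate lines to local.cf. The following is a minimal sketch, not something shipped with Plesk or SpamAssassin: the set_sa_option function name is hypothetical, and the sketch assumes GNU sed (for in-place editing with -i) and that each option appears at most once in the file. It rewrites an existing option line or appends a new one.

```shell
#!/bin/sh
# Hypothetical helper (not part of Plesk or SpamAssassin).
# set_sa_option FILE OPTION VALUE
#   Replaces an existing "OPTION ..." line in FILE with "OPTION VALUE",
#   or appends "OPTION VALUE" if the option is not set yet.
#   Assumes GNU sed for in-place editing (-i).
set_sa_option() {
    file=$1
    opt=$2
    val=$3
    if grep -q "^[[:space:]]*$opt[[:space:]]" "$file" 2>/dev/null; then
        # Option already present: rewrite that line with the new value
        sed -i "s/^[[:space:]]*$opt[[:space:]].*/$opt $val/" "$file"
    else
        # Option missing (or the file does not exist yet): append it
        printf '%s %s\n' "$opt" "$val" >> "$file"
    fi
}

# Example usage as root (commented out so defining the function has no side effects):
#   set_sa_option /etc/mail/spamassassin/local.cf bayes_min_ham_num 100
#   set_sa_option /etc/mail/spamassassin/local.cf bayes_min_spam_num 100
#   systemctl restart spamassassin.service
```

Lines that are commented out in local.cf (starting with #) are not matched, so a commented example value stays untouched and the active setting is appended instead.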

Note that in standalone SpamAssassin, both spam and non-spam (ham) messages are learned during the Daily Task execution, so there is no need to train it on non-spam messages manually.

Training progress can be checked with the following command:

# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 10 0 non-token data: nspam
0.000 0 10 0 non-token data: nham
0.000 0 59312 0 non-token data: ntokens
0.000 0 1574146076 0 non-token data: oldest atime
0.000 0 1596425418 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count

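The nspam and nham lines show how many spam and non-spam (ham) messages have been learned. As these values grow, more tokens are added to the Bayes filter database (see the ntokens line).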
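To check these counters from a script rather than by eye, the sa-learn --dump magic output can be parsed. The sketch below is hypothetical (check_bayes_ready is not a SpamAssassin command); it assumes the column layout shown above, where the third field holds the value and the last field names the counter. It compares nspam and nham with the given thresholds, which default to 200 as in standalone SpamAssassin.

```shell
#!/bin/sh
# Hypothetical helper (not part of SpamAssassin).
# Reads "sa-learn --dump magic" output on stdin and compares the nspam/nham
# counters with minimum thresholds. Exit status 0 means both are reached.
# Usage: sa-learn --dump magic | check_bayes_ready [min_spam] [min_ham]
check_bayes_ready() {
    min_spam=${1:-200}   # standalone SpamAssassin default for bayes_min_spam_num
    min_ham=${2:-200}    # standalone SpamAssassin default for bayes_min_ham_num
    awk -v min_spam="$min_spam" -v min_ham="$min_ham" '
        # Field 3 holds the counter value; the last field is its name.
        $NF == "nspam" { nspam = $3 }
        $NF == "nham"  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            if (nspam >= min_spam && nham >= min_ham) {
                print "Bayes active"
                exit 0
            }
            print "Bayes not active yet"
            exit 1
        }'
}
```

For example, sa-learn --dump magic | check_bayes_ready prints the current counters and returns a non-zero exit status until both thresholds are reached. With the sample output above (nspam and nham both 10), it reports "Bayes not active yet". If bayes_min_spam_num and bayes_min_ham_num were changed in local.cf, pass the same values, for example check_bayes_ready 100 100.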

SpamAssassin as part of Plesk Email Security

Training is available only in the paid version of the extension.

The Bayes system is enabled only after a certain number of spam and non-spam (ham) messages are present in the Bayes database. The default value is 10 for both spam and non-spam messages. To increase the number of non-spam (ham) messages, move a message to the Spam folder (or mark it as spam), and then move it back to the Inbox folder (or mark it as not spam).

Training progress can be checked with the following command:

# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 10 0 non-token data: nspam
0.000 0 10 0 non-token data: nham
0.000 0 59312 0 non-token data: ntokens
0.000 0 1574146076 0 non-token data: oldest atime
0.000 0 1596425418 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count

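The nspam and nham lines show how many spam and non-spam (ham) messages have been learned. As these values grow, more tokens are added to the Bayes filter database (see the ntokens line).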


Comments

  • Documentation: Note that in standalone SpamAssassin both spam and non-spam (ham) messages are trained during Daily Task execution, so there is no need to train it for non-spam messages.

    I have a standard Plesk server on Linux (updated to the latest version), but I see no progress when checking sa-learn --dump magic over several days.
    When I manually run sa-learn --spam on several .Spam folders, the nspam value grows.

    Is something wrong, or do I have to add something to /etc/cron.daily/50plesk-daily?

