High CPU usage for "sa-learn" and "spamtrain" processes in a server with Plesk




  • Alexander Bien

    Are you sure that we need to modify 60sa-update in order to address high CPU usage from sa-learn?

    This pstreelog output (C7, P17.8.x) would suggest otherwise:


    I couldn't find any mention of sa-learn in sa-update either. Please advise.

  • Robert Asilbekov

    @Alexander Bien

    Thank you for the feedback. The article has been updated.

  • Peter Debik

    On CentOS

    echo "/bin/nice -19 /usr/bin/perl -T -w /usr/bin/sa-learn.orig $@" > /usr/bin/sa-learn

    must be executed as

    echo "/bin/nice -19 /usr/bin/perl -T -w /usr/bin/sa-learn.orig \$@" > /usr/bin/sa-learn

    because otherwise the "$@" will be expanded by the current shell instead of being written literally into /usr/bin/sa-learn.
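The difference is easy to demonstrate with a throwaway wrapper (the /tmp file names here are just for illustration):

```shell
# Unescaped: $@ is expanded by the *current* shell (usually to nothing),
# so the generated wrapper forwards no arguments.
echo "/bin/echo args: $@" > /tmp/wrapper-broken

# Escaped: the literal text $@ is written into the wrapper, so arguments
# are forwarded when the wrapper itself runs.
echo "/bin/echo args: \$@" > /tmp/wrapper-ok

sh /tmp/wrapper-broken one two   # prints "args:"
sh /tmp/wrapper-ok one two       # prints "args: one two"
```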

  • Anton Maslov

    Peter, the article has been updated! Thank you for the report!

  • Bbennett

    Trying to disable it and getting this error: 

    mv /etc/cron.daily/60sa-update /root
    mv: cannot stat ‘/etc/cron.daily/60sa-update’: No such file or directory

  • Ivan Postnikov

    Hello Bbennett

    Did you execute this command as a root user?

    The 60sa-update exists on my test server and the issue wasn't reproduced:

    # stat /etc/cron.daily/60sa-update
    File: '/etc/cron.daily/60sa-update'
    Size: 448 Blocks: 8 IO Block: 4096 regular file
    Device: 40f0b681h/1089517185d Inode: 409504 Links: 1
    Access: (0755/-rwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
    Access: 2020-01-23 13:44:39.000000000 +0700
    Modify: 2020-01-23 13:44:39.000000000 +0700
    Change: 2020-01-27 20:52:40.452451535 +0700
    Birth: -


    # plesk -v
    Product version: Plesk Obsidian
    OS version: CentOS 7.5.1804 x86_64


  • Jair Cueva Junior

    On Ubuntu 18.04 this wrapper script was failing due to a slight difference in the path of the "nice" command; there it should be /usr/bin/nice.

    Thanks for exposing this approach, it has opened my mind to the possibilities!
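Rather than hard-coding either path, the wrapper can let the shell resolve it (a sketch; `command -v` is POSIX):

```shell
#!/bin/sh
# Resolve nice wherever it lives: /bin/nice on CentOS, /usr/bin/nice on Ubuntu 18.04
NICE=$(command -v nice)
"$NICE" -n 19 echo "running at low priority"   # prints "running at low priority"
```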

  • Nelson Leiva

    Hi Jair Cueva Junior,

    The article has been updated. Thank you for noticing that!

  • Michael Fryd

    When importing large email accounts for new clients, my server was bogged down, even with sa-learn running with low priority. 

    We don't see single instances of sa-learn using a lot of resources, we see many instances of sa-learn running sequentially.

    My solution was to write a script that checks the system load average.  If the load is getting high, it pauses before running sa-learn, which helps keep the load from getting too high.  If the load doesn't come down, or is already too high, it doesn't bother running sa-learn.  This seems to be OK; I suspect the messages that were skipped will be examined on a subsequent run.


    Here's my replacement for /usr/bin/sa-learn:


    #!/bin/bash
    # replacement for /usr/bin/sa-learn
    # hack to keep sa-learn processes from using too many resources
    # created 6/26/2020
    # Plesk periodically calls /usr/bin/sa-learn to train SpamAssassin on recent mail.
    # If large amounts of new mail are present (perhaps a large external email
    # account was recently imported) the I/O burden can bog down the server. We have
    # seen load averages above 300.
    # Our solution is multi-pronged:
    # first, we simply don't run sa-learn if the load average is too high;
    # second, if the load average is starting to get high, we pause a little bit before
    # starting sa-learn;
    # finally, if we do start sa-learn, we use nice to run it at a low priority.
    # We have moved the original sa-learn script to /usr/bin/sa-learn.orig
    # Note: we truncate the load average and look at only the integer portion.
    # -Michael Fryd (michael@fryd.com) 6/26/2020

    PUNT_LOAD_LIMIT=4 # if the load is higher than this, we exit and don't run sa-learn.orig
    WAIT_LOAD_LIMIT=3 # if the load is higher than this, but below PUNT_LOAD_LIMIT, we delay before running
    SLEEP_TIME=0.4    # how long to delay (in seconds)
    MAX_ATTEMPTS=2    # max number of times to wait for the load to come down before punting

    count=0
    while [ "$count" -lt "$MAX_ATTEMPTS" ]; do
        # get the integer portion of the current 1-minute load average
        load=$(uptime | awk -F: '{print $5}' | awk -F"," '{print $1}' | tr -d '[:space:]' | cut -c1) # Poll Load

        # if the load is greater than the punt limit, give up; don't wait, and no
        # additional passes
        if [[ "$load" -gt "$PUNT_LOAD_LIMIT" ]]; then
            exit 0 # give up, don't run sa-learn.orig at this time
        fi

        # if the load is low, run sa-learn.orig at low priority; otherwise pause, then loop again
        if [[ "$load" -lt "$WAIT_LOAD_LIMIT" ]]; then
            /bin/nice -n 19 /usr/bin/perl -T -w /usr/bin/sa-learn.orig "$@"
            exit 0
        fi

        sleep "$SLEEP_TIME"
        (( count++ ))
    done
    exit 0
  • Hi Michael Fryd,

    Thanks for such a useful script for whenever a mail import is happening; it may be useful for other users.

  • Alex Presland

    Michael Fryd I had my server perform a DOS attack on itself this morning due to this very issue. I ended up renaming /usr/bin/sa-learn so that the requests failed.  I had 191 of them running at one point, with each taking 0.4% of the server's memory. Linux protected itself by killing off other processes (DNS, database, etc.).  Not good at all, and a reboot was required to bring everything back online again in a controlled way.

    Jul 22 11:43:38 hosting dovecot: imap: Error: /var/qmail/popuser/warden-learn-ham.sh: 3: /usr/bin/sa-learn: not found

    These were invoked by a Warden script, and I believe that the functionality you have in your script should exist in the warden-learn-ham.sh and warden-learn-spam.sh scripts to protect servers.  Possibly also counting the number of sa-learn processes running and capping the number allowed to run simultaneously.
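A minimal sketch of that cap, assuming the wrapper setup from the article (the limit of 5 and the `pgrep` pattern are illustrative assumptions, not Warden's actual mechanism):

```shell
#!/bin/bash
MAX_INSTANCES=5  # arbitrary cap; tune per server

# count_running: how many sa-learn workers are active right now
# (pgrep -f matches against the full command line; -c prints a count)
count_running() {
    pgrep -c -f 'sa-learn\.orig' || true
}

if [ "$(count_running)" -ge "$MAX_INSTANCES" ]; then
    echo "cap reached, skipping this run"
else
    # hand off to the real script at low priority, per the article's wrapper
    echo "would run: /bin/nice -n 19 /usr/bin/sa-learn.orig $*"
fi
```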

    I will soon deploy your script, so thank you for your contribution to the community.

  • Michael Fryd

    There's a bug in my script above.  It doesn't properly compute the load average if the server has been up for a very short or a very long time.

    On my server, I have removed the line:

        load=$(uptime | awk -F: '{print $5}' | awk -F"," '{print $1}' | tr -d '[:space:]' | cut -c1) # Poll Load

    and replaced it with

        load=$(awk  '{print $1}'  /proc/loadavg | tr -d '[:space:]' | awk -F \. '{print $1}') # Poll Load

    The original version of the script determined the load average by parsing the output of the "uptime" command.   However, the parsing wasn't smart enough to deal with the human-readable uptime output (the number of colons in the output varies depending on whether it is reporting uptime in hours or days).  The new version uses the /proc/loadavg file, which has a consistent and easily parseable format.
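For reference, /proc/loadavg is five space-separated fields, the first being the 1-minute average, so the integer portion can be pulled out directly:

```shell
cat /proc/loadavg   # e.g. "0.42 0.31 0.24 1/123 4567"

# same parse as the replacement line above: first field, integer portion
load=$(awk '{print $1}' /proc/loadavg | tr -d '[:space:]' | awk -F \. '{print $1}')
echo "load=$load"
```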


    I've been running with that change for over a year, with no issues.

    I do occasionally see 95%+ CPU usage, but these are low priority processes, and the load average stays around 3.   I no longer have my server choking from too many sa-learn processes.

    I don't think it's necessary for the script to count how many sa-learn processes are running.  The script dynamically limits the number to keep the load average within the target.  If the server is busy with other tasks, fewer sa-learn processes can run, as the other tasks will keep the load average up.


