Applicable to:
- Plesk for Linux
Symptoms
High CPU usage by the sa-learn
and spamtrain
processes on the Plesk server.
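To confirm the symptom, the running sa-learn processes and their CPU share can be listed with ps. This is a generic check, not part of the original article, and the process name match is an assumption (depending on how the script is started, it may appear as perl instead):

```shell
# List running sa-learn processes with their PID, nice value and CPU share
ps -eo pid,ni,pcpu,comm | awk '$4 == "sa-learn"'

# Count them (prints 0 when none are running)
n=$(ps -eo comm | grep -c '^sa-learn' || true)
echo "running sa-learn processes: $n"
```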
Cause
The sa-learn
script is used by SpamAssassin
to learn the specific properties of spam from a folder that is supposed to contain only spam emails.
The CPU usage of the sa-learn
process depends on the number of emails in the mailboxes and on their size. Therefore, if the number of emails is huge, high CPU usage is expected.
Resolution
To work around the issue, connect to the server using SSH and use one of the following solutions:
- Use the nice or ionice utilities to decrease the priority of sa-learn started from the cron tasks. For example, create a wrapper file which will execute sa-learn with the lowest CPU priority:

# cp -a /usr/bin/sa-learn{,.orig}
# echo "/bin/nice -19 /usr/bin/perl -T -w /usr/bin/sa-learn.orig \$@" > /usr/bin/sa-learn
# chmod +x /usr/bin/sa-learn

Note: On Debian-based operating systems the correct path of the nice utility is
/usr/bin/nice
- To set the lowest priority for the currently running sa-learn
processes, execute the following:
# ps auxwf | grep -v grep | grep sa-learn | awk '{print $2}' | xargs -i renice 19 {}
Note: Process priority (nice) values range from -20 to 19. A process with a nice value of -20 has the highest priority, while a process with a nice value of 19 has the lowest.
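To see the effect of these priority values in practice, here is a small, generic check (not Plesk-specific, not part of the original article) that starts a throwaway process at nice 19 and reads the value back with ps:

```shell
# Start a short-lived process at the lowest priority (nice 19)
nice -n 19 sleep 5 &
pid=$!

# Read its nice value back from the NI column
ni=$(ps -o ni= -p "$pid" | tr -d ' ')
echo "nice value: $ni"

kill "$pid"
```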
- Disable the sa-update
execution by moving the 60sa-update
and spamassassin
files from /etc/cron.daily
to a different location:
# mv /etc/cron.daily/60sa-update /root
Comments
Are you sure that we need to modify 60sa-update in order to change high cpu from sa-learn?
This pstreelog output (C7, P17.8.x) would suggest otherwise:
|-anacron---run-parts-+-50plesk-daily---sw-engine---sw-engine---spamtrain---su---sa-learn
I couldn't find any mention of sa-learn in sa-update either. Please advise.
@Alexander Bien
Thank you for the feedback. The article has been updated.
On CentOS
echo "/bin/nice -19 /usr/bin/perl -T -w /usr/bin/sa-learn.orig $@" > /usr/bin/sa-learn
must be executed as
echo "/bin/nice -19 /usr/bin/perl -T -w /usr/bin/sa-learn.orig \$@" > /usr/bin/sa-learn
because otherwise, the "$@" is expanded by the current shell instead of being written into /usr/bin/sa-learn.
Peter, the article has been updated! Thank you for the report!
Trying to disable and get the error:
mv /etc/cron.daily/60sa-update /root
mv: cannot stat ‘/etc/cron.daily/60sa-update’: No such file or directory
Hello Bbennett
Did you execute this command as a root user?
The 60sa-update exists on my test server and the issue wasn't reproduced:
# stat /etc/cron.daily/60sa-update
File: '/etc/cron.daily/60sa-update'
Size: 448 Blocks: 8 IO Block: 4096 regular file
Device: 40f0b681h/1089517185d Inode: 409504 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-01-23 13:44:39.000000000 +0700
Modify: 2020-01-23 13:44:39.000000000 +0700
Change: 2020-01-27 20:52:40.452451535 +0700
Birth: -
# plesk -v
Product version: Plesk Obsidian 18.0.23.1
OS version: CentOS 7.5.1804 x86_64
In Ubuntu 18.04 this wrapper script was failing due to a slight difference in the path of the "nice" command; it should be: /usr/bin/nice
Thanks for sharing this approach, it has opened my mind to new possibilities!
Hi Jair Cueva Junior,
The article has been updated. Thank you for noticing that!
When importing large email accounts for new clients, my server was bogged down, even with sa-learn running with low priority.
We don't see single instances of sa-learn using a lot of resources, we see many instances of sa-learn running sequentially.
My solution was to write a script that checks the system load average. If the load is getting high, it pauses before running sa-learn. This helps keep the load from getting too high. If the load doesn't come down, or the load is too high, it doesn't bother running sa-learn at all. This seems to be OK; I suspect the messages that were skipped will be examined on a subsequent run.
Here's my replacement for /usr/bin/sa-learn:
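(The script itself is not reproduced in this copy of the comment. The following is only a minimal sketch of the approach described above, assuming the real binary was preserved as /usr/bin/sa-learn.orig as in the Resolution section, and using the corrected /proc/loadavg polling from the follow-up comment. The thresholds, retry count and sleep interval are illustrative assumptions, not the author's actual values.)

```shell
#!/bin/sh
# Illustrative sketch of a load-aware /usr/bin/sa-learn wrapper.
# MAX_LOAD, PAUSE_LOAD, RETRIES and the sleep interval are example values.

MAX_LOAD=8      # skip this run entirely if the 1-minute load reaches this
PAUSE_LOAD=4    # pause and re-check while the load is above this
RETRIES=10      # how many times to wait before giving up

load=$(awk '{print int($1)}' /proc/loadavg)

i=0
while [ "$load" -ge "$PAUSE_LOAD" ]; do
    if [ "$load" -ge "$MAX_LOAD" ] || [ "$i" -ge "$RETRIES" ]; then
        # Too busy: skip this run; skipped messages are expected
        # to be examined on a subsequent run.
        exit 0
    fi
    sleep 5
    load=$(awk '{print int($1)}' /proc/loadavg)
    i=$((i + 1))
done

# Load is acceptable: run the real sa-learn at the lowest priority
if [ -x /usr/bin/sa-learn.orig ]; then
    exec /bin/nice -19 /usr/bin/perl -T -w /usr/bin/sa-learn.orig "$@"
fi
```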
Hi Michael Fryd,
Thanks for such a useful script; it may be useful for other users whenever mail importing is happening.
Michael Fryd, I had my server perform a DoS attack on itself this morning due to this very issue. I ended up renaming /usr/bin/sa-learn so that the requests failed. I had 191 of them running at one point, with each taking 0.4% of the server's memory. Linux protected itself by killing off other processes (DNS, database, etc.). Not good at all, and a reboot was required to bring everything back online again in a controlled way.
Jul 22 11:43:38 hosting dovecot: imap: Error: /var/qmail/popuser/warden-learn-ham.sh: 3: /usr/bin/sa-learn: not found
These were invoked by a Warden script, and I believe that the functionality you have in your script should exist in the warden-learn-ham.sh and warden-learn-spam.sh scripts to protect servers. It could possibly also count the number of sa-learn processes running and enforce a limit on the number allowed to run simultaneously.
I will soon deploy your script, so thank you for your contribution to the community.
There's a bug in my script above. It doesn't properly compute the load average if the server has been up for a very short, or a very long, time.
On my server, I have removed the line:
load=$(uptime | awk -F: '{print $5}' | awk -F"," '{print $1}' | tr -d '[:space:]' | cut -c1) # Poll Load
and replaced it with
load=$(awk '{print $1}' /proc/loadavg | tr -d '[:space:]' | awk -F \. '{print $1}') # Poll Load
The original version of the script determined the load average by parsing the output of the "uptime" command. However, the parsing wasn't smart enough to deal with the human-readable uptime (the number of colons in the output varies depending on whether it is reporting uptime in hours or days). The new version uses the /proc/loadavg file, which has a consistent and easily parseable format.
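For reference, /proc/loadavg contains five space-separated fields; the first three are the 1-, 5- and 15-minute load averages, which is what makes it easy to parse. As an aside (not from the original comment), awk's int() gives the same integer part as splitting on the decimal point:

```shell
# Typical /proc/loadavg contents: "0.42 0.35 0.30 1/123 4567"
cat /proc/loadavg

# Integer part of the 1-minute load average
load=$(awk '{print int($1)}' /proc/loadavg)
echo "integer 1-minute load: $load"
```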
I've been running with that change for over a year, with no issues.
I do occasionally see 95%+ CPU usage, but these are low priority processes, and the load average stays around 3. I no longer have my server choking from too many sa-learn processes.
I don't think it's necessary for the script to count how many sa-learn processes are running. The script dynamically limits the number to keep the load average within the target. If the server is busy with other tasks, fewer sa-learn processes can run, as the other tasks will keep the load average up.