Applicable to:
- Plesk for Linux
Question
How to block indexing by search bots for all / multiple websites in Plesk?
Answer
You may achieve this in the following ways:
Warning: The script below overwrites all existing additional nginx directives for the affected domains. Make sure that any existing nginx directives can be safely overwritten before proceeding.
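If in doubt, back up the current per-domain nginx configuration files first. Below is a minimal sketch, assuming the default Plesk layout where these files reside at /var/www/vhosts/system/<domain>/conf/vhost_nginx.conf:
# for conf in /var/www/vhosts/system/*/conf/vhost_nginx.conf; do [ -f "$conf" ] && cp -a "$conf" "${conf}.bak"; done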
- Connect to your Plesk server via SSH.
- Create a temporary file called directive_template with the configuration below. It rejects requests from known bots with nginx code 444, which closes the connection without sending a response:
CONFIG_TEXT: if ($http_user_agent ~ Googlebot|SputnikBot|omgili|socialmediascanner|Jooblebot|SeznamBot|Scrapy|CCBot|linkfluence|veoozbot|Leikibot|Seopult|Faraday|hybrid|Go-http-client|SMUrlExpander|SNAPSHOT|getintent|ltx71|Nuzzel|SMTBot|Laserlikebot|facebookexternalhit|mfibot|OptimizationCrawler|crazy|Dispatch|ubermetrics|HTMLParser|musobot|filterdb|InfoSeek|omgilibot|DomainSigma|SafeSearch|CommentReader|meanpathbot|statdom|proximic|spredbot|StatOnlineRuBot|openstat|DeuSu|semantic|postano|masscan|Embedly|NewShareCounts|linkdexbot|GrapeshotCrawler|Digincore|NetSeer|help.jp|PaperLiBot|getprismatic|360Spider|Ahrefs|ApacheBench|Aport|Applebot|archive|BaiduBot|Baiduspider|Birubot|BLEXBot|bsalsa|Butterfly|Buzzbot|BuzzSumo|CamontSpider|curl|dataminr|discobot|DomainTools|DotBot|Exabot|Ezooms|FairShare|FeedFetcher|FlaxCrawler|FlightDeckReportsBot|FlipboardProxy|FyberSpider|Gigabot|HTTrack|ia_archiver|InternetSeer|Jakarta|Java|JS-Kit|km.ru|kmSearchBot|Kraken|larbin|libwww|Lightspeedsystems|Linguee|LinkBot|LinkExchanger|LinkpadBot|LivelapBot|LoadImpactPageAnalyzer|lwp-trivial|majestic|Mediatoolkitbot|MegaIndex|MetaURI|MJ12bot|MLBot|NerdByNature|NING|NjuiceBot|Nutch|OpenHoseBot|Panopta|pflab|pirst|PostRank|crawler|ptd-crawler|Purebot|PycURL|Python|QuerySeekerSpider|rogerbot|Ruby|SearchBot|SemrushBot|SISTRIX|SiteBot|Slurp|Sogou|solomono|Soup|spbot|suggybot|Superfeedr|SurveyBot|SWeb|trendictionbot|TSearcher|ttCrawler|TurnitinBot|TweetmemeBot|UnwindFetchor|urllib|uTorrent|Voyager|WBSearchBot|Wget|WordPress|woriobot|Yeti|YottosBot|Zeus|zitebot|ZmEu|Crowsnest|PaperLiBot|peerindex|ia_archiver|Slurp|Aport|NING|JS-Kit|rogerbot|BLEXBot|MJ12bot|Twiceler|Baiduspider|Java|CommentReader|Yeti|discobot|BTWebClient|Tagoobot|Ezooms|igdeSpyder|AhrefsBot|Teleport|Offline|DISCo|netvampire|Copier|HTTrack|WebCopier) {
return 444;
}
- Create a file with the names of all domains:
# plesk bin site -l > domains_list
- Execute the following command to apply the new nginx configuration to all domains and reload nginx:
# while read -r domain; do install directive_template -o root -g nginx -m 600 "/var/www/vhosts/system/${domain}/conf/vhost_nginx.conf"; plesk sbin httpdmng --reconfigure-domain "${domain}" -no-restart; done < domains_list && service nginx reload
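To verify that the directives are in effect, request one of the configured sites with a blocked user agent (example.com below stands for any domain from domains_list). Since nginx code 444 closes the connection without sending a response, curl should report an empty reply:
# curl -I -A "Googlebot" http://example.com/
curl: (52) Empty reply from server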
Alternatively, add the following additional nginx directives to every domain for which you would like to block search bots. In Plesk, go to Domains > example.com > Apache & nginx Settings and paste the directives into the Additional nginx directives field:
CONFIG_TEXT: if ($http_user_agent ~ Googlebot|SputnikBot|omgili|socialmediascanner|Jooblebot|SeznamBot|Scrapy|CCBot|linkfluence|veoozbot|Leikibot|Seopult|Faraday|hybrid|Go-http-client|SMUrlExpander|SNAPSHOT|getintent|ltx71|Nuzzel|SMTBot|Laserlikebot|facebookexternalhit|mfibot|OptimizationCrawler|crazy|Dispatch|ubermetrics|HTMLParser|musobot|filterdb|InfoSeek|omgilibot|DomainSigma|SafeSearch|CommentReader|meanpathbot|statdom|proximic|spredbot|StatOnlineRuBot|openstat|DeuSu|semantic|postano|masscan|Embedly|NewShareCounts|linkdexbot|GrapeshotCrawler|Digincore|NetSeer|help.jp|PaperLiBot|getprismatic|360Spider|Ahrefs|ApacheBench|Aport|Applebot|archive|BaiduBot|Baiduspider|Birubot|BLEXBot|bsalsa|Butterfly|Buzzbot|BuzzSumo|CamontSpider|curl|dataminr|discobot|DomainTools|DotBot|Exabot|Ezooms|FairShare|FeedFetcher|FlaxCrawler|FlightDeckReportsBot|FlipboardProxy|FyberSpider|Gigabot|HTTrack|ia_archiver|InternetSeer|Jakarta|Java|JS-Kit|km.ru|kmSearchBot|Kraken|larbin|libwww|Lightspeedsystems|Linguee|LinkBot|LinkExchanger|LinkpadBot|LivelapBot|LoadImpactPageAnalyzer|lwp-trivial|majestic|Mediatoolkitbot|MegaIndex|MetaURI|MJ12bot|MLBot|NerdByNature|NING|NjuiceBot|Nutch|OpenHoseBot|Panopta|pflab|pirst|PostRank|crawler|ptd-crawler|Purebot|PycURL|Python|QuerySeekerSpider|rogerbot|Ruby|SearchBot|SemrushBot|SISTRIX|SiteBot|Slurp|Sogou|solomono|Soup|spbot|suggybot|Superfeedr|SurveyBot|SWeb|trendictionbot|TSearcher|ttCrawler|TurnitinBot|TweetmemeBot|UnwindFetchor|urllib|uTorrent|Voyager|WBSearchBot|Wget|WordPress|woriobot|Yeti|YottosBot|Zeus|zitebot|ZmEu|Crowsnest|PaperLiBot|peerindex|ia_archiver|Slurp|Aport|NING|JS-Kit|rogerbot|BLEXBot|MJ12bot|Twiceler|Baiduspider|Java|CommentReader|Yeti|discobot|BTWebClient|Tagoobot|Ezooms|igdeSpyder|AhrefsBot|Teleport|Offline|DISCo|netvampire|Copier|HTTrack|WebCopier) {
return 444;
}
As an alternative, a robots.txt file can be created in the document root of each domain that needs to block search bots (by default, /var/www/vhosts/example.com/httpdocs/). Note that robots.txt is advisory: it asks compliant bots not to crawl the website but does not enforce the restriction. To disallow crawling of the entire website, place the following code in the robots.txt file:
CONFIG_TEXT: User-agent: *
Disallow: /
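To create such a file for every domain at once, a loop like the one below can be used. This is a minimal sketch that reuses the domains_list file from the steps above and assumes the default document root layout (/var/www/vhosts/<domain>/httpdocs/); domains with a custom document root (for example, addon domains) are skipped by the directory check and must be handled separately:
# while read -r domain; do d="/var/www/vhosts/${domain}/httpdocs"; [ -d "$d" ] && printf 'User-agent: *\nDisallow: /\n' > "${d}/robots.txt" && chown --reference="$d" "${d}/robots.txt"; done < domains_list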