Applicable to:
- Plesk for Linux
Question
How to block indexing by search bots for multiple websites in Plesk for Linux?
Answer
You may achieve this in one of the following ways:
Apply the following nginx web server directives to all domains by using the steps in this article:
CONFIG_TEXT: if ($http_user_agent ~ SputnikBot|omgili|socialmediascanner|Jooblebot|SeznamBot|Scrapy|CCBot|linkfluence|veoozbot|Leikibot|Seopult|Faraday|hybrid|Go-http-client|SMUrlExpander|SNAPSHOT|getintent|ltx71|Nuzzel|SMTBot|Laserlikebot|facebookexternalhit|mfibot|OptimizationCrawler|crazy|Dispatch|ubermetrics|HTMLParser|musobot|filterdb|InfoSeek|omgilibot|DomainSigma|SafeSearch|CommentReader|meanpathbot|statdom|proximic|spredbot|StatOnlineRuBot|openstat|DeuSu|semantic|postano|masscan|Embedly|NewShareCounts|linkdexbot|GrapeshotCrawler|Digincore|NetSeer|help.jp|PaperLiBot|getprismatic|360Spider|Ahrefs|ApacheBench|Aport|Applebot|archive|BaiduBot|Baiduspider|Birubot|BLEXBot|bsalsa|Butterfly|Buzzbot|BuzzSumo|CamontSpider|curl|dataminr|discobot|DomainTools|DotBot|Exabot|Ezooms|FairShare|FeedFetcher|FlaxCrawler|FlightDeckReportsBot|FlipboardProxy|FyberSpider|Gigabot|HTTrack|ia_archiver|InternetSeer|Jakarta|Java|JS-Kit|km.ru|kmSearchBot|Kraken|larbin|libwww|Lightspeedsystems|Linguee|LinkBot|LinkExchanger|LinkpadBot|LivelapBot|LoadImpactPageAnalyzer|lwp-trivial|majestic|Mediatoolkitbot|MegaIndex|MetaURI|MJ12bot|MLBot|NerdByNature|NING|NjuiceBot|Nutch|OpenHoseBot|Panopta|pflab|pirst|PostRank|crawler|ptd-crawler|Purebot|PycURL|Python|QuerySeekerSpider|rogerbot|Ruby|SearchBot|SemrushBot|SISTRIX|SiteBot|Slurp|Sogou|solomono|Soup|spbot|suggybot|Superfeedr|SurveyBot|SWeb|trendictionbot|TSearcher|ttCrawler|TurnitinBot|TweetmemeBot|UnwindFetchor|urllib|uTorrent|Voyager|WBSearchBot|Wget|WordPress|woriobot|Yeti|YottosBot|Zeus|zitebot|ZmEu|Crowsnest|PaperLiBot|peerindex|ia_archiver|Slurp|Aport|NING|JS-Kit|rogerbot|BLEXBot|MJ12bot|Twiceler|Baiduspider|Java|CommentReader|Yeti|discobot|BTWebClient|Tagoobot|Ezooms|igdeSpyder|AhrefsBot|Teleport|Offline|DISCo|netvampire|Copier|HTTrack|WebCopier) {
return 444;
}
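Before applying the directive, it may help to check which User-Agent strings the alternation actually matches. The "~" operator performs a case-sensitive regular-expression match against $http_user_agent, so the same pattern can be tested offline. The sketch below uses Python purely for illustration; the bot list is shortened for brevity, so paste the full alternation from the directive above to test every entry.

import re

# Shortened copy of the alternation used in the nginx directive above;
# replace it with the full list from the directive to test all entries.
BOT_PATTERN = re.compile(
    r"SputnikBot|SeznamBot|Scrapy|CCBot|SemrushBot|MJ12bot|AhrefsBot|HTTrack|WebCopier"
)

SAMPLES = [
    "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
]

for user_agent in SAMPLES:
    # nginx's "~" does a case-sensitive regex search, which re.search mimics here.
    verdict = "blocked (444)" if BOT_PATTERN.search(user_agent) else "allowed"
    print(f"{verdict}: {user_agent}")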
Add the following additional nginx web server directives only to those domains for which you would like to block search bots by using the steps in this article:
CONFIG_TEXT: if ($http_user_agent ~ SputnikBot|omgili|socialmediascanner|Jooblebot|SeznamBot|Scrapy|CCBot|linkfluence|veoozbot|Leikibot|Seopult|Faraday|hybrid|Go-http-client|SMUrlExpander|SNAPSHOT|getintent|ltx71|Nuzzel|SMTBot|Laserlikebot|facebookexternalhit|mfibot|OptimizationCrawler|crazy|Dispatch|ubermetrics|HTMLParser|musobot|filterdb|InfoSeek|omgilibot|DomainSigma|SafeSearch|CommentReader|meanpathbot|statdom|proximic|spredbot|StatOnlineRuBot|openstat|DeuSu|semantic|postano|masscan|Embedly|NewShareCounts|linkdexbot|GrapeshotCrawler|Digincore|NetSeer|help.jp|PaperLiBot|getprismatic|360Spider|Ahrefs|ApacheBench|Aport|Applebot|archive|BaiduBot|Baiduspider|Birubot|BLEXBot|bsalsa|Butterfly|Buzzbot|BuzzSumo|CamontSpider|curl|dataminr|discobot|DomainTools|DotBot|Exabot|Ezooms|FairShare|FeedFetcher|FlaxCrawler|FlightDeckReportsBot|FlipboardProxy|FyberSpider|Gigabot|HTTrack|ia_archiver|InternetSeer|Jakarta|Java|JS-Kit|km.ru|kmSearchBot|Kraken|larbin|libwww|Lightspeedsystems|Linguee|LinkBot|LinkExchanger|LinkpadBot|LivelapBot|LoadImpactPageAnalyzer|lwp-trivial|majestic|Mediatoolkitbot|MegaIndex|MetaURI|MJ12bot|MLBot|NerdByNature|NING|NjuiceBot|Nutch|OpenHoseBot|Panopta|pflab|pirst|PostRank|crawler|ptd-crawler|Purebot|PycURL|Python|QuerySeekerSpider|rogerbot|Ruby|SearchBot|SemrushBot|SISTRIX|SiteBot|Slurp|Sogou|solomono|Soup|spbot|suggybot|Superfeedr|SurveyBot|SWeb|trendictionbot|TSearcher|ttCrawler|TurnitinBot|TweetmemeBot|UnwindFetchor|urllib|uTorrent|Voyager|WBSearchBot|Wget|WordPress|woriobot|Yeti|YottosBot|Zeus|zitebot|ZmEu|Crowsnest|PaperLiBot|peerindex|ia_archiver|Slurp|Aport|NING|JS-Kit|rogerbot|BLEXBot|MJ12bot|Twiceler|Baiduspider|Java|CommentReader|Yeti|discobot|BTWebClient|Tagoobot|Ezooms|igdeSpyder|AhrefsBot|Teleport|Offline|DISCo|netvampire|Copier|HTTrack|WebCopier) {
return 444;
}
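With either of the nginx variants above in place, the block can be verified by sending a request with one of the listed User-Agent values: return 444 makes nginx close the connection without sending any response, so the client sees a connection error instead of an HTTP status. A minimal check, assuming example.com is a placeholder for one of the affected domains, might look like this:

import urllib.error
import urllib.request

URL = "https://example.com/"  # placeholder: replace with one of the affected domains

def probe(user_agent: str) -> None:
    request = urllib.request.Request(URL, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            print(f"{user_agent!r}: got HTTP {response.status} (not blocked)")
    except urllib.error.HTTPError as err:
        print(f"{user_agent!r}: got HTTP {err.code} (not dropped by the 444 rule)")
    except OSError as err:
        # "return 444" tells nginx to close the connection without a response,
        # which shows up on the client side as a connection error.
        print(f"{user_agent!r}: no response ({err}) - this User-Agent looks blocked")

probe("SemrushBot")                       # listed in the directive above, should be dropped
probe("Mozilla/5.0 (Windows NT 10.0)")    # ordinary browser string, should get a response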
As an alternative, a robots.txt file can be created for each domain that needs to block search bots. Place the file in the domain's document root directory (by default, /var/www/vhosts/example.com/httpdocs/); it will disallow access to the whole website for all bots that honor robots.txt. To achieve this, the robots.txt file must contain the following:
CONFIG_TEXT: User-agent: *
Disallow: /
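When many domains need the same robots.txt, creating the file in every document root by hand is tedious. The sketch below is an illustration only, not a Plesk-provided tool: it assumes the default document root layout (/var/www/vhosts/<domain>/httpdocs/), skips domains whose document roots live elsewhere, and leaves any existing robots.txt untouched.

import pathlib

VHOSTS = pathlib.Path("/var/www/vhosts")   # default Plesk webspace location
ROBOTS = "User-agent: *\nDisallow: /\n"

# Only directories that actually contain an httpdocs document root are touched;
# service directories such as /var/www/vhosts/system have no httpdocs and are skipped.
for docroot in sorted(VHOSTS.glob("*/httpdocs")):
    target = docroot / "robots.txt"
    if target.exists():
        print(f"skipped (already present): {target}")
        continue
    target.write_text(ROBOTS)
    print(f"created: {target}")
    # Note: the file is created with the privileges of the user running the script;
    # adjust ownership afterwards (e.g. with chown) if the subscription's system
    # user should own it.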