- Plesk Onyx for Linux
- Plesk Onyx for Windows
How does site audit work in SEO Toolkit?
The backend of the Site Audit is a recursive crawler that analyzes HTML pages, spots common SEO errors, finds outgoing links (internal and external), and continues to crawl those URLs in a loop until either all URLs are crawled or the license-based limit is reached (250 internal HTML URLs for the free license).
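As a rough illustration of such a crawl loop, the Python sketch below (using the third-party requests and BeautifulSoup libraries) crawls internal HTML pages breadth-first up to a fixed limit. It is not the SEO Toolkit's actual implementation; the function and constant names are illustrative, and the 250-URL limit simply mirrors the free-license limit mentioned above.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

URL_LIMIT = 250  # illustrative: matches the free-license limit on internal HTML URLs


def crawl_site(start_url, url_limit=URL_LIMIT):
    """Breadth-first crawl of internal HTML pages, up to url_limit pages."""
    site_host = urlparse(start_url).netloc
    queue = deque([start_url])
    visited = set()
    pages = {}  # url -> raw HTML, kept for later analysis

    while queue and len(pages) < url_limit:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)

        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        if "text/html" not in response.headers.get("Content-Type", ""):
            continue  # only HTML pages are analyzed and expanded further
        pages[url] = response.text

        # Extract outgoing links and queue internal ones for further crawling.
        soup = BeautifulSoup(response.text, "html.parser")
        for anchor in soup.find_all("a", href=True):
            link = urljoin(url, anchor["href"])
            if urlparse(link).netloc == site_host and link not in visited:
                queue.append(link)

    return pages
```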
The start URL for this crawl is the URL of the domain being scanned.
From there, the crawler first analyzes the /robots.txt file (if present). The crawler follows the rules in /robots.txt for the “*” directive or for its specific user agent “XoviOnpageCrawler”. It also adheres to any “Crawl-Delay” directive that may be present in the /robots.txt.
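For illustration only, Python's standard urllib.robotparser module models this behavior: rules for a specific user agent take precedence over the “*” group, and a per-user-agent Crawl-Delay value can be read if one is set. This is a sketch of the concept, not the crawler's actual code; the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "XoviOnpageCrawler"

robots = RobotFileParser("https://example.com/robots.txt")
robots.read()  # fetches and parses the file; a missing file allows everything

# Rules for the specific user agent take precedence over the "*" directive.
allowed = robots.can_fetch(USER_AGENT, "https://example.com/some-page")

# Crawl-Delay, if present, dictates the pause between requests.
delay = robots.crawl_delay(USER_AGENT)  # None when no directive is set
```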
Below is a flow diagram that shows, at a high level, how the crawler recursively scans a site:
If a user starts multiple Site Audit scans at the same time, they are queued up. Only one domain can be scanned at a time.
A “normal” Site Audit (without a robots.txt crawl delay) with a limit of 250 URLs typically takes around 6-8 minutes. How long a Site Audit takes depends on several factors:
- Crawl-Delay in /robots.txt
- Server response times
- The number of links on each page (many links result in a longer runtime due to the 2D Rank calculation)
As you can see from the image above, the Site Audit is split into two stages:
Some errors can be identified during the crawl by simply analyzing the source code of a URL.
Other errors require more complicated processing, for example finding pages with similar or duplicate content, or calculating a 2D Rank (a combination of PageRank and CheiRank) across the whole link profile of a crawl.
The simple errors are identified during the crawl; the more complicated errors are identified during the postprocessing.
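To illustrate the postprocessing idea, the sketch below computes a 2D-Rank-style score with the third-party networkx library: PageRank on the crawled link graph plus CheiRank, which is PageRank computed on the same graph with all edges reversed. How the two scores are combined here (a simple average) is an assumption for illustration; the actual formula used by the SEO Toolkit is not documented.

```python
import networkx as nx


def two_dimensional_rank(link_graph: nx.DiGraph) -> dict:
    """Combine PageRank (incoming links) and CheiRank (outgoing links) per URL."""
    pagerank = nx.pagerank(link_graph)
    # CheiRank is PageRank computed on the graph with all edges reversed.
    cheirank = nx.pagerank(link_graph.reverse())
    # Illustrative combination: average of the two scores (assumption, not the real formula).
    return {url: (pagerank[url] + cheirank[url]) / 2 for url in link_graph}


# Example: a tiny link graph of the kind collected during a crawl.
graph = nx.DiGraph()
graph.add_edges_from([
    ("/", "/about"),
    ("/", "/blog"),
    ("/blog", "/blog/post-1"),
    ("/blog/post-1", "/"),
])
print(two_dimensional_rank(graph))
```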