What is crawl delay?
TL;DR
Crawl delay is an unofficial robots.txt directive that tells web crawlers how many seconds to wait between page requests. It aims to prevent server overload by spacing out crawler visits, though Google ignores the directive entirely. Bing and Yandex support crawl delay but interpret it differently, making it an inconsistent yet sometimes useful tool for managing crawler traffic on resource-limited servers.
What Is Crawl Delay?
Crawl delay is a time-based directive in robots.txt files that instructs crawlers to pause between successive requests. The directive is written as Crawl-delay: followed by a number of seconds. Website owners use it to throttle aggressive crawlers that might otherwise overwhelm server resources.
The directive encourages politeness by spacing out crawler requests, shielding servers from excessive load. Because it is unofficial, however, support varies widely and each search engine interprets it in its own way.
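For example, a minimal robots.txt that asks every crawler honoring the directive to pause ten seconds between requests could look like this (the ten-second value is purely illustrative):

# Ask compliant crawlers to wait 10 seconds between requests
User-agent: *
Crawl-delay: 10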
How Different Crawlers Interpret Crawl Delay
Google ignores the crawl delay directive completely. Google crawlers determine their own request rates based on server response times and don’t honor robots.txt timing instructions. Website owners must use Google Search Console to adjust Googlebot’s crawl rate instead.
Bing interprets crawl delay as time windows. A setting of Crawl-delay: 10 creates ten-second windows in which Bingbot fetches at most one page, effectively capping Bing at roughly 8,640 pages per day (86,400 seconds in a day divided into 10-second windows). Bing also provides crawl rate controls through Bing Webmaster Tools.
Yandex treats the number as the minimum gap between requests. Setting Crawl-delay: 10 means Yandex waits at least ten seconds before requesting the next URL. Like Bing, Yandex offers webmaster tools for managing crawl rates that override robots.txt settings.
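Because the interpretations differ, the same number produces different behavior per crawler. A sketch that applies the directive only to Bing and Yandex (the user-agent tokens and the ten-second value are illustrative):

# Bing: at most one page per ten-second window
User-agent: Bingbot
Crawl-delay: 10

# Yandex: at least ten seconds between consecutive requests
User-agent: Yandex
Crawl-delay: 10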
When to Use Crawl Delay
Use crawl delay on resource-limited shared servers where crawler traffic causes performance issues. If several crawlers hitting your site at once cause slowdowns, crawl delay provides temporary relief by spreading their requests over a longer period.
Sites experiencing server stress can throttle non-essential crawlers while allowing trusted bots unrestricted access. This prioritizes crawl budget for search engines driving actual traffic. Avoid crawl delay on sites needing fast indexing, as delays mean fewer pages crawled daily.
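A sketch of the selective-throttling setup described above, assuming crawlers that follow standard group matching, where each bot obeys only the most specific group that names it (the names and values are illustrative):

# Default: slow down every crawler that honors Crawl-delay
User-agent: *
Crawl-delay: 20

# A crawler you rely on matches this group instead, so it gets no delay
User-agent: Bingbot
Allow: /

Remember that Googlebot skips Crawl-delay either way, so the wildcard group's delay does not affect Google's crawling.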
Alternatives and Better Solutions
Upgrade server resources to handle crawler traffic efficiently rather than using crawl delay. Better hosting with adequate bandwidth eliminates the need for request throttling.
For Google, temporarily return HTTP 503 (Service Unavailable) or 429 (Too Many Requests) status codes during server issues, or file an overcrawling report through Google Search Console. Webmaster tools from the major search engines let you set crawl rates centrally and deliver more reliable results than robots.txt directives.
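As an illustration of the status-code approach, here is a minimal Python sketch using Flask; the server_overloaded check and the 120-second Retry-After value are placeholder assumptions, not part of any search engine's official guidance:

from flask import Flask, Response

app = Flask(__name__)

def server_overloaded():
    # Placeholder: swap in a real load signal (CPU, request queue depth, etc.)
    return False

@app.before_request
def shed_crawler_load():
    # Returning a response here short-circuits normal handling for every route
    if server_overloaded():
        return Response(
            "Service temporarily unavailable",
            status=503,
            headers={"Retry-After": "120"},  # ask clients to retry in ~2 minutes
        )

@app.route("/")
def index():
    return "OK"

Crawlers that respect 503 and Retry-After back off and try again later, which reduces load without permanently slowing how quickly your pages are recrawled.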
Key Takeaways
Crawl delay is an unofficial robots.txt directive that specifies how long crawlers should wait between requests, designed to prevent server overload. Major search engines interpret it inconsistently: Google ignores it completely, Bing uses time windows, and Yandex treats it as a minimum wait time. Use crawl delay only on resource-limited servers experiencing genuine crawler-related performance issues. Better solutions include upgrading hosting, using search engine webmaster tools for official crawl rate controls, or returning appropriate HTTP status codes during server stress. For most modern websites with adequate hosting, crawl delay is unnecessary and can slow down indexing.