We have a client we have been carrying out search engine optimisation work for over the past three months, and we recently noticed that they had a bit of an issue with their robots.txt file.
Now for those who don’t know, a robots.txt file sits in the root directory of your website, and it is there to tell search engines which files and directories they should and should NOT crawl. (The only real exception to this rule is that some search engines also allow you to provide the address of your XML Sitemap in this file.)
Note: placing a file or directory to exclude in your robots.txt file is no guarantee that these pages will not be indexed by search engines. It is merely a way to indicate that you don’t want the page appearing in Google, Bing/Yahoo, etc. … but it still might.
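For reference, a typical robots.txt that blocks one directory and advertises an XML Sitemap looks something like this (the path and domain here are placeholders, not the client’s):

User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml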
Anyhow, back to this problematic little robots.txt file –
here it is, in all its glory:
User-agent: *
Crawl-Delay: 10
Disallow:
The main issue here is with the command “Crawl-Delay: 10”.
Note: This was put in automatically by the client’s eCommerce application and not manually by the client.
It is important to know that the “Crawl-Delay” command is not a crawl rate (i.e. the number of pages fetched at any one time); instead, it defines the amount of time (from 1 to 30 seconds) that the search engine “bot” will wait between crawling each page of your site. This means that the higher the figure, the fewer pages of your site get crawled, and therefore indexed, in any given period.
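To put that figure in context, here is a quick back-of-the-envelope sketch (Python, purely illustrative) of the ceiling a delay puts on a single bot that honours it and fetches one page at a time:

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def max_pages_per_day(crawl_delay_seconds: int) -> int:
    # One page fetched per delay interval, all day long.
    return SECONDS_PER_DAY // crawl_delay_seconds

for delay in (1, 10, 30):
    print(f"Crawl-Delay: {delay:>2}s -> at most {max_pages_per_day(delay):,} pages/day")

So with “Crawl-Delay: 10”, a compliant bot tops out at 8,640 pages a day, and at the maximum of 30 seconds it drops to 2,880.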
The original purpose of this command was to stop search engine spiders from tearing through a large site and hurting its performance (too many simultaneous requests could even bring a site to its knees). However, Google publicly states that it does not support the crawl delay command (although Bing does), so the case for having this command there in the first place is significantly weakened.
Moreover, this crawl delay command poses an issue for some search engine optimisation tools, such as Raven Tools, which can’t effectively crawl sites when this single line is present.
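If you want to check whether a site is serving this directive, the robots.txt parser in Python’s standard library can tell you; a minimal sketch, with the domain as a placeholder:

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")  # placeholder URL
rp.read()  # fetches and parses the file

# crawl_delay() returns the delay for the given user agent,
# or None if no Crawl-Delay directive applies.
delay = rp.crawl_delay("*")
print(f"Crawl-Delay: {delay} seconds" if delay is not None else "No Crawl-Delay set")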
So my advice: unless unusual, less common search engine spiders visit you frequently and have a noticeable effect on your site’s stability, remove this command if at all possible.
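For completeness, removing the directive from the client’s file leaves the simplest possible “crawl everything” configuration:

User-agent: *
Disallow: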