A great thing about Google is that it gives webmasters plenty of help getting their websites into Google's index. There's a handy tool in Google Webmaster Tools called 'Fetch as GoogleBot'. This tool, as we discussed in our SEO Tips for start-ups, can be a great help in diagnosing errors and getting a website into Google's index faster. A robots.txt file is used to improve crawling efficiency and to keep certain pages from being crawled. Sometimes, though, GoogleBot might have difficulty fetching your robots.txt file. Here's Google's take on this problem.
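For context, here's a minimal sketch of how a crawler interprets robots.txt rules, using Python's built-in urllib.robotparser. The example.com URL and paths are placeholders; swap in your own site to try it.

```python
from urllib.robotparser import RobotFileParser

# Placeholder URL -- point this at your own site's robots.txt.
robots_url = "https://www.example.com/robots.txt"

parser = RobotFileParser(robots_url)
parser.read()  # fetches and parses the robots.txt file

# Check whether GoogleBot is allowed to crawl a couple of example paths.
for path in ("https://www.example.com/", "https://www.example.com/private/page"):
    allowed = parser.can_fetch("Googlebot", path)
    print(f"Googlebot may fetch {path}: {allowed}")
```

If the file can't be fetched reliably in the first place, though, rules like these never even come into play, which is exactly the problem discussed below.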
The original question asked on the GWT forum had to do with crawling inefficiency. GoogleBot was unable to fetch a robots.txt file 50% of the time, even though the file could be fetched from other hosts with a 100% success rate. It is worth noting that the site was running on a plain nginx server on a mit.edu host, which should have pretty good uptime. So the problem must be on Google's end, right?
Sometimes, people try cloaking on their websites. Cloaking means hiding content from crawlers, so that crawlers and users are served different content. What a user sees on the website can be very different from what crawlers such as GoogleBot see. Not only is this a bad SEO practice, it can also lead to penalties.
When setting up cloaking, people sometimes make a mistake and end up reverse-cloaking: browsers and regular user agents see the website fine, but crawlers don't see any content at all. Making such a mistake is like shooting yourself in the foot. So this could be one cause of the problem.
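If you suspect accidental reverse-cloaking, one rough way to check is to fetch the same page with a browser-style User-Agent and a GoogleBot-style User-Agent and compare what comes back. Here's a minimal sketch using Python's standard urllib; the URL is a placeholder for a page on your own site.

```python
import urllib.request

# Placeholder URL -- point this at a page on your own site.
page_url = "https://www.example.com/"

user_agents = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

sizes = {}
for name, ua in user_agents.items():
    request = urllib.request.Request(page_url, headers={"User-Agent": ua})
    with urllib.request.urlopen(request, timeout=10) as response:
        sizes[name] = len(response.read())

print(sizes)
# A big difference between the two sizes suggests the server is serving
# crawlers something different from what regular visitors see.
```

This is only a quick sanity check; Fetch as GoogleBot, discussed next, shows you what Google's own crawler actually receives.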
As we mentioned at the start, the Fetch as GoogleBot feature in Google Webmaster Tools is a pretty awesome tool. You can use it to fetch your robots.txt file, and it will tell you when there's a problem. Many people might not know this, but some web hosts alternate between different systems and hosts. So a 50% success rate might be explained by one of those hosts being improperly configured. You might want to contact your hosting company about this.
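To see whether intermittent failures line up with one misconfigured host behind the same name, a quick sketch like the one below (standard-library Python, placeholder URL) fetches robots.txt repeatedly and tallies the outcomes by status code and Server header.

```python
import urllib.error
import urllib.request
from collections import Counter

# Placeholder URL -- use your own robots.txt here.
robots_url = "https://www.example.com/robots.txt"

results = Counter()
for _ in range(20):
    try:
        with urllib.request.urlopen(robots_url, timeout=10) as response:
            # Group results by the Server header, in case requests are
            # being answered by different backend hosts.
            server = response.headers.get("Server", "unknown")
            results[(response.status, server)] += 1
    except urllib.error.URLError as error:
        results[("error", str(error.reason))] += 1

for outcome, count in results.items():
    print(outcome, count)
```

If roughly half the requests fail or come back from a different server, that points at the hosting setup rather than at GoogleBot.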
These two are the most probable causes of robots.txt crawling errors. Did this help? Please do let us know. And stay tuned for more SEO questions and their answers :)