What is a robots.txt file?

There may be areas of your site that you do not want search engine spiders to crawl, such as the admin area or a test page. One way to tell search engines which pages to avoid is the robots meta tag. However, not all search engines read meta tags, so webmasters use the robots.txt file to tell search engines which areas of the site to avoid.

Photo source: http://www.flickr.com/photos/microcosmos/1265783338/

What is a robots.txt file?
The robots.txt file is a plain text file placed in the root folder of a website (for example, www.example.com/robots.txt).
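Because crawlers only look for the file at the root of the host, the robots.txt location can be worked out from the address of any page on the site. The following is a minimal Python sketch using the standard urllib.parse module; the helper name robots_url is hypothetical and not part of any library:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to page_url (hypothetical helper).

    Crawlers only look for the file at the root of the host, so the page's
    path, query string and fragment are thrown away; scheme, host and port stay.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("https://www.example.com/shop/basket?id=3"))
```

A robots.txt file placed in a subfolder, such as www.example.com/shop/robots.txt, is not where crawlers look, which is why the sketch always builds the path from the root.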

Why is it used?

It gives search engine spiders instructions about the website. The robots.txt file lists the pages and folders that should not be crawled, and it can also give the location of the XML sitemap. Many people use it to stop search engines from crawling a page or a group of pages, for example while the site is still being built and should not yet appear in search engines.
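Both kinds of information are read from the same file. As a rough illustration, Python's standard urllib.robotparser module can parse the robots.txt of a site that is still being built (site_maps() needs Python 3.8 or later). The file contents, the sitemap URL and the load helper below are hypothetical examples, not taken from a real site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site under construction: every path is
# blocked for every robot, and the sitemap location is listed as well.
UNDER_CONSTRUCTION = """\
User-agent: *
Disallow: /

Sitemap: https://www.example.com/sitemap.xml
"""


def load(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = load(UNDER_CONSTRUCTION)
    # "Disallow: /" matches every path, so nothing may be crawled.
    print(rp.can_fetch("*", "https://www.example.com/"))
    # The Sitemap line is collected separately from the crawl rules.
    print(rp.site_maps())
```

Disallow: / blocks the whole site because every URL path starts with "/"; once the site is ready, removing that line lets crawlers in again.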

What does it look like?

User-agent: *
Disallow: /admin
Disallow: /enquiry-form/
Disallow: /shoppingbasket/

The “*” in the User-agent line means the rules apply to every robot. Each part of the site that you do not want crawled needs its own Disallow line, and each value is matched against the start of the URL path.
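One way to check what the example file actually blocks is to feed it to Python's urllib.robotparser.RobotFileParser and ask about individual paths. This sketch assumes www.example.com as the host and uses Googlebot only as a sample user-agent name; the allowed helper is hypothetical:

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from this article.
EXAMPLE = """\
User-agent: *
Disallow: /admin
Disallow: /enquiry-form/
Disallow: /shoppingbasket/
"""

parser = RobotFileParser()
parser.parse(EXAMPLE.splitlines())


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """True if the rules above let `agent` crawl `path` on www.example.com."""
    return parser.can_fetch(agent, "https://www.example.com" + path)


if __name__ == "__main__":
    for path in ["/", "/admin", "/admin/users", "/administrator",
                 "/shoppingbasket/", "/enquiry-form"]:
        print(f"{path:20} {'allowed' if allowed(path) else 'blocked'}")
```

Under these rules /administrator is blocked as well, because it starts with /admin, while /enquiry-form without the trailing slash is still allowed, because it does not start with /enquiry-form/.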

Setting up a robots.txt file does not mean that every robot will respect it: well-behaved search engine spiders follow its instructions, but other robots can simply ignore it. The file is also publicly available, so anyone can see the parts of your site you have asked robots to avoid.
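Obeying robots.txt is up to the robot: the check runs in the crawler's own code, and nothing on the server enforces it. Reading the file is just as easy for anyone. A minimal sketch, with a hypothetical disallowed_paths function, that lists every Disallow value in the text of a robots.txt file:

```python
from typing import List


def disallowed_paths(robots_txt: str) -> List[str]:
    """Return every non-empty Disallow value in a robots.txt file, in order.

    No permission is needed: the file is public, so any visitor or robot can
    read the list of paths the site owner asked crawlers to avoid.
    """
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        field, sep, value = line.partition(":")
        # An empty "Disallow:" blocks nothing, so it is skipped.
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths
```

Applied to the example file in this article, it returns /admin, /enquiry-form/ and /shoppingbasket/, which is exactly the list of areas the site owner wanted to keep away from search engines.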

If you want to find out more about robots.txt files, here are some useful links:

The Web Robots Pages
Create a robots.txt file – Search Console Help
