 
  • How to Create a Robots.txt File for Your Website


    Definition of ROBOTS.TXT:

     

    ‘A Robots.txt file stops Search Engine crawlers from indexing those web pages which do not add any business value to your website. In other words, it instructs Search Engine bots exactly how to crawl and index your website pages.’

    Source: Seo Topper Youtube tutorial

     


     

    A robots.txt file helps the major search engines, such as Google, Bing and Yahoo, crawl and index your web pages properly.

     

    We can use the /robots.txt file to give web robots a set of instructions about our website. This convention is known as ‘The Robots Exclusion Protocol’.

     

    Basically, it works like this: when a robot visits a website, for example at http://www.website.com/page.html, it first checks for instructions at http://www.website.com/robots.txt and only then goes ahead and crawls the website.
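To make the lookup concrete, here is a minimal Python sketch of how a robot could work out where the robots.txt file lives for any page it is about to visit. The function name `robots_url` is made up for this illustration, and real crawlers may handle more cases (redirects, ports, caching) than this does.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url.

    robots_url is a hypothetical helper for this post, not a library function.
    """
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the file always sits at the site root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.website.com/page.html"))
# prints: http://www.website.com/robots.txt
```

Whatever page the robot starts from, the path, query string and fragment are dropped and replaced with /robots.txt.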

     

    Usually, the robots.txt file allows all major search engines to crawl and index the whole website. Some parts of a site need privacy, though, so we can exclude them by disallowing search engine bots from crawling them.

     

    Usually, there are some folders on our website that we do not want search engines to visit. Some examples of such folders and pages are:

    • Pages which contain login details

    • Privacy Policy Pages

    • Copyright Image folders

    • Private Media files

    • Internal Pages

    • Contact Pages

    • Print files


     

    An example of a robots.txt file:

     

    
    # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
    #
    # To ban all spiders from the entire site uncomment the next two lines:
    # User-Agent: *
    # Disallow: /
    User-agent: *
    Disallow: /account
    Disallow: /cache/
    Disallow: /components/
    Disallow: /installation/
    Disallow: /language/
    Disallow: /libraries/
    Disallow: /tmp/

     

    In the above example:

    ‘User-agent’:
    It names the search engine bot (also called a spider or crawler) that the rules below it apply to.

    ‘*’ (Asterisk):

    It means the rules apply to all search engine bots or spiders. The major search engines run many types of crawler. Google alone has five main crawlers: Googlebot, Googlebot-Mobile, Googlebot-Image, Mediapartners-Google and Adsbot-Google.

    Googlebot, Bingbot and Slurp (Yahoo) are some of the major search engine bots.
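Rules can also be written for one named bot instead of ‘*’. Below is a small sketch using Python's standard `urllib.robotparser` module to show how a bot-specific group and the catch-all group are picked. The rules, the /images/ folder and the URLs are invented for this illustration and are not part of the example file above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Google's image crawler, one for everyone else.
RULES = """\
User-agent: Googlebot-Image
Disallow: /images/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

image_url = "http://www.website.com/images/logo.png"
tmp_url = "http://www.website.com/tmp/cache.html"

# Googlebot-Image has its own group, which blocks /images/.
print(parser.can_fetch("Googlebot-Image", image_url))  # False
# Bingbot has no group of its own, so the '*' group applies.
print(parser.can_fetch("Bingbot", image_url))          # True
print(parser.can_fetch("Bingbot", tmp_url))            # False
```

A bot that finds a group with its own name follows that group; every other bot falls back to the ‘*’ group.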

     

    Disallow: Do not crawl any page whose path starts with the given value

    Allow: May crawl pages whose path starts with the given value

    And don’t forget to put a ‘/’ (forward slash) after the colon (:). An empty ‘Disallow:’ blocks nothing, so it works in just the opposite way.
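The difference the slash makes is easy to check with Python's standard `urllib.robotparser` module. This is only a rough sketch: the helper `allowed` and the shortened rule sets are made up for this post, and each search engine's own parser may differ in details.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, url: str, agent: str = "Googlebot") -> bool:
    """Hypothetical helper: parse the rules text and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


BLOCK_ALL = "User-agent: *\nDisallow: /"      # slash present: whole site blocked
BLOCK_NOTHING = "User-agent: *\nDisallow:"    # slash missing: nothing blocked
SAMPLE = "User-agent: *\nDisallow: /account\nDisallow: /tmp/"

page = "http://www.website.com/page.html"

print(allowed(BLOCK_ALL, page))                                  # False
print(allowed(BLOCK_NOTHING, page))                              # True
print(allowed(SAMPLE, page))                                     # True
print(allowed(SAMPLE, "http://www.website.com/account/login"))   # False
```

Note that `Disallow: /account` is matched as a path prefix, so it also covers everything under /account/.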

     

    I hope this blog helps you create the robots.txt file for your website. If you have any questions, please post them in the comment box below.
