WordPress Robots.txt – The Correct Way


    Whenever I take over a website to help improve its organic rankings, I first look at what Google has indexed and is currently showing in search results for the site. In almost all cases, if it’s a WordPress site, the search shows that the theme folder has been indexed, along with the files in the wp-content, wp-admin, and cgi-bin folders. In other words, Google, MSN, Yahoo, etc. have indexed redundant files that have nothing to do with your website's content. In most cases this exposes possible security issues, and having these files in the index can hurt your organic search results.

    The sad part is that all of this can be prevented with a good robots.txt file uploaded to the root directory of your website. It takes a few minutes to do, and then you have to wait for the search engines to update their index of your website.

    How to make a Great Robots.txt File.

    1. Open up Notepad ( Start menu – Programs – Accessories – Notepad )
    2. Copy the code from the Robots.txt Code section below into Notepad
    3. Scroll to the bottom of the copied code and change the Sitemap link to point to your sitemap.xml location.
    4. Save the file as robots.txt
    5. Upload / Replace your old robots.txt with the new robots.txt file in the root directory of your WordPress website
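
    If you would rather script steps 3 and 4 than edit by hand, something like the following Python sketch should do it. The helper name `set_sitemap`, the shortened template, and the example.com URL are placeholders of mine, not part of the original instructions; it only rewrites the Sitemap line and leaves every other line alone.

```python
def set_sitemap(robots_txt: str, sitemap_url: str) -> str:
    """Return robots_txt with its Sitemap line(s) pointing at sitemap_url.

    Hypothetical helper for illustration; appends a Sitemap line if none exists.
    """
    lines = []
    found = False
    for line in robots_txt.splitlines():
        if line.strip().lower().startswith("sitemap:"):
            lines.append(f"Sitemap: {sitemap_url}")
            found = True
        else:
            lines.append(line)
    if not found:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


# Shortened stand-in for the full robots.txt code (assumption for the demo).
template = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: http://www.yourwebsite.com/sitemap.xml
"""

updated = set_sitemap(template, "http://www.example.com/sitemap.xml")

if __name__ == "__main__":
    # Step 4: save under the exact name robots.txt (lowercase, no .txt.txt).
    with open("robots.txt", "w", encoding="utf-8", newline="\n") as fh:
        fh.write(updated)
```

    The upload in step 5 is still done with whatever FTP or file manager tool you normally use.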

    Robots.txt Code

    Note – Paths in robots.txt are case-sensitive. If you have renamed any directories to use uppercase letters, I have seen search bots index them even though they appear in robots.txt, because the rules list them in lowercase rather than uppercase.
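
    The uppercase/lowercase point in the note above is easy to check for yourself. This is a small sketch using Python's standard-library `urllib.robotparser` as a stand-in for a real crawler; the bot name "AnyBot" and the `/stats/report.html` path are made up for the demo, and real search bots may not behave identically.

```python
from urllib.robotparser import RobotFileParser

# One lowercase rule, as it appears in the robots.txt code.
rules = """\
User-agent: *
Disallow: /stats/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

site = "http://www.yourwebsite.com"
lower_blocked = not parser.can_fetch("AnyBot", site + "/stats/report.html")
upper_blocked = not parser.can_fetch("AnyBot", site + "/Stats/report.html")

print(lower_blocked)  # True: the path matches the lowercase rule
print(upper_blocked)  # False: /Stats/ is treated as a different path
```

    So if a directory really is named /Stats/ on your server, the Disallow line has to say /Stats/ too.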

    I also do not care for duggmirror indexing my website when a post makes it to Digg, so I have added a Disallow for it in the robots.txt.
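
    To show what a bot-specific block like the duggmirror one does, here is a rough sketch, again using Python's standard-library `urllib.robotparser` as a stand-in for a crawler. The two-group file is a cut-down version of the real code, and "SomeOtherBot" and the `/some-post/` path are invented for the example.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /wp-admin/

User-agent: duggmirror
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

post = "http://www.yourwebsite.com/some-post/"
admin = "http://www.yourwebsite.com/wp-admin/"

# duggmirror has its own group, so it is shut out of the whole site.
duggmirror_post = parser.can_fetch("duggmirror", post)   # False
# Any bot without its own group falls back to the * group.
other_post = parser.can_fetch("SomeOtherBot", post)      # True
other_admin = parser.can_fetch("SomeOtherBot", admin)    # False

print(duggmirror_post, other_post, other_admin)
```

    A bot that finds a group naming it uses that group instead of the * group, which is why each named block has to list everything that bot should stay out of.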


    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /z/j/
    Disallow: /z/c/
    Disallow: /stats/
    Disallow: /dh_
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /contact/
    Disallow: /tag/
    Disallow: /wp-content/b
    Disallow: /wp-content/p
    Disallow: /wp-content/themes/
    Disallow: /trackback/
    Disallow: */trackback/

    User-agent: Googlebot
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.gz$
    Disallow: /*.cgi$
    Disallow: /*.wmv$
    Disallow: /*.xhtml$
    Disallow: */trackback*
    Disallow: /z/
    Disallow: /wp-*
    Allow: /wp-content/uploads/
     

    User-agent: Googlebot-Image
    Allow: /*
     
    User-agent: Mediapartners-Google*
    Allow: /z/
    Allow: /about/
    Allow: /contact/
    Allow: /wp-content/
    Allow: /tag/
    Allow: /manual/*
    Allow: /docs/*
    Allow: /*.js$
    Allow: /*.inc$
    Allow: /*.css$
    Allow: /*.gz$
    Allow: /*.cgi$
    Allow: /*.wmv$
    Allow: /*.xhtml$
    Allow: /*.php*
    Allow: /*.gif$
    Allow: /*.jpg$
    Allow: /*.png$
     
    User-agent: ia_archiver
    Disallow: /
     
    User-agent: duggmirror
    Disallow: /

    Sitemap: http://www.yourwebsite.com/sitemap.xml
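
    The `*` and `$` patterns in the Googlebot block, and the way its Allow line overrides `Disallow: /wp-*`, can be sketched in a few lines of Python. This follows the matching behavior Google documents for its own crawlers (`*` matches any characters, a trailing `$` pins the end of the URL, the longest matching rule wins, and Allow wins a tie); other bots may not support wildcards at all. The function names and sample paths are mine, and only four of the rules are included to keep it short.

```python
import re


def pattern_to_regex(pattern):
    """Compile a robots.txt path pattern: '*' = any characters, trailing '$' = end of URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """rules: (directive, pattern) pairs taken from ONE user-agent group.

    The longest matching pattern wins; on a tie, Allow wins.
    A path that matches no rule is allowed.
    """
    best_len, best_allow = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


# A subset of the Googlebot group (assumption: trimmed for the demo).
googlebot_rules = [
    ("Disallow", "/*.js$"),
    ("Disallow", "/*.css$"),
    ("Disallow", "/wp-*"),
    ("Allow", "/wp-content/uploads/"),
]

print(is_allowed(googlebot_rules, "/wp-content/uploads/2009/photo.jpg"))  # True
print(is_allowed(googlebot_rules, "/wp-admin/index.php"))                 # False
print(is_allowed(googlebot_rules, "/style.css"))                          # False
print(is_allowed(googlebot_rules, "/style.css?ver=2"))                    # True
```

    The last line shows the effect of `$`: a stylesheet requested with a query string no longer ends in .css, so the rule does not catch it.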


     

    I did not create this robots.txt file entirely by myself, but for the life of me I cannot remember the website where I picked it up. If anyone happens to know, please tell me so I can include a link back to them for this amazing robots.txt file.




6 Comments


  1. I really like what you had to say here! It’s about time! Would you mind if I placed a link back from my blog?

  2. Jean-Luc says:

    User-agent: Mediapartners-Google*
    Disallow:

    This would do exactly the same thing as your long list of non-standard “Allow:” directives following Mediapartners-Google*

  3. SEO/SEM blog says:

    The Allow command is not a standard command, yet some search engines, like Googlebot, follow it. It is not necessary, though, as allowing is the default for all search engines.

  4. Djames says:

    In the global disallow area, various directories and images are off limits, but I want Google's image bot to index images. Sometimes Google's Mediapartners bot also follows the rules set for Googlebot, unless you have specified otherwise somewhere else in the text file.

  5. Djames says:

    I am 100% certain it has an Allow command. Like I said, it helps to have a good robots.txt in place.

  6. SEO/SEM blog says:

    Wait a second, robots.txt does not have an Allow command, only Disallow, right?
