Whenever I take over a website to help improve its organic rankings, I always look at what Google has indexed and is currently displaying in search results for the site. In almost all cases, if it’s a WordPress site, the search will show that the theme folder has been indexed along with all the files in the wp-content, wp-admin, and cgi-bin folders. Needless to say, Google, MSN, Yahoo, etc. have now indexed redundant files that have nothing to do with your website’s content. In most cases this opens up possible security issues, and having these files indexed can hurt your organic search results.
The sad part is that all of this can be prevented with a good robots.txt file uploaded to the root directory of your website. It takes a few minutes to do, and then you have to wait for the search engines to update their index of your website.
How to make a Great Robots.txt File.
- Open up Notepad ( Start menu - Programs - Accessories - Notepad )
- Copy the code below this list into Notepad
- Scroll down to the bottom of the copied code and change the link to the sitemap to reflect your sitemap.xml location.
- Save the file as robots.txt
- Upload the new robots.txt file to the root directory of your WordPress website, replacing the old robots.txt if there is one
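If you would rather script the copy, edit, and save steps than do them by hand, here is a minimal Python sketch. The function name `with_sitemap` and the example.com URL are placeholders I made up for illustration, not part of any WordPress tool. It only swaps the Sitemap line and leaves every other rule untouched.

```python
def with_sitemap(robots_text: str, sitemap_url: str) -> str:
    """Return robots_text with its Sitemap line replaced by sitemap_url."""
    # Drop any existing Sitemap line (the directive name is case-insensitive).
    kept = [
        line for line in robots_text.splitlines()
        if not line.strip().lower().startswith("sitemap:")
    ]
    # Append exactly one Sitemap line pointing at the real location.
    kept.append(f"Sitemap: {sitemap_url}")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Shortened stand-in for the full rule list from this post.
    template = (
        "User-agent: *\n"
        "Disallow: /wp-admin/\n"
        "Sitemap: http://www.yourwebsite.com/sitemap.xml\n"
    )
    # Hypothetical sitemap URL; use your own.
    print(with_sitemap(template, "http://www.example.com/sitemap.xml"), end="")
```

Save the printed text as robots.txt (all lowercase) and upload it to the site root as described in the last step.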
Robots.txt Code
Note - Paths in robots.txt are case-sensitive. If you have renamed any directories to use uppercase letters, I have seen search bots index them even though they are listed in the robots.txt, because the rules list them in lowercase. Make each rule match the actual case of the directory.
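To see the case problem for yourself, here is a small sketch using Python's standard `urllib.robotparser` module. The single rule and the two URLs are examples I picked, and this only shows how Python's parser matches paths. It is not a guarantee of how any particular search bot behaves, although the major crawlers also treat paths as case-sensitive.

```python
from urllib.robotparser import RobotFileParser

# One lowercase rule, as it would appear in the robots.txt from this post.
rules = """\
User-agent: *
Disallow: /wp-content/themes/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Same directory, two spellings: only the lowercase one matches the rule.
lower_ok = parser.can_fetch("*", "http://www.yourwebsite.com/wp-content/themes/style.css")
upper_ok = parser.can_fetch("*", "http://www.yourwebsite.com/WP-Content/Themes/style.css")

print("lowercase path may be fetched:", lower_ok)
print("uppercase path may be fetched:", upper_ok)
```

The lowercase path is blocked, while the uppercase spelling slips past the rule because it no longer matches.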
I also do not care for duggmirror indexing my website when a post makes it to Digg, so I have added a disallow for it in the robots.txt.
User-agent: *
Disallow: /cgi-bin/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /stats/
Disallow: /dh_
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-content/b
Disallow: /wp-content/p
Disallow: /wp-content/themes/
Disallow: /trackback/
Disallow: */trackback/
User-agent: Googlebot
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.cgi$
Disallow: /*.wmv$
Disallow: /*.xhtml$
Disallow: */trackback*
Disallow: /z/
Disallow: /wp-*
Allow: /wp-content/uploads/
User-agent: Googlebot-Image
Allow: /*
User-agent: Mediapartners-Google*
Allow: /z/
Allow: /about/
Allow: /contact/
Allow: /wp-content/
Allow: /tag/
Allow: /manual/*
Allow: /docs/*
Allow: /*.js$
Allow: /*.inc$
Allow: /*.css$
Allow: /*.gz$
Allow: /*.cgi$
Allow: /*.wmv$
Allow: /*.xhtml$
Allow: /*.php*
Allow: /*.gif$
Allow: /*.jpg$
Allow: /*.png$
User-agent: ia_archiver
Disallow: /
User-agent: duggmirror
Disallow: /
Sitemap: http://www.yourwebsite.com/sitemap.xml
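The Googlebot rules above rely on two pattern extensions: `*` stands for any run of characters and a trailing `$` anchors the end of the URL. As far as I understand Google's documentation, when an Allow and a Disallow both match, the longer pattern wins and Allow wins a tie. The sketch below is my own rough model of that behaviour in Python. `rule_matches` and `is_allowed` are names I made up, and other crawlers may not support these extensions at all.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Google-style match: '*' is any run of characters, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and join them with '.*' where the pattern had '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so the rule behaves as a prefix match.
    return re.match(regex, path) is not None


def is_allowed(rules, path: str) -> bool:
    """rules is a list of (directive, pattern). Longest match wins; Allow wins ties."""
    best = ("allow", "")  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if rule_matches(pattern, path):
            longer = len(pattern) > len(best[1])
            tie_allow = len(pattern) == len(best[1]) and directive == "allow"
            if longer or tie_allow:
                best = (directive, pattern)
    return best[0] == "allow"


# A few of the Googlebot rules from the file above.
googlebot_rules = [
    ("disallow", "/*.js$"),
    ("disallow", "/wp-*"),
    ("allow", "/wp-content/uploads/"),
]

for path in ["/wp-admin/", "/wp-content/uploads/photo.jpg", "/about/"]:
    print(path, is_allowed(googlebot_rules, path))
```

This is why `Allow: /wp-content/uploads/` can sit after `Disallow: /wp-*` and still keep uploaded images crawlable: the Allow pattern is the longer, more specific match.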
This robots.txt file was not created exclusively by me, but for the life of me I cannot remember the website where I picked it up. If anyone happens to know, please tell me so I can include a link back to them for this amazing robots.txt file.


6 Comments
Wait a second, robots.txt does not have an Allow command, only Disallow, right?
I am 100% certain it has an Allow command. Like I said, it helps to have a good robots.txt in place.
In the global disallow area various directories and images are off limits, but I want Google’s image bot to index images, and sometimes Google’s Mediapartners bot follows the rules set in place for Googlebot unless you have noted otherwise somewhere else in the text file.
The Allow command is not a standard command, yet some search engines, like Googlebot, follow it. It is not necessary though, as allowing is the default for all search engines.
User-agent: Mediapartners-Google*
Disallow:
This would do exactly the same thing as your long list of non-standard “Allow:” directives following Mediapartners-Google*.
I really like what you had to say here! It’s about time! Would you mind if I placed a link back from my blog?