You have most probably heard of the SEO-related commands “Disallow”, “Noindex” and “Nofollow”, but do you genuinely know the difference between them and when each should be used to best advantage? Probably not, but you are in the right place! These commands, in their simplest terms, let you give orders to search engines, which can be quite fun but, more importantly, can dramatically improve your SEO if used correctly. If used incorrectly, however, or simply forgotten about, they can have a seriously drastic impact on your website and its SEO – please take caution when using the Disallow, Noindex and Nofollow commands!
Fundamentally, these commands can be defined as:
- Disallow – this command tells search engines not to crawl your page(s)
- Noindex – this command tells search engines not to index your page(s), i.e. not to show them in search engine results
- Nofollow – this command tells search engines to crawl your page(s), but not to follow any links within them
Firstly, there are two areas of a website that are relevant to the Disallow, Noindex and Nofollow commands: the robots.txt file and the HTML code of each page.
The Robots.txt File
The Robots.txt file is created by webmasters to instruct search engines on how to crawl their website. The robots.txt file is part of the robots exclusion protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and subsequently serve that content up to users. For the purpose of this post, it is only the “Disallow” or “Noindex” command that would be placed in the Robots.txt file.
Within this file, you will also see “User-Agent” – this refers to the specific search engine robots (or web crawler software) you would like the commands that follow to apply to. There are over 300 of these, but the most prominent are obviously Googlebot (Google), Bingbot (Bing) and Slurp (Yahoo). Using Googlebot, your Robots.txt file would look like this:
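As a sketch (the path here is purely a placeholder), a robots.txt file with rules aimed at Googlebot might look like this:

```txt
# Rules for Google's crawler only
User-agent: Googlebot
Disallow: /private/

# Rules for every other crawler (an empty Disallow allows everything)
User-agent: *
Disallow:
```

Each “User-agent” line starts a new group of rules, and a crawler obeys the most specific group that matches its name.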
All robots.txt files live at the same location, the root of the domain, e.g.: www.apple.com/robots.txt
“Disallow” commands simply tell search engines not to crawl particular pages and, as standard, are formatted as follows:
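For example (the folder and file names are placeholders):

```txt
User-agent: *
Disallow: /folder/
Disallow: /file.html
Disallow: /image.png
```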
These three “Disallow” commands ensure the URL /folder/, file.html and image.png will not be crawled by search engines. A word of caution here though, as this command is essentially the same as completely removing these pages from your website.
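To see how a crawler actually interprets these rules, here is a short sketch using Python’s standard-library robots.txt parser (the domain and paths are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules matching the Disallow examples above
rules = [
    "User-agent: *",
    "Disallow: /folder/",
    "Disallow: /file.html",
]

parser = RobotFileParser()
parser.parse(rules)

# Anything under a disallowed folder is blocked for every user agent...
print(parser.can_fetch("Googlebot", "https://www.example.com/folder/page.html"))

# ...while pages not listed remain crawlable
print(parser.can_fetch("Googlebot", "https://www.example.com/about.html"))
```

This is the same parsing logic a well-behaved crawler applies before fetching any URL on your site.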
The “Noindex” command tells search engines to crawl a page, but not to index that particular page, so it does not appear in search results. This can be added either to the robots.txt file (a quicker method for noindexing a lot of pages) or to the HTML code of the page.
Noindex in Robots.txt
To Noindex pages in the Robots.txt file you would use the following command:
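The line would look something like this (the path is a placeholder) – although note that Google has stated it never officially supported this directive and stopped honouring it altogether in 2019:

```txt
User-agent: *
Noindex: /thank-you/
```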
It is recommended, however, that this method is not used, as it is not overly reliable. The other method is therefore to Noindex a page in the HTML code of that page itself.
Noindex in Pages
The most common method by far to Noindex pages is to add the tag into the ‘head’ section of the HTML, or alternatively into the response headers. It’s important to ensure that the page is not already disallowed in the Robots.txt file – if it is already blocked, the search bot will never even see your Noindex tag. The command to use is as follows:
<meta name="robots" content="noindex,follow">
The second part of this tag is “follow”. This instructs the bot to continue following (discovering) the links on that page, i.e. to continue navigating through all of that page’s links to other pages. Alternatively, this could be changed to “nofollow”, which would instruct search engine bots not to index the page and also to NOT follow any links on that page. We will come on to “Nofollow” in the next section.
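The response-header alternative mentioned above uses the X-Robots-Tag HTTP header, which is particularly useful for non-HTML files such as PDFs where there is no ‘head’ section to put a meta tag in. A sketch of how it would appear in an HTTP response:

```txt
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, follow
```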
Example Pages to Noindex:
- Privacy Policies
- Thank-you Pages
- Member Only Pages
- Admin Directories
- Plugin Directories (If using WordPress etc.)
The Nofollow Command
The “Nofollow” command instructs search engines not to use a link when deciding on the importance of the linked page (PageRank), or to navigate through (discover) more URLs within the same site via the links on that page.
Nofollow commands can be added in two places in the page’s HTML:
- The <head> section of the page (like Noindex), in order to Nofollow every single link on that particular page. This would come in the form:
- <meta name="robots" content="nofollow" />
- If you wish to Nofollow individual links on a page, the following code can be used:
- <a href="marketing-insights.html" rel="nofollow">marketing insights</a>
It’s important to remember that Nofollow commands, whether in the ‘head’ section or applied to particular links, will not stop those linked pages from being crawled and indexed altogether; they simply stop crawlers navigating to those pages and passing ranking value to them through those specific links.
Example Pages to Nofollow:
- Paid Links
- Comment Sections (Blogs)
- User-Generated Content
- Embedded Items from other sites
- Other – simply use it any time you don’t want your site to be seen as endorsing a link to another site
I hope that has cleared things up regarding the Disallow, Noindex and Nofollow commands. If you have any questions, please don’t hesitate to get in touch. It’s a daunting area, we know, and certainly one you do not want to get wrong.