Duplicate content on your own website, where different web addresses ("URLs") on your site display identical content, can lead to problems in your website's performance in the search engine. This article discusses one way to resolve this for the Google search engine.
I have discussed the problems associated with having different URLs show the same content, or what webmasters often call "duplicate content", in my articles before.
In case it isn't clear to you how this can come about, take the example of a site selling a product called "Widget A". The site links to a product page showing details of Widget A using the URL "http://www.example.com/widget-A/". Like all sites with high usability, the webmaster also provides other ways in which the visitor can end up on that product page. For example, if the visitor uses the site's "Help" function to look for a product with certain features, the site may show information about Widget A that fulfills the visitor's criteria. The information page may use a different address, like "http://www.example.com/help.php?features=fix+kitchen+sink". Both addresses show the exact same information, since they are talking about the same product.
Even if you don't use scripts on your website, it's still possible to end up with duplicate content problems. For example, the "index.html" page of a website or its directory is usually the same page displayed by a web server when the visitor accesses the site without specifying a filename. That is, "http://www.example.com/index.html" and "http://www.example.com/" are usually the same page, showing the same content. (For more information about this behaviour, and its ramifications, see "Should Your URLs Point to the Directory or the Index Page?".)
When a page can be accessed with multiple web addresses, you run the risk of link dilution. I've mentioned this before in "How to Create a Search Engine Friendly Website", so if you're not familiar with the term, please check that article out for details. In general, link dilution causes the relevant page on your site to rank lower in the search engine results than it would have had the dilution not occurred.
To help webmasters solve this problem, Google have ("has" in US English) declared that it will recognize a new HTML / XHTML tag which, when inserted into your web page, allows you to state which URL you want to be the "official", or "canonical", address for that particular content.
This tag needs to be inserted into the HEAD section of your web page. It has the following format:
<link rel="canonical" href="http://www.example.com/correct-page.html">
Replace "http://www.example.com/correct-page.html" with your actual web address. Remember: the code has to go into the HEAD section of your web page where all the meta data are, and not into the BODY section where your content lives. If you use a WYSIWYG web editor (where WYSIWYG means "What You See Is What You Get"), change to the "Source" mode to locate the right section.
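As a rough sketch of where the tag sits, here is a minimal page skeleton. The title, the body text and the example.com address are placeholders made up for illustration; substitute your own.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Widget A</title>
<!-- The canonical link tag goes here, inside HEAD, alongside the other meta data -->
<link rel="canonical" href="http://www.example.com/correct-page.html">
</head>
<body>
<!-- Your visible content lives here; do not put the canonical tag in this section -->
<p>Details about Widget A.</p>
</body>
</html>
```

If your page is written in XHTML rather than HTML, end the link tag with " />" instead of ">".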
The canonical URL link tag will cause Google to take the web address you put into the tag as the "official" or "correct" version of your web address. If you have two URLs that resolve to the same content, Google will use the one declared as canonical as the actual URL. This means the following:
In search engine results, it will display the canonical URL instead of all the variants it finds on your website.
You will avoid the link dilution problem mentioned earlier. Links from other sites that point to your content using all its myriad URLs will be regarded as pointing to your canonical URL. That is to say, your page rank from all the diverse URLs will flow correctly to the page it's supposed to be attached to.
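To illustrate with the Widget A example from earlier, and assuming you want the product page address to be the official one, the page served at both "http://www.example.com/widget-A/" and "http://www.example.com/help.php?features=fix+kitchen+sink" might carry the same tag:

```html
<!-- Placed in the HEAD section of the page, whichever of the two addresses it is served from -->
<!-- Assumption for this sketch: the product page URL is the one chosen as canonical -->
<link rel="canonical" href="http://www.example.com/widget-A/">
```

Since both addresses show the same page, putting the tag into that page once is enough for it to appear under every variant URL.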
There are some limitations to what the new link tag can do.
The information about the canonical URL does not work across different domain names. However, it works across sub-domain names.
For example, if you have a URL like "a.com/something" that is identical with "b.com/something-else", Google will not take your canonical URL link tag on b.com to apply to a.com.
However, if you have URLs on multiple subdomains on your domain that show the same content, like "www.example.com/xyz.html", "my.example.com/whatever.html" and "example.com/index.html" all showing the same page, putting a canonical link tag will cause Google to accept the URL you put in your link tag as the real URL.
Update: Google now accepts cross-domain canonical tags, so this limitation no longer exists.
The tag is currently only recognized by Google. As such, you should still continue to find ways of reducing multiple URLs that lead to the same content on your website.
Update: some of the other search engines have said that they will support the canonical tag as well, although they may not necessarily give it the same weight as Google does. Nor will they necessarily support its use across different domains (see the point above).
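For the subdomain example given above, a sketch of the tag might look like the following. It assumes, purely for illustration, that "www.example.com/xyz.html" is the address you want treated as the official one; any of the three could be chosen instead.

```html
<!-- HEAD section of the page, as served at my.example.com/whatever.html and example.com/index.html -->
<!-- Assumption for this sketch: www.example.com/xyz.html is the preferred address -->
<link rel="canonical" href="http://www.example.com/xyz.html">
```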
It's probably too early to say whether this will become the definitive method that helps webmasters solve the pesky duplicate content problem that plagues many sites. (The tag was only officially announced by Google on 12 February 2009.)
I personally think it is an ingenious solution, and it puts the power of how to resolve the issues into the hands of the webmasters themselves, rather than letting the search engine, which usually does not have enough information, try to figure out the correct URL. Hopefully, the other search engines will also recognize this tag, making this a problem of the past.
This article is copyrighted. Please do not reproduce or distribute this article in whole or part, in any form.