canonical

Ask.com also supporting the new canonical tag

Number four search engine Ask has joined the other three major search engines in supporting the new “canonical tag” – this is great news for webmasters as they can now use this new search technology on all of the “Big Four” search engines.

Number four search engine Ask.com, formerly known as Ask Jeeves, have announced that they will be joining Google, Yahoo! and Microsoft in supporting the new canonical tag, a recent joint effort by the search engines to combat duplicate content. This is fantastic news for webmasters as it means that they can use the same technology on their website and it will work on all of the “Big Four” search engines.

Ask is the fourth biggest search engine, and the smallest of the “Big Four” (AOL is powered by Google so is not included as a search engine in its own right). Ask has a market share in the United States of between 3% and 4% according to web metrics companies Hitwise and comScore, and its UK market share is also around the 3% mark. Although this is fairly small it is still a reasonably significant number of users and they are not that far behind number three Microsoft in these markets.

Supporting shared standards is particularly important for a small player in the search engine marketplace as it essentially drives down the cost of doing business. Relatively few webmasters will implement solutions specific to small search engines, preferring to concentrate on the dominant market leader, Google. However, webmasters will be more than happy to implement solutions which also work across all search engines, as it allows them to support all of the smaller players in one fell swoop without having to create tailored solutions for each one.

Although Ask were a few days later than the other search engines with their announcement, this is much faster than many of their previous reactions to new Search standards. Ask.com added support for XML Sitemaps almost 2 years after Google and months after Yahoo! and Microsoft, and still haven’t announced support for the “nofollow” attribute for links (although Ask.com claim that their algorithm is less sensitive to link-based spam as it measures local popularity rather than using global popularity like PageRank).

Hopefully this marks a change in pace for Ask.com’s support of new standards in Search.

UPDATE – According to Microsoft Live Search, Ask.com were included in discussing the creation of this tag.


The canonical tag – A fantastic new tool to combat duplicate content

The top three search engines have jointly announced a new HTML tag to help combat the issue of duplicate content. We look through the potential uses of this new tag, show the places where it shouldn’t be used and illustrate where it’s a fantastic new addition to the SEO toolbox.

A common problem for search engines when indexing websites is that of duplicate content. Having multiple pages with identical or very similar content can create numerous problems for the search engines, such as wasting resources on unnecessary spidering and attempting to determine which version of a page is most relevant. Search engines are interested in finding and indexing unique content, not hundreds of identical pages!

If your site contains multiple pages of duplicate content with little or no variation, this can lead to a number of potential issues. When confronted with multiple pages of duplicate content, a search engine will attempt to identify a canonical page and then display this within the search results. The page it chooses may not be the one you would prefer.

An additional issue is that if links are made to these different pages, the benefits of these links might potentially be split between different page variants, as search engines are not always able to identify duplicate pages. The inherent value that comes from anchor texts and link weight, through both internal and external links to the page, will be diluted by having multiple pages with duplicate content.

Many websites, especially e-commerce sites, have multiple ways to navigate to individual items of content. This in turn leads to multiplication of pages with little or no variation. For example, here are some common variations of a site’s homepage:

  • http://example.com
  • http://www.example.com
  • https://www.example.com
  • http://www.example.com/index.html
  • http://www.example.com/?referer=page.html

Each of these URLs would show a visitor an identical copy of the site’s homepage.

Generally these sorts of duplicate URLs are dealt with by using 301 redirects to send visitors to the correct version of a page. However, there are some instances where you might actually want to have multiple pages which are very similar, and therefore a 301 redirect is not suitable. For example, on many e-commerce sites, you can often sort lists of products in numerous ways. In these instances, the Robots Exclusion Protocol is usually called upon to block these duplicate pages from the search engines.
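
The 301 approach for the homepage variants above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the preferred scheme, the preferred host and the function names `preferred_url` and `redirect_for` are hypothetical choices for this sketch, not part of any announcement. Note that the `?referer=` variant is deliberately left alone here, since a query string may be meaningful and is not something a blanket redirect can safely strip.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: the site prefers http, the "www." host,
# and the bare directory URL over an explicit index file.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"
INDEX_FILES = ("index.html", "index.htm")


def preferred_url(url: str) -> str:
    """Return the single preferred form of a URL on this (one) site."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Collapse ".../index.html" to the directory it lives in.
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    # The query string is kept as-is; only scheme, host and index file change.
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


def redirect_for(url: str):
    """Return (301, target) if the request should be redirected, else None."""
    target = preferred_url(url)
    return None if target == url else (301, target)
```

With these assumptions, `http://example.com`, `https://www.example.com` and `http://www.example.com/index.html` would all be redirected to `http://www.example.com/`, while the `?referer=page.html` URL would still be served as a separate address.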

Yesterday Google, Yahoo! and Microsoft jointly announced a new HTML tag in an effort to help site designers and search engines more accurately define a website’s canonical pages. The tag provides search engines with a suggestion from the site’s owner that a specific page should be considered as the canonical version and therefore more authoritative than its duplicate brothers. This new tag is as follows:

<link rel="canonical" href="http://example.com" />

This is inserted within the <head> element of duplicate pages and enables search engine spiders to accurately identify which URL is the site owner’s preferred canonical page. This new tag transfers search ‘signals’ such as Google PageRank, to the appropriate preferred canonical URL rather than dispersing it across multiple URLs.
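
To make the mechanics concrete, here is a minimal sketch of how a spider might read the hint from a page, using Python's standard `html.parser` module. The class name `CanonicalFinder` and the helper `find_canonical` are hypothetical, and this is not a description of how any search engine actually implements it; it simply picks up the first `rel="canonical"` link found inside `<head>` and ignores any that appear elsewhere.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.canonical is None:
            attr = dict(attrs)
            # rel may hold several space-separated values; compare case-insensitively.
            rel_values = (attr.get("rel") or "").lower().split()
            if "canonical" in rel_values and attr.get("href"):
                self.canonical = attr["href"]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical(html: str):
    """Return the canonical URL declared in the page's head, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

Calling `find_canonical` on a duplicate page that carries the tag shown above would return `http://example.com`; a page without the tag, or with the tag placed in the body, returns `None`.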

This new tag should be considered as a new tool in the SEO toolbox, and does not necessarily mean that other methods of reducing duplicate content are no longer useful. In general, it is still going to be better to use 301 redirects in the majority of cases. Here are several reasons why:

  • This tag only provides a hint to search engines – they will consider it as part of their algorithm, but it is by no means certain that a search engine will pick your choice. Each search engine may weigh the hint differently, so behaviour can differ from one engine to the next.
  • It requires pages to be identical or almost-identical. Exactly how “identical” will be down to each search engine to decide, which again leads to different results in different search engines, and it won’t work at all if the pages are significantly different.
  • It doesn’t work in any other search engines (at least yet). This is more important internationally, where different search engines may have different market shares, and local players may have a significant presence.
  • Search engines have to crawl the duplicate URLs, leading to increased server load and potentially reduced coverage of the rest of your site.
  • It doesn’t work across different domains (although it does work across different subdomains on the same domain).

There are also instances of duplicate content where it may not be the most appropriate tool to use, for example:

  • Anywhere you would usually use a 301 redirect (see above for why), such as dealing with the HTTPS version of a site, the non-www site version or index.html pages. A 301 redirect is definitely the best solution in these situations.
  • Accessible text-only versions of a site – contrary to popular belief these are actually recommended against by the W3C, and should not be used at all. It is quite possible to make your site accessible while retaining all the bells and whistles.
  • Printer-friendly pages – a better solution is to create CSS style sheets for print media, which will make any page on your site printable.
  • Duplicate pages caused by use of session IDs or referrer tracking in URLs – These cause spidering problems and should not be used. Additionally, as URLs may be shared between people, session or referrer data carried in the URL is not reliable.

However, this tool definitely introduces a number of new ways of dealing with certain difficult types of duplicate content issues. Here are some great uses for this new tag:

  • As mentioned earlier, many e-commerce sites allow you to sort lists of products in numerous ways. In these instances, the Robots Exclusion Protocol is usually used to block these pages. However, it is now possible to use this tag to point to the canonical version of a page, thus effectively merging these URLs in the eyes of the search engines, whilst leaving the site experience unaltered.
  • In the same way as sorting, this is potentially a suitable tool to use with pagination, although if the pages are not similar enough the search engines may ignore the hint. We would generally recommend allowing the search engines to spider at least one paginated form, however.
  • One common method of split testing involves using URL parameters to differentiate pages (although this isn’t the only way of doing it). Using this new tag allows you to perform split testing in this way without causing duplicate content issues. A side note – if you’re not doing split testing, you should be!
  • Another issue common to e-commerce sites is multiple methods of navigating to a particular product. For example, a product may be included in a general category and also be accessible in a category for the product brand. In some instances a 301 redirect may be possible, but sometimes the branding and navigation may need to be kept intact, and this tag is a suitable tool in this instance.
  • Where you want to handle people using invalid URL parameters. In fact, you could argue that every single page on a site should use this new link tag for this reason alone. With this tag in place, you no longer have to worry about links to your site being made with weird URL parameters – they’ll all be consolidated for you!
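
The sorting, split-testing and stray-parameter cases above all come down to the same step: work out which URL parameters actually change the content, drop the rest, and print the result in the tag. A minimal sketch of that step follows. The whitelist `CONTENT_PARAMS`, the parameter names and both function names are assumptions made up for this illustration; a real site would need its own list.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist: the only parameters that change what the page shows.
# Anything else (sort order, test variant, tracking codes) is dropped.
CONTENT_PARAMS = {"id", "category"}


def canonical_url(url: str) -> str:
    """Strip non-content parameters and put the remaining ones in a fixed order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in CONTENT_PARAMS
    ]
    kept.sort()  # a stable order, so ?a=1&b=2 and ?b=2&a=1 agree
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the tag to place in the <head> of the page served at `url`."""
    href = escape(canonical_url(url), quote=True)
    return '<link rel="canonical" href="%s" />' % href
```

Under these assumptions, a product list requested as `http://www.example.com/products?category=shoes&sort=price&variant=b` would declare `http://www.example.com/products?category=shoes` as its canonical URL, whichever sort order or test variant the visitor sees.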

A final, if somewhat niche, way of using this tag is where you are showing multiple revisions of a document. A good example of this is a wiki, where documents can be edited and each revision of a page is stored for posterity. In fact, Wikia was the partner for the search engines to help them test this new tag out.

The exact impact of this new HTML tag is yet to be accurately measured and, in general, it will still be much more beneficial to use a traditional 301 redirect. However, for those instances where a 301 is not the appropriate solution, the rel=canonical link tag provides an invaluable new addition to the SEO toolbox.


Update: Ask.com are also going to support the canonical tag.

Additional research by John Trivett
