When To Index eCommerce “Canonical” Pages

Posted by Christopher

As we saw in my recent post about what counts as duplicate content, eCommerce sites fall prey to a lot of duplicate content risks that need to be mitigated. One we explored was the challenge of multiple product variants living on unique URLs, saying the same thing with no difference but their colour.

But let’s take a step back up the ladder, away from the products, to the category level. Nice safe haven of a single URL, right?

Alas, no.

Faceted category pages, i.e. category pages that use filters and options to whittle down the content (so, almost all eCommerce category pages), can quickly create new URLs that compete with each other for Google’s attention, leaving you with lower rankings all round and, in turn, less traffic and revenue. These competing URLs are often called “canonical pages”: we need to determine which URL is canon and make sure the imitators are treated accordingly.

What are Canonical Pages?

Canonical pages refer to any pages where there is identical or merely re-sorted content across multiple URLs. For example, let’s say your site is www.awesomehats.com and we’re looking at your page for top hats on www.awesomehats.com/formal/tophats.


The page uses filters at the side to narrow the selection on screen by size, colour and delivery options. The user can also sort the collection by price or popularity. Finally, the page only displays 30 products at a time so there are additional pages a user must click through to see the full range.

Activating any of these options adjusts the URL to sort, filter or extend the original content being displayed, for example:

  • Main URL for the collection - www.awesomehats.com/formal/tophats
  • Filtered by colour - www.awesomehats.com/formal/tophats?colour=black,brown
  • Filtered by colour and size - www.awesomehats.com/formal/tophats?colour=black,brown&size=7
  • Filtered by colour and sorted by price - www.awesomehats.com/formal/tophats?colour=black,brown&sort=lowhigh
  • Seeing the next 30 top hats - www.awesomehats.com/formal/tophats?page=2
  • Seeing the next 30 black top hats - www.awesomehats.com/formal/tophats?colour=black&page=2

The SEO concern is that, without any intervention, these are all unique URLs that Google could rightly index and return in searches for “top hats”. Except we know that having multiple URLs vying for Google’s attention rarely means search dominance with every URL at the top – it means falling rankings as Google doesn’t know which page to favour. There’s also the risk that all these near-identical pages could be seen as duplicate content and, worse yet, as thin content if the final product selection with all filters applied is very small.
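To see how quickly the variants pile up, here is a minimal Python sketch that groups the example URLs above by host and path, ignoring the query string. The `group_by_path` helper is hypothetical, and the `https://` scheme is an assumption added only so the standard library can parse the addresses.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

def group_by_path(urls):
    """Group URLs by host + path, recording which parameters each variant uses."""
    groups = defaultdict(list)
    for url in urls:
        # Assumption: the examples omit the scheme, so add one for parsing.
        parts = urlsplit(url if "://" in url else "https://" + url)
        groups[parts.netloc + parts.path].append(sorted(parse_qs(parts.query)))
    return dict(groups)

urls = [
    "www.awesomehats.com/formal/tophats",
    "www.awesomehats.com/formal/tophats?colour=black,brown",
    "www.awesomehats.com/formal/tophats?colour=black,brown&size=7",
    "www.awesomehats.com/formal/tophats?colour=black,brown&sort=lowhigh",
    "www.awesomehats.com/formal/tophats?page=2",
    "www.awesomehats.com/formal/tophats?colour=black&page=2",
]

groups = group_by_path(urls)
for base, variants in groups.items():
    # One category page, six crawlable addresses.
    print(base, "has", len(variants), "URL variants")
    for params in variants:
        print("  parameters:", params or "none")
```

All six URLs collapse to the same base page, which is exactly the relationship Google cannot infer reliably on its own.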

Just to make matters worse for poor old Awesome Hats, a user can also access the same top hat selection by a menu around suitable hat-wearing events, leading us to www.awesomehats.com/weddings/tophats.

Here’s where canonicalisation comes in.

What is Canonicalisation?

Essentially, it is the process of:

  1. Identifying which URL is the primary URL that can be considered as canon
  2. Establishing the relationships between that page and its variants
  3. Ensuring the canonical version is indexed while all other variants are not indexed

Once this is done, all authority and rankings will go to the canon while the variants will be rightly overlooked by Google.

Technically, this can be done in two ways. The first is to use the rel=canonical tag on the non-canon pages to tell search spiders that the page is a variant of the canon, and to use the same tag on the canon/primary page to point to itself. Note that many CMS will automatically canonicalise every URL to itself, including the variants, so check your settings and your content.
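As a rough illustration, the tag itself is a single `<link>` element in the page head. The Python sketch below builds it for any URL, assuming the simplest policy (drop the query string so every faceted variant points at the plain category URL). The `canonical_tag` helper and its `canonical_path` override are hypothetical names, not part of any CMS.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

def canonical_tag(url, canonical_path=None):
    """Build a rel=canonical tag for the page served at `url`.

    Assumed policy: strip the query string so variants point at the plain
    category URL. Pass `canonical_path` to point at a different page.
    """
    parts = urlsplit(url)
    path = canonical_path or parts.path
    href = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'

# A filtered and sorted variant points at the main category page.
print(canonical_tag("https://www.awesomehats.com/formal/tophats?colour=black,brown&sort=lowhigh"))

# The main category page points at itself.
print(canonical_tag("https://www.awesomehats.com/formal/tophats"))

# A duplicate on another path can point at the chosen primary.
print(canonical_tag("https://www.awesomehats.com/weddings/tophats", canonical_path="/formal/tophats"))
```

All three calls print the same tag, pointing at https://www.awesomehats.com/formal/tophats.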

In the following screenshots for the Men’s Boots page of the Schuh site, taken with 1-Click SEO Meta, we can see the long URL full of variables to indicate sorting and filtering has a simple canonical to the main category, which is also reflected in the source code.

Screenshot: 1-Click SEO Meta
Screenshot: source code
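If you would rather check a page's canonical without a browser extension, the same information can be read straight from the HTML. This is a minimal sketch using Python's standard-library parser; the `CanonicalFinder` class and the sample markup are illustrative assumptions, not the actual Schuh source.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")

def find_canonical(html):
    """Return the canonical URL declared in `html`, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

# Hypothetical page source for a filtered category URL.
sample = """<html><head>
<title>Top Hats</title>
<link rel="canonical" href="https://www.awesomehats.com/formal/tophats">
</head><body></body></html>"""

print(find_canonical(sample))
```

Running this over a handful of filtered and sorted URLs is a quick way to spot a CMS that canonicalises every variant to itself.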

The second way is to use URL parameters in Search Console to directly tell Google (though not other search spiders) how your URL variables are constructed, what they do and whether they should be indexed or not. In the image below, we are setting up a basic parameter to explain to Google that the page query affects pagination.

Screenshot: Search Console, telling Google that the page parameter paginates

Note that this is a powerful feature that can have unexpected consequences, so treat it carefully, or trust an agency that specialises in retail SEO and has experience configuring these variables correctly across eCommerce sites.

Photo of two sheep by Jørgen Håland

How to Choose Your Canonical URLs

So, how do you decide which pages are the primary, when you should canonicalise one page to another and when you should allow one or more of the faceted URLs to be indexed?

As ever, the key thing to consider is the user experience. Is there enough specific interest in the content of your faceted page and do you have the right depth of product to address it?

Consider:

Does the page have a broad but defined range that satisfies a specific search intent?

If you’re Schuh with hundreds of shoes that could fulfil a search for “men’s black nike trainer 9”, that’s a good candidate for a page you want indexed, so it should not be canonicalised to another page but allowed to be indexed and hopefully rank.

However, the traffic for “men’s black nike trainer 9 sorted by price” is likely going to be non-existent so your URL variants should definitely be kept out of the index with one of the methods discussed above.

If there’s search interest but your ability to satisfy it is poor, such as having 3 products where competitors have 30, that’s another URL that should be canonicalised to a stronger page, as it won’t provide the right user experience to warrant the risks of having the duplicate content in the index.

Depending on the search volume for a query and your ability to fulfil that query from your stock you may even want to create a plain URL and landing page for that term. For example, for “black top hats” Awesome Hats could create www.awesomehats.com/formal/tophats/black to gather all those black hats together on a fully optimised page. They could then use rel=canonical tags to make that the primary page for www.awesomehats.com/formal/tophats?colour=black.
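The questions above can be turned into a rough rule of thumb. The Python sketch below is only that: the thresholds, the `FacetedPage` fields and the `should_index` function are all hypothetical, and the right numbers depend on your market and your stock.

```python
from dataclasses import dataclass

# Hypothetical thresholds; tune them to your own market and catalogue.
MIN_MONTHLY_SEARCHES = 50
MIN_PRODUCTS = 10
# Parameters that only reorder the same products never deserve their own listing.
SORT_PARAMS = {"sort"}

@dataclass
class FacetedPage:
    url_params: dict        # e.g. {"colour": "black", "size": "9"}
    monthly_searches: int   # estimated search volume for the matching query
    product_count: int      # products shown once all filters are applied

def should_index(page):
    """Return True if the faceted page looks worth indexing in its own right."""
    if page.url_params.keys() & SORT_PARAMS:
        return False  # sorted variants: nobody searches for "sorted by price"
    if page.monthly_searches < MIN_MONTHLY_SEARCHES:
        return False  # no real search interest
    return page.product_count >= MIN_PRODUCTS  # enough depth to satisfy intent

# Illustrative figures only.
print(should_index(FacetedPage({"colour": "black", "size": "9"}, 400, 120)))   # True
print(should_index(FacetedPage({"colour": "black", "sort": "lowhigh"}, 400, 120)))  # False
print(should_index(FacetedPage({"colour": "black"}, 400, 3)))                  # False
```

Any page that fails the check should carry a canonical pointing at the page you do want indexed.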

Is the page a copy of another page?

On our hat example, we saw two duplicate pages on non-faceted URLs: www.awesomehats.com/formal/tophats and www.awesomehats.com/weddings/tophats.

Do we want both pages to rank? Or should one be canonicalised?

As before, is there enough specific interest in “top hats for weddings” compared to other top hat searches? If so, it may be worth fully optimising each page individually, if your CMS will allow it, with unique metadata and onsite content.

If the interest isn’t there or you cannot configure the pages individually, one page should be made the primary page and the other should point to it. Look at which page is most authoritative, considering factors like backlinks or Moz’s Page Authority (PA) score, and make that the primary to be indexed, with the other variant canonicalised to it.
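Picking the primary between two duplicates can be as simple as comparing those signals side by side. In this sketch the figures are made up for illustration, and ranking by Page Authority first and backlinks second is an assumption rather than a rule.

```python
def pick_primary(pages):
    """Return the URL with the strongest authority signals.

    `pages` maps each URL to a dict with 'page_authority' and 'backlinks'.
    Assumed ordering: higher Page Authority wins, backlinks break ties.
    """
    return max(
        pages,
        key=lambda url: (pages[url]["page_authority"], pages[url]["backlinks"]),
    )

# Hypothetical figures for the two duplicate top hat pages.
duplicates = {
    "www.awesomehats.com/formal/tophats": {"page_authority": 38, "backlinks": 54},
    "www.awesomehats.com/weddings/tophats": {"page_authority": 21, "backlinks": 7},
}

primary = pick_primary(duplicates)
for url in duplicates:
    role = "primary (self-canonical)" if url == primary else f"canonicalise to {primary}"
    print(url, "->", role)
```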

eCommerce sites can be sprawling, duplicative beasts, but with common sense and attention to detail, you can protect your visibility while making the most of the opportunities.
