In a recent episode of Google’s Search Off the Record podcast, Allan Scott from the “Dups” team explained how Google decides which URL to consider as the main one when there are duplicate pages.
He revealed that Google looks at about 40 different signals to pick the main URL from a group of similar pages.
Duplicate content is a common problem for search engines because many websites have multiple pages with the same or similar content.
To solve this, Google uses a process called canonicalization. This process allows Google to pick one URL as the main version to index and show in search results.
Google has discussed the importance of using signals like rel="canonical" tags, sitemaps, and 301 redirects for canonicalization. However, the number of signals involved in this process is more than you may expect.
Scott revealed during the podcast:
“I’m not sure what the exact number is right now because it goes up and down, but I suspect it’s somewhere in the neighborhood of 40.”
Some of the known signals mentioned include:
The weight and importance of each signal may vary, and some signals, like rel="canonical" tags, can influence both the clustering and canonicalization processes.
With so many signals at play, Scott acknowledged the challenges in determining the canonical URL when signals conflict.
He stated:
“If your signals conflict with each other, what’s going to happen is the system will start falling back on lesser signals.”
This means that while strong signals like rel="canonical" tags and 301 redirects are crucial, other factors can come into play when these signals are unclear or contradictory.
As a result, Google’s canonicalization process involves a delicate balancing act to determine the most appropriate canonical URL.
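To make the fallback idea concrete, here is a toy sketch in Python. It is not Google's method: the signal names, the weights, and the `pick_canonical` function are all invented for illustration. It only shows how two strong signals that disagree can cancel out and leave the decision to a weaker one.

```python
from collections import defaultdict

# Hypothetical signals and weights, chosen only for illustration.
# Google has not published its signals' weights.
WEIGHTS = {"redirect": 5, "rel_canonical": 5, "sitemap": 2, "internal_links": 1}


def pick_canonical(votes):
    """Pick the URL with the highest total weight.

    votes is a list of (signal_name, url_the_signal_favors) pairs.
    Ties are broken alphabetically so the result is deterministic.
    """
    totals = defaultdict(int)
    for signal, url in votes:
        totals[url] += WEIGHTS[signal]
    return max(sorted(totals), key=totals.get)


# The redirect and the canonical tag disagree, so the sitemap tips the balance.
conflicting = [("redirect", "/a"), ("rel_canonical", "/b"), ("sitemap", "/b")]
print(pick_canonical(conflicting))
```

When the strong signals agree, the weaker ones have no effect on the outcome in this sketch, which is the practical argument for keeping signals consistent.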
Clear signals help Google identify the preferred canonical URL.
Best practices include:
These signals help Google find the correct canonical URLs, improving your site’s crawling, indexing, and search visibility.
Here are a few common mistakes to watch out for.
Fix: Double-check canonical tags, use only one per page, and use absolute URLs.
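A quick way to run this check is to parse a page's HTML and inspect its canonical tags. The sketch below uses only Python's standard library. The names `CanonicalCollector` and `audit_canonical_tags` are made up for this example, and treating only http and https URLs with a host as "absolute" is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalCollector(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonicals.append(attr_map.get("href") or "")


def audit_canonical_tags(html):
    """Return a list of problems: duplicate tags or non-absolute URLs."""
    collector = CanonicalCollector()
    collector.feed(html)
    problems = []
    if len(collector.canonicals) > 1:
        problems.append("more than one canonical tag")
    for href in collector.canonicals:
        parts = urlparse(href)
        # Assumption: "absolute" means an http(s) scheme plus a host.
        if not (parts.scheme in ("http", "https") and parts.netloc):
            problems.append(f"canonical is not an absolute URL: {href!r}")
    return problems


print(audit_canonical_tags('<link rel="canonical" href="/shoes">'))
```

An empty list means the page has at most one canonical tag and that it uses an absolute URL.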
This happens when Page A points to Page B as canonical, but Page B points back to A or on to another page, creating a loop or a chain.
Fix: Ensure canonical URLs always point to the final, preferred version of the page.
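If you have a crawl export that maps each page to its declared canonical, loops and chains are easy to spot. Here is a minimal sketch, assuming that mapping is available as a Python dict; `follow_canonicals` is a hypothetical helper, and a page missing from the dict is treated as self-canonical.

```python
def follow_canonicals(start, canonical_of):
    """Follow canonical declarations starting from one URL.

    Returns (path, looped). path lists every URL visited in order;
    looped is True when the declarations circle back on themselves.
    """
    path = [start]
    while True:
        current = path[-1]
        # Assumption: a page with no entry is its own canonical.
        target = canonical_of.get(current, current)
        if target == current:
            return path, False
        if target in path:
            return path + [target], True
        path.append(target)


declared = {"/a": "/b", "/b": "/a"}
print(follow_canonicals("/a", declared))
```

A path longer than two entries with no loop indicates a chain, meaning the starting page should point directly at the last URL in the path.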
Combining a noindex directive with a canonical tag sends mixed signals to search engines. Noindex means don’t index the page at all, making canonicals irrelevant.
Fix: Use canonical tags for consolidation and noindex for exclusion.
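Pages that carry both directives can be flagged with a small parser. The following is a sketch rather than a complete audit: `IndexingHints` and `has_mixed_signals` are illustrative names, and it only reads robots meta tags in the HTML, so a noindex sent through an HTTP header would be missed.

```python
from html.parser import HTMLParser


class IndexingHints(HTMLParser):
    """Records whether a page declares noindex and/or a canonical URL."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and (attr_map.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr_map.get("content") or ""
            directives = [d.strip().lower() for d in content.split(",")]
            if "noindex" in directives:
                self.noindex = True
        elif tag == "link" and "canonical" in (attr_map.get("rel") or "").lower().split():
            self.canonical = attr_map.get("href")


def has_mixed_signals(html):
    """True when the page has both a noindex directive and a canonical tag."""
    hints = IndexingHints()
    hints.feed(html)
    return hints.noindex and hints.canonical is not None


page = '<meta name="robots" content="noindex, follow"><link rel="canonical" href="https://example.com/a">'
print(has_mixed_signals(page))
```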
Pointing canonicals to redirected or noindexed pages confuses search engines.
Fix: Canonical URLs should return a 200 status code and be indexable.
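Once you have fetched a canonical target with your crawler of choice, the check itself is simple. This sketch assumes you already hold the response's status code and any robots directives. `canonical_target_problems` is a hypothetical helper, and it ignores user-agent prefixes in `X-Robots-Tag` values for brevity.

```python
def canonical_target_problems(status_code, x_robots_tag="", meta_robots=""):
    """Return reasons why a URL is a poor canonical target.

    x_robots_tag is the raw X-Robots-Tag header value, if any.
    meta_robots is the content of the robots meta tag, if any.
    """
    problems = []
    if status_code != 200:
        problems.append(f"target returns HTTP {status_code}, expected 200")
    combined = f"{x_robots_tag},{meta_robots}"
    directives = {d.strip().lower() for d in combined.split(",")}
    if "noindex" in directives:
        problems.append("target is marked noindex")
    return problems


print(canonical_target_problems(301))
print(canonical_target_problems(200, x_robots_tag="noindex, nofollow"))
```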
Inconsistent URL casing can cause duplicate content issues.
Fix: Keep URL and canonical tag casing consistent.
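Casing mismatches are hard to see by eye but easy to catch in code. The sketch below compares a page URL with its canonical and reports when they differ only by letter case in the path or query string. The function name is illustrative. Scheme and host are compared case-insensitively because they are not case-sensitive in URLs.

```python
from urllib.parse import urlsplit


def differs_only_by_path_case(url_a, url_b):
    """True when two URLs match except for letter case in path or query."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    same_origin = (a.scheme.lower(), a.netloc.lower()) == (b.scheme.lower(), b.netloc.lower())
    exact_match = (a.path, a.query) == (b.path, b.query)
    folded_match = (a.path.lower(), a.query.lower()) == (b.path.lower(), b.query.lower())
    return same_origin and folded_match and not exact_match


print(differs_only_by_path_case("https://example.com/Shoes", "https://example.com/shoes"))
```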
Paginated content and parameter-heavy URLs can cause duplication if mishandled.
Fix: Use canonical tags pointing to the first page or “View All” for pagination, and keep parameters consistent.
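One way to keep parameters consistent is to normalize URLs before writing them into canonical tags, sitemaps, or internal links. Here is a sketch using Python's `urllib.parse`. The list of ignored parameters is an assumption and should be replaced with whatever tracking or session parameters your site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change page content on this site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def normalize_params(url):
    """Drop ignored parameters, sort the rest, and strip any fragment."""
    parts = urlsplit(url)
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    pairs.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(pairs), ""))


print(normalize_params("https://example.com/shoes?size=9&color=red&utm_source=news"))
```

Two URLs that differ only in parameter order or tracking parameters normalize to the same string, which makes them straightforward to group.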
It’s unlikely the complete list of 40+ signals used to determine canonical URLs will be made publicly available.
However, this was still an insightful discussion worth highlighting.
Here are the key takeaways: