What is a Canonical URL?
A canonical URL is the version of a page you ask search engines to index when the same content is reachable at several URLs. It is usually declared with an HTML link element.
What is a Canonical URL?
A canonical URL is an HTML signal, specified via <link rel="canonical" href="...">, that tells search engines which URL should be treated as the authoritative version when multiple URLs serve identical or very similar content. Introduced by Google, Yahoo, and Microsoft jointly in February 2009, the rel="canonical" element lives in the <head> of an HTML document. Search engines treat it as a strong hint — not a directive — meaning Google can choose to ignore it if other signals conflict. According to Google's own documentation, canonicalization consolidates link equity (PageRank) to the preferred URL, which directly impacts rankings. This matters most on e-commerce sites where faceted navigation, session IDs, and tracking parameters can generate thousands of duplicate URLs for a single product page.
How it works
When Googlebot crawls a page and finds a rel="canonical" tag, it treats the declared URL as a strong candidate for the preferred URL to index and weighs it alongside other signals such as redirects and sitemaps. Here's the basic implementation:
    <head>
      <link rel="canonical" href="https://example.com/shoes/red-sneakers" />
    </head>
This tells search engines: "Even if you found this page at example.com/shoes/red-sneakers?utm_source=email&color=red, the real URL to index is example.com/shoes/red-sneakers."
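As a rough illustration of that mapping, the sketch below derives a canonical URL from a request URL by dropping parameters that do not change the content. The function name toCanonicalUrl and the IGNORED_PARAMS list are hypothetical; which parameters are safe to drop depends entirely on your site.

```typescript
// Hypothetical, site-specific list of parameters that do not change page content.
const IGNORED_PARAMS = ["utm_source", "utm_medium", "utm_campaign", "color"];

// Returns the URL we would declare in <link rel="canonical"> for a given request URL.
function toCanonicalUrl(requestUrl: string): string {
  const url = new URL(requestUrl); // throws on relative or malformed input
  for (const name of IGNORED_PARAMS) {
    url.searchParams.delete(name); // no-op when the parameter is absent
  }
  url.hash = ""; // fragments are never part of a canonical URL
  return url.toString();
}

console.log(
  toCanonicalUrl("https://example.com/shoes/red-sneakers?utm_source=email&color=red"),
);
```

Parameters that are not on the list, such as a pagination parameter, are left in place, so the function only collapses URLs you have explicitly decided are duplicates.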
There are three ways to set a canonical:
- HTML <link> tag: most common, lives in <head>.
- HTTP Link header: useful for PDFs and non-HTML resources: Link: <https://example.com/doc.pdf>; rel="canonical".
- Sitemap inclusion: Google treats URLs listed in your sitemap as suggested canonicals.
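For the HTTP header option, a minimal sketch of building the header value might look like this. The helper name canonicalLinkHeader is an assumption for illustration; how you attach the header depends on your server or CDN.

```typescript
// Builds the value of an HTTP Link header that declares a canonical URL.
function canonicalLinkHeader(canonicalUrl: string): string {
  // new URL() throws on relative input, which keeps relative canonicals out.
  const absolute = new URL(canonicalUrl).toString();
  return `<${absolute}>; rel="canonical"`;
}

// Example response headers for a PDF, as a plain object.
const pdfHeaders: Record<string, string> = {
  "Content-Type": "application/pdf",
  Link: canonicalLinkHeader("https://example.com/doc.pdf"),
};

console.log(pdfHeaders.Link);
```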
In Next.js (App Router, v14+), we typically set canonicals in generateMetadata():
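    import type { Metadata } from "next";

    // params is a plain object in Next.js 14; in Next.js 15+ it is a Promise and must be awaited first.
    export async function generateMetadata(
      { params }: { params: { slug: string } },
    ): Promise<Metadata> {
      return {
        alternates: {
          // Absolute, self-referencing canonical for this product page
          canonical: `https://example.com/products/${params.slug}`,
        },
      };
    }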
In Astro, you'd drop it directly into the <head> of your layout component or use the astro-seo package.
Critical rules: canonicals should be absolute URLs (Google accepts relative paths but recommends against them), should point to a page that returns a 200 status, and should be self-referencing on the preferred URL itself. We've shipped this on 50+ projects, and the most common mistake we fix is canonical tags pointing to URLs that 302-redirect somewhere else, which sends search engines conflicting signals about which URL is preferred.
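These rules lend themselves to a small lint-style check. The sketch below is one possible shape, not a tool we ship: the names checkCanonical, CanonicalCheckInput, and CanonicalIssue are hypothetical, and it assumes you have already requested the canonical URL without following redirects and recorded the status code.

```typescript
type CanonicalIssue = "not-absolute" | "redirects" | "not-200" | "not-self-referencing";

interface CanonicalCheckInput {
  pageUrl: string;          // URL the page was served from
  canonicalHref: string;    // href found in <link rel="canonical">
  canonicalStatus: number;  // status observed for canonicalHref, redirects not followed
  pageIsPreferred: boolean; // true when pageUrl is the version you want indexed
}

function checkCanonical(input: CanonicalCheckInput): CanonicalIssue[] {
  const issues: CanonicalIssue[] = [];

  // Rule 1: the href should be an absolute http(s) URL.
  if (!/^https?:\/\//i.test(input.canonicalHref)) {
    issues.push("not-absolute");
  }

  // Rule 2: the target should answer 200 directly, not redirect or error.
  if (input.canonicalStatus >= 300 && input.canonicalStatus < 400) {
    issues.push("redirects");
  } else if (input.canonicalStatus !== 200) {
    issues.push("not-200");
  }

  // Rule 3: the preferred URL should point at itself.
  if (input.pageIsPreferred && input.canonicalHref !== input.pageUrl) {
    issues.push("not-self-referencing");
  }

  return issues;
}

console.log(
  checkCanonical({
    pageUrl: "https://example.com/shoes/red-sneakers",
    canonicalHref: "/shoes/red-sneakers",
    canonicalStatus: 302,
    pageIsPreferred: true,
  }),
);
```

The comparison in rule 3 is a plain string match, so a real check would likely normalize trailing slashes and casing first.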
When to use it
Canonical URLs solve a specific problem: content accessible at multiple URLs. Use them when:
- URL parameters create duplicates (sorting, filtering, tracking codes like ?utm_source=...)
- HTTP vs HTTPS or www vs non-www versions both resolve
- Syndicated content appears on your site and a partner's site (cross-domain canonical)
- Paginated content has a preferred "view all" page
- Mobile/desktop URLs differ (though responsive design has made this less common)
Don't use canonical when:
- Content is genuinely different — canonicalizing two distinct pages tells Google to ignore one of them
- You mean to redirect — if the duplicate URL should never be visited by users, a 301 redirect is stronger and more correct
- You're trying to fix crawl budget issues: canonicals don't prevent crawling; they only influence which URL gets indexed. Use robots.txt to control crawling and noindex to control indexing
Canonical URL vs alternatives
| Signal | Type | Strength | Prevents Crawling? | Best For |
|---|---|---|---|---|
| rel="canonical" | Hint | Strong hint | No | Duplicate URLs you still want accessible |
| 301 Redirect | Directive | Strongest | Yes (redirects) | Permanently moved/retired URLs |
| noindex meta tag | Directive | Strong | No | Pages you never want indexed |
| robots.txt Disallow | Directive | Blocks crawling | Yes | Entire sections you want hidden from bots |
| Sitemap inclusion | Hint | Weak hint | No | Supplementary canonical signal |
Our preferred approach: use 301 redirects when the duplicate URL has no business existing for users. Use canonicals when both URLs need to stay live (e.g., filtered e-commerce pages that users actually browse). Don't rely on noindex as a deduplication strategy — it drops the page from the index entirely rather than consolidating signals.
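That preference can be written down as a tiny decision helper. This is only a sketch of the rule of thumb above; the names chooseSignal, UrlSituation, and Signal are hypothetical, and real cases usually involve more inputs than two booleans.

```typescript
type Signal = "301-redirect" | "rel-canonical" | "self-referencing-canonical";

interface UrlSituation {
  duplicatesAnotherUrl: boolean; // same or near-identical content lives at a preferred URL
  mustStayLiveForUsers: boolean; // users still need to reach this URL directly
}

function chooseSignal(situation: UrlSituation): Signal {
  // Distinct content is its own preferred URL, so it points at itself.
  if (!situation.duplicatesAnotherUrl) {
    return "self-referencing-canonical";
  }
  // Duplicates that users still browse keep working and defer via canonical;
  // duplicates nobody needs are redirected permanently.
  return situation.mustStayLiveForUsers ? "rel-canonical" : "301-redirect";
}

// A filtered e-commerce page that shoppers actually use:
console.log(chooseSignal({ duplicatesAnotherUrl: true, mustStayLiveForUsers: true }));
```

Note that noindex is deliberately absent from the return type, since it removes a page from the index instead of consolidating signals.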
Real-world example
We worked on a Shopify-to-headless-Next.js migration for a retailer with ~12,000 product pages. Their old Shopify setup generated variant URLs like /products/widget?variant=123456 alongside /products/widget. Google Search Console showed 8,400 pages flagged as "Duplicate, Google chose different canonical than user." After the migration, we implemented self-referencing canonicals on every product page and pointed all variant URLs to the clean /products/[slug] canonical via generateMetadata(). Within six weeks, the "duplicate" count in Search Console dropped to under 200, and organic impressions for product pages increased 34% — purely from consolidating previously split ranking signals.