What is an XML Sitemap?
An XML sitemap is a structured file that lists a website's URLs to help search engines discover and crawl pages efficiently.
An XML sitemap is a machine-readable file, typically hosted at /sitemap.xml, that lists the URLs on a website along with optional metadata such as last-modified dates, change frequency, and priority. It is defined by the Sitemaps protocol, which Google introduced in 2005; version 0.9 was jointly adopted by Google, Yahoo, and Microsoft in 2006. The file gives search engine crawlers a structured manifest of the pages you want indexed.

A single sitemap file can contain up to 50,000 URLs and must not exceed 50 MB uncompressed. Larger sites use a sitemap index file that references multiple child sitemaps. XML sitemaps don't guarantee indexation: they're a discovery hint, not a directive. We include them on every project we ship, because even well-linked sites benefit from explicit URL declarations, especially after launches and migrations or when pages lack strong internal linking.
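Enforcing the 50,000-URL limit is simple arithmetic. Here is a minimal sketch in TypeScript, assuming the full URL list is already in memory; `chunkUrls` is a hypothetical helper, and it deliberately ignores the separate 50 MB size limit:

```typescript
// Protocol limit: at most 50,000 URLs per sitemap file. The 50 MB uncompressed
// limit is separate and is not checked in this sketch.
const MAX_URLS_PER_SITEMAP = 50_000;

// Split a flat URL list into groups, one group per child sitemap file.
function chunkUrls(urls: string[], maxPerFile: number = MAX_URLS_PER_SITEMAP): string[][] {
  if (!Number.isInteger(maxPerFile) || maxPerFile < 1 || maxPerFile > MAX_URLS_PER_SITEMAP) {
    throw new RangeError(`maxPerFile must be an integer from 1 to ${MAX_URLS_PER_SITEMAP}`);
  }
  const files: string[][] = [];
  for (let start = 0; start < urls.length; start += maxPerFile) {
    files.push(urls.slice(start, start + maxPerFile));
  }
  return files;
}
```

For 120,001 URLs this yields three groups (50,000, 50,000, and 20,001 URLs), each written to its own file and listed in a sitemap index.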
How it works
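A sitemap is a UTF-8 encoded XML document conforming to the Sitemaps protocol schema. At minimum, it contains a <urlset> root element that declares the protocol namespace, with one or more <url> entries, each holding a <loc> element with the full canonical URL. Special characters in URLs, such as ampersands, must be entity-escaped.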
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-04-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/blog/my-post</loc>
    <lastmod>2026-03-28</lastmod>
  </url>
</urlset>
```
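To make the structure concrete, here is a minimal TypeScript sketch that serializes entries into the format above. `SitemapUrl` and `buildUrlset` are hypothetical names, not a library API; the one non-obvious step is entity-escaping, because a raw & in a <loc> makes the file invalid XML.

```typescript
// One <url> entry. Only `loc` is required by the Sitemaps protocol.
interface SitemapUrl {
  loc: string;
  lastmod?: string; // W3C Datetime, e.g. "2026-04-01"
  changefreq?: string;
  priority?: number; // 0.0 to 1.0
}

// Escape the five XML special characters. A raw "&" in a query string is a
// common reason a sitemap fails XML parsing.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Serialize entries into a <urlset> document, emitting optional tags only when set.
function buildUrlset(urls: SitemapUrl[]): string {
  const lines: string[] = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
  ];
  for (const url of urls) {
    lines.push("  <url>");
    lines.push(`    <loc>${escapeXml(url.loc)}</loc>`);
    if (url.lastmod !== undefined) lines.push(`    <lastmod>${url.lastmod}</lastmod>`);
    if (url.changefreq !== undefined) lines.push(`    <changefreq>${url.changefreq}</changefreq>`);
    // toFixed(1) matches the one-decimal style of the example document.
    if (url.priority !== undefined) lines.push(`    <priority>${url.priority.toFixed(1)}</priority>`);
    lines.push("  </url>");
  }
  lines.push("</urlset>");
  return lines.join("\n");
}
```

Passing the two entries from the example document produces the same structure; entries without optional fields simply omit those tags.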
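Optional tags:
- <lastmod>: the last modification date in W3C Datetime format (a profile of ISO 8601), such as 2026-04-01. Google has said this is the most useful optional tag, and it uses the value only when it is consistently accurate.
- <changefreq>: a hint about how often the content changes (always, hourly, daily, weekly, monthly, yearly, or never). Most crawlers disregard it, and Google has ignored it as of 2023.
- <priority>: relative importance within your own site, from 0.0 to 1.0 (default 0.5). Google ignores it as well.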
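A quick validation pass catches malformed optional values before crawlers see them. This is a minimal sketch under stated assumptions: `validateOptionalTags` is a hypothetical helper, and its date check is intentionally narrower than the full W3C Datetime format.

```typescript
// Values allowed in <changefreq> by the Sitemaps protocol 0.9.
const CHANGEFREQ_VALUES: readonly string[] = [
  "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
];

// Narrow W3C Datetime check: accepts YYYY-MM-DD, or a full timestamp with a
// timezone designator (e.g. 2026-04-01T09:30:00+00:00). The spec also allows
// YYYY and YYYY-MM, which this sketch deliberately rejects.
const W3C_DATETIME =
  /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$/;

interface OptionalTags {
  lastmod?: string;
  changefreq?: string;
  priority?: number;
}

// Returns human-readable problems; an empty array means the tags look valid.
function validateOptionalTags(tags: OptionalTags): string[] {
  const problems: string[] = [];
  if (tags.lastmod !== undefined && !W3C_DATETIME.test(tags.lastmod)) {
    problems.push(`lastmod "${tags.lastmod}" is not a W3C Datetime value`);
  }
  if (tags.changefreq !== undefined && CHANGEFREQ_VALUES.indexOf(tags.changefreq) === -1) {
    problems.push(`changefreq "${tags.changefreq}" is not an allowed value`);
  }
  // Written so that NaN also fails the range check.
  if (tags.priority !== undefined && !(tags.priority >= 0 && tags.priority <= 1)) {
    problems.push(`priority ${tags.priority} is outside 0.0-1.0`);
  }
  return problems;
}
```

Since Google ignores <changefreq> and <priority>, the check that matters most in practice is keeping <lastmod> tied to real content changes rather than to build time.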
You declare your sitemap's location in two ways: a Sitemap: directive in robots.txt, or by submitting it directly in Google Search Console or Bing Webmaster Tools. Both methods work; we do both.
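If you generate robots.txt yourself, the directive is one line per sitemap, and it is not tied to any User-agent group. A minimal sketch, assuming a hypothetical `withSitemapDirectives` helper that appends directives to an existing robots.txt body; it relies on the standard `URL` constructor to reject relative paths, since the directive expects a fully qualified URL:

```typescript
// Append one Sitemap directive per URL to an existing robots.txt body.
// The directive is not tied to any User-agent group, so it can go at the end.
function withSitemapDirectives(robotsTxt: string, sitemapUrls: string[]): string {
  // new URL() throws a TypeError for relative input like "/sitemap.xml",
  // which enforces the fully qualified URL the directive expects.
  const directives = sitemapUrls.map((raw) => `Sitemap: ${new URL(raw).href}`);
  // Trim trailing whitespace, then separate the directives with a blank line.
  return robotsTxt.replace(/\s*$/, "") + "\n\n" + directives.join("\n") + "\n";
}
```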
In Next.js (App Router, v14+), you can generate sitemaps dynamically with a sitemap.ts file in your app/ directory. Astro has a first-party @astrojs/sitemap integration that auto-generates the file at build time. We've shipped both approaches on 50+ projects.
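For reference, an App Router app/sitemap.ts is a default-exported function that returns an array of entries, which Next.js serves as /sitemap.xml. To keep this sketch runnable without Next.js installed, it declares a local `SitemapEntry` type mirroring a subset of `MetadataRoute.Sitemap`; `getPublishedPosts` and `SITE_URL` are hypothetical placeholders for your CMS query and canonical origin.

```typescript
// Hypothetical contents of app/sitemap.ts. In a real project, drop this local
// type and use: import type { MetadataRoute } from "next";
// with MetadataRoute.Sitemap as the return type.
type SitemapEntry = {
  url: string;
  lastModified?: string | Date;
  changeFrequency?: "always" | "hourly" | "daily" | "weekly" | "monthly" | "yearly" | "never";
  priority?: number;
};

// Hypothetical stand-in for a CMS or database query.
function getPublishedPosts(): { slug: string; updatedAt: string }[] {
  return [
    { slug: "my-post", updatedAt: "2026-03-28" },
    { slug: "another-post", updatedAt: "2026-03-15" },
  ];
}

const SITE_URL = "https://example.com"; // assumption: your canonical origin

export default function sitemap(): SitemapEntry[] {
  // lastModified comes from the CMS, so it changes only when content changes.
  const posts = getPublishedPosts().map((post) => ({
    url: `${SITE_URL}/blog/${post.slug}`,
    lastModified: post.updatedAt,
  }));
  return [{ url: `${SITE_URL}/`, lastModified: "2026-04-01" }, ...posts];
}
```

The sketch leaves changeFrequency and priority unset, which matches Google ignoring both.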
When to use it
Every production site should have a sitemap. Full stop. But some scenarios make them especially critical:
Use a sitemap when:
- Your site has 500+ pages and crawl budget matters
- You've just launched or migrated and pages have few inbound links
- You publish content frequently (blogs, e-commerce catalogs)
- Your site relies on JavaScript rendering and you want to ensure URL discovery
- You have orphan pages with no internal links pointing to them
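Orphan pages are where a sitemap does the most work, because nothing else leads crawlers to them. One way to find them is a set difference between the URLs in your sitemap and the link targets a site crawl finds. A minimal sketch, assuming both lists hold absolute URLs normalized the same way (trailing slashes, host casing); `findOrphans` is a hypothetical helper:

```typescript
// Return sitemap URLs that no crawled page links to (orphans). Both inputs are
// assumed to be absolute URLs normalized the same way.
function findOrphans(sitemapUrls: string[], linkTargets: string[]): string[] {
  const linked = new Set(linkTargets);
  return sitemapUrls.filter((url) => !linked.has(url));
}
```

Each URL it returns currently relies on the sitemap, or on external links, to be discovered.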
You can deprioritize sitemaps when:
- Your site has fewer than ~10 pages with strong internal linking
- You're running a single-page application with no SEO requirements
Even in those cases, the cost of adding one is near zero, so there's little reason to skip it.
XML Sitemap vs alternatives
| Feature | XML Sitemap | RSS/Atom Feed | HTML Sitemap | robots.txt |
|---|---|---|---|---|
| Primary audience | Crawlers | Crawlers + humans | Humans | Crawlers |
| URL discovery | Yes | Yes (recent content) | Yes (via links) | No (only sitemap pointer) |
| Metadata (lastmod) | Yes | Yes (pubDate) | No | No |
| Max URLs | 50,000 per file | Typically recent items only | Practical limit of a few hundred | N/A |
| Google recommendation | Strongly recommended | Recommended for blogs/news | Optional | Optional (one way to declare a sitemap) |
Google has explicitly recommended using RSS/Atom feeds alongside XML sitemaps for frequently updated content (per their 2023 documentation updates). The two are complementary, not competing: the sitemap covers every URL, while the feed surfaces recent changes. An HTML sitemap is a user-facing page, nice for UX but not a substitute for the XML version. robots.txt doesn't list URLs; it sets crawl rules, and its optional Sitemap: directive points crawlers to where your sitemap lives.
Real-world example
We migrated a 12,000-page e-commerce catalog from a legacy platform to Next.js 14. During the migration, many product pages temporarily lacked internal links because category page rebuilds lagged behind. We generated a dynamic sitemap index with 3 child sitemaps (products, categories, blog) using sitemap.ts in the App Router, each with accurate <lastmod> dates pulled from the CMS. Within 48 hours of submitting the sitemap to Google Search Console, crawl stats showed Google had discovered 94% of the new URLs. Without the sitemap, organic traffic recovery would have taken weeks longer. We also added a Sitemap: https://store.example.com/sitemap.xml directive to robots.txt as a fallback discovery mechanism.
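Here is a framework-agnostic sketch of the index side of that setup. It is not the code from the project: `CmsRecord`, `buildSitemapIndex`, and the /sitemaps/{name}.xml child paths are hypothetical. Each child's <lastmod> is the newest updatedAt among its records, which keeps the index dates tied to real content changes.

```typescript
// Hypothetical CMS record. `path` would be used when rendering the child
// sitemap itself (not shown); the index only needs `updatedAt`.
interface CmsRecord {
  path: string;
  updatedAt: string; // W3C Datetime, e.g. "2026-04-02"
}

// Newest updatedAt in a group, compared as parsed dates; undefined if empty.
function newestUpdate(records: CmsRecord[]): string | undefined {
  let newest: string | undefined;
  for (const record of records) {
    if (newest === undefined || Date.parse(record.updatedAt) > Date.parse(newest)) {
      newest = record.updatedAt;
    }
  }
  return newest;
}

// Build a <sitemapindex> with one <sitemap> per named group. Assumes `origin`
// and group names contain no XML special characters, so no escaping is done.
function buildSitemapIndex(origin: string, groups: Record<string, CmsRecord[]>): string {
  const lines: string[] = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
  ];
  for (const name of Object.keys(groups)) {
    lines.push("  <sitemap>");
    lines.push(`    <loc>${origin}/sitemaps/${name}.xml</loc>`);
    const lastmod = newestUpdate(groups[name]);
    // A group with no records gets no <lastmod> rather than a made-up date.
    if (lastmod !== undefined) lines.push(`    <lastmod>${lastmod}</lastmod>`);
    lines.push("  </sitemap>");
  }
  lines.push("</sitemapindex>");
  return lines.join("\n");
}
```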