What is an XML Sitemap?

An XML sitemap is a machine-readable file, typically hosted at /sitemap.xml, that enumerates the URLs on a website along with optional metadata such as last-modified dates, change frequency, and priority. Defined by the Sitemaps protocol (introduced by Google in 2005 and jointly adopted by Google, Yahoo, and Microsoft in 2006 as version 0.9), it gives search engine crawlers a structured manifest of the pages you want indexed. A single sitemap file can contain up to 50,000 URLs and must not exceed 50 MB uncompressed. Larger sites use a sitemap index file that references multiple child sitemaps. XML sitemaps don't guarantee indexation: they're a discovery hint, not a directive. We include them on every project we ship because even well-linked sites benefit from explicit URL declarations, especially after launches, migrations, or when pages lack strong internal linking.
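To make the 50,000-URL limit and the sitemap index concrete, here is a minimal TypeScript sketch that splits a URL list into child sitemaps and builds an index pointing at them. The function names and the sitemap-1.xml, sitemap-2.xml file naming are assumptions for illustration, not part of the protocol, and the sketch does not check the 50 MB size limit.

```typescript
// Limit from the Sitemaps protocol: at most 50,000 URLs per sitemap file.
const MAX_URLS_PER_SITEMAP = 50000;

// Split a flat URL list into chunks that each fit in one sitemap file.
function chunkUrls(urls: string[], size: number = MAX_URLS_PER_SITEMAP): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

// Build a sitemap index that references each child sitemap.
// Assumes baseUrl contains no characters that need XML escaping.
// The child file names are a hypothetical convention.
function buildSitemapIndex(baseUrl: string, childCount: number): string {
  const entries: string[] = [];
  for (let n = 1; n <= childCount; n++) {
    entries.push(`  <sitemap>\n    <loc>${baseUrl}/sitemap-${n}.xml</loc>\n  </sitemap>`);
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</sitemapindex>",
  ].join("\n");
}
```

With 120,000 URLs, chunkUrls would return three chunks (50,000, 50,000, and 20,000), and the index would list three child files.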

How it works

A sitemap is an XML document conforming to the Sitemaps protocol schema. At minimum, it contains a <urlset> root element with one or more <url> entries, each holding a <loc> element with the full canonical URL.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-04-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/blog/my-post</loc>
    <lastmod>2026-03-28</lastmod>
  </url>
</urlset>
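A document like the one above can be generated from data rather than written by hand. The following is a rough sketch, assuming a hypothetical SitemapUrl shape and helper names of our own choosing. It entity-escapes the five characters the protocol requires to be escaped in URLs.

```typescript
// One sitemap entry; only `loc` is required by the protocol.
interface SitemapUrl {
  loc: string;
  lastmod?: string;    // e.g. "2026-04-01"
  changefreq?: string; // e.g. "weekly"
  priority?: number;   // 0.0 to 1.0
}

// Entity-escape the characters that are not allowed raw in XML text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Serialize entries into a <urlset> document, emitting optional tags only when set.
function buildUrlset(urls: SitemapUrl[]): string {
  const body = urls.map((u) => {
    const lines = [`    <loc>${escapeXml(u.loc)}</loc>`];
    if (u.lastmod !== undefined) lines.push(`    <lastmod>${escapeXml(u.lastmod)}</lastmod>`);
    if (u.changefreq !== undefined) lines.push(`    <changefreq>${escapeXml(u.changefreq)}</changefreq>`);
    if (u.priority !== undefined) lines.push(`    <priority>${u.priority.toFixed(1)}</priority>`);
    return `  <url>\n${lines.join("\n")}\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...body,
    "</urlset>",
  ].join("\n");
}
```

Passing the two entries from the example (the homepage with all optional tags, the blog post with only a lastmod) should produce an equivalent document.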

Optional tags:

<lastmod>: Date of last modification in W3C Datetime format (a profile of ISO 8601), for example 2026-04-01. Google has said this is the most useful optional tag, provided the value is accurate; as of 2023 it largely ignores <changefreq> and <priority>.
<changefreq>: Hint about how often the content changes. Most crawlers disregard it.
<priority>: Relative importance within your own site (0.0 to 1.0). Also largely ignored by Google.
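Since malformed optional tags are easy to ship unnoticed, a small pre-publish check can help. This is a sketch under our own assumptions: the names are hypothetical, and the date pattern is a simplified subset of W3C Datetime that accepts either a plain date or a date with time and timezone.

```typescript
// Simplified W3C Datetime check: YYYY-MM-DD, optionally followed by a time and timezone.
const LASTMOD_PATTERN = /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$/;

// The change frequency values defined by the Sitemaps protocol.
const CHANGEFREQ_VALUES = ["always", "hourly", "daily", "weekly", "monthly", "yearly", "never"];

interface OptionalTags {
  lastmod?: string;
  changefreq?: string;
  priority?: number;
}

// Return a list of human-readable problems; an empty list means the tags look valid.
function validateOptionalTags(tags: OptionalTags): string[] {
  const problems: string[] = [];
  if (tags.lastmod !== undefined && !LASTMOD_PATTERN.test(tags.lastmod)) {
    problems.push(`lastmod "${tags.lastmod}" is not in the expected date format`);
  }
  if (tags.changefreq !== undefined && CHANGEFREQ_VALUES.indexOf(tags.changefreq) === -1) {
    problems.push(`changefreq "${tags.changefreq}" is not a protocol value`);
  }
  if (tags.priority !== undefined && !(tags.priority >= 0 && tags.priority <= 1)) {
    problems.push(`priority ${tags.priority} is outside 0.0 to 1.0`);
  }
  return problems;
}
```

The pattern only checks the shape of the date, not whether it is a real calendar date or an accurate modification time.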

You can declare your sitemap's location in two ways: with a Sitemap: directive in robots.txt, or by submitting it directly in Google Search Console or Bing Webmaster Tools. Both methods work; we do both.
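To illustrate the robots.txt route, here is a small sketch of how a tool might read Sitemap: directives out of a robots.txt body. The function name and sample content are hypothetical, and the parsing is deliberately simple: it strips everything after a # as a comment, so it would truncate a sitemap URL that contains a fragment.

```typescript
// Collect the URLs from every "Sitemap:" line in a robots.txt body.
function extractSitemapUrls(robotsTxt: string): string[] {
  const found: string[] = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    // Drop trailing comments, then surrounding whitespace.
    const line = rawLine.split("#")[0].trim();
    // The field name is matched case-insensitively.
    const match = /^sitemap\s*:\s*(\S+)/i.exec(line);
    if (match) found.push(match[1]);
  }
  return found;
}

// Hypothetical robots.txt content for illustration.
const sampleRobotsTxt = [
  "User-agent: *",
  "Disallow: /admin/",
  "",
  "Sitemap: https://example.com/sitemap.xml",
].join("\n");

const declaredSitemaps = extractSitemapUrls(sampleRobotsTxt);
// declaredSitemaps is ["https://example.com/sitemap.xml"]
```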

In Next.js (App Router, v13.3+), you can generate sitemaps dynamically with a sitemap.ts file in your app/ directory. Astro has a first-party @astrojs/sitemap integration that auto-generates the sitemap at build time. We've shipped both approaches on 50+ projects.
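As a rough idea of what the Next.js approach looks like, here is a sketch of the function an app/sitemap.ts file would export. To keep it runnable without the framework, it declares a local SitemapEntry type that mirrors the fields of Next.js's MetadataRoute.Sitemap entries; in a real project you would import that type from "next" and make the function the file's default export. The posts array and BASE_URL are hypothetical stand-ins for a CMS query and site config.

```typescript
// Local stand-in for an entry of Next.js's MetadataRoute.Sitemap type.
type SitemapEntry = {
  url: string;
  lastModified?: string | Date;
  changeFrequency?: "always" | "hourly" | "daily" | "weekly" | "monthly" | "yearly" | "never";
  priority?: number;
};

// Hypothetical content source; a real site would query its CMS or database.
const posts = [{ slug: "my-post", updatedAt: "2026-03-28" }];

// Hypothetical site origin.
const BASE_URL = "https://example.com";

// In app/sitemap.ts this would be written as `export default function sitemap()`.
function sitemap(): SitemapEntry[] {
  const postEntries = posts.map((post): SitemapEntry => ({
    url: `${BASE_URL}/blog/${post.slug}`,
    lastModified: post.updatedAt,
  }));
  return [{ url: `${BASE_URL}/`, lastModified: "2026-04-01" }, ...postEntries];
}
```

Next.js serializes the returned array into the XML format shown earlier, so the function only has to describe the URLs.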

When to use it

Every production site should have a sitemap. Full stop. But some scenarios make one especially critical:

Use a sitemap when:

Your site has 500+ pages and crawl budget matters
You've just launched or migrated and pages have few inbound links
You publish content frequently (blogs, e-commerce catalogs)
Your site relies on JavaScript rendering and you want to ensure URL discovery
You have orphan pages with no internal links pointing to them

You can deprioritize sitemaps when:

Your site has fewer than ~10 pages with strong internal linking
You're running a single-page application with no SEO requirements

Even in those cases, the cost of adding one is near zero, so there's little reason to skip it.
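If you're unsure whether your site has the orphan pages mentioned above, a quick check against your internal link graph can tell you. This is a minimal sketch: it assumes you already have a list of all page paths and a map of each page's outbound internal links (for example from a crawl), and every name in it is hypothetical.

```typescript
// Map from a page path to the internal paths that page links to.
type LinkGraph = Record<string, string[]>;

// Return pages that no other page links to. The homepage is excluded
// because it is the entry point rather than an orphan.
function findOrphanPages(allPages: string[], links: LinkGraph, homepage: string): string[] {
  const linked = new Set<string>();
  for (const source of Object.keys(links)) {
    for (const target of links[source]) {
      if (target !== source) linked.add(target); // self-links don't count
    }
  }
  return allPages.filter((page) => page !== homepage && !linked.has(page));
}
```

Any path this returns is a page crawlers can only reach through the sitemap, which is a good reason to confirm it is listed there.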

XML Sitemap vs alternatives

Feature	XML Sitemap	RSS/Atom Feed	HTML Sitemap	robots.txt
Primary audience	Crawlers	Crawlers + humans	Humans	Crawlers
URL discovery	Yes	Yes (recent content)	Yes (via links)	No (only sitemap pointer)
Metadata (lastmod)	Yes	Yes (pubDate)	No	No
Max URLs	50,000 per file	Typically recent items only	Practical limit ~few hundred	N/A
Google recommendation	Strongly recommended	Recommended for blogs/news	Optional	Optional place to declare the sitemap

Google has explicitly recommended using RSS/Atom feeds alongside XML sitemaps for frequently updated content (per their 2023 documentation updates). The two are complementary, not competing. An HTML sitemap is a user-facing page — nice for UX but not a substitute for the XML version. robots.txt doesn't list URLs; it just points crawlers to where your sitemap lives.

Real-world example

We migrated a 12,000-page e-commerce catalog from a legacy platform to Next.js 14. During the migration, many product pages temporarily lacked internal links because category page rebuilds lagged behind. We generated a dynamic sitemap index with 3 child sitemaps (products, categories, blog) using sitemap.ts in the App Router, each with accurate <lastmod> dates pulled from the CMS. Within 48 hours of submitting the sitemap to Google Search Console, crawl stats showed Google had discovered 94% of the new URLs. Without the sitemap, organic traffic recovery would likely have taken weeks longer. We also added a Sitemap: https://store.example.com/sitemap.xml directive to robots.txt as a fallback discovery mechanism.

Frequently asked questions about XML sitemaps

Is an XML sitemap the same as robots.txt?

No. They serve different purposes. An XML sitemap is a list of URLs you want search engines to find. `robots.txt` is a set of crawl directives that tell bots which paths they may and may not access. They work together: you typically include a `Sitemap:` directive inside `robots.txt` so crawlers can find your sitemap automatically. Think of `robots.txt` as the bouncer and the sitemap as the guest list: related, but distinct roles.

When did XML sitemaps become a standard?

Google introduced the Sitemaps protocol in June 2005, initially as version 0.84. It became a cross-engine standard in November 2006, when Google, Yahoo, and Microsoft jointly announced support at sitemaps.org and the schema moved to version 0.9. The protocol hasn't had a major version update since (it's still 0.9), but search engines have changed how they use the data. Notably, Google confirmed in 2023 that it effectively ignores `<changefreq>` and `<priority>` and relies primarily on `<loc>` and `<lastmod>`.

What's the alternative to an XML sitemap?

RSS and Atom feeds are the closest alternative for URL discovery, and Google recommends them for frequently updated content like blogs and news sites. An HTML sitemap (a regular webpage listing your URLs) can help users and crawlers find pages through links, but it doesn't carry metadata like last-modified dates. In practice, we use XML sitemaps and RSS feeds together — the sitemap covers the full site inventory while the feed highlights recent changes. There's no reason to pick one over the other.

Does submitting a sitemap guarantee Google will index my pages?

No. A sitemap is a discovery hint, not an indexing directive. Google still evaluates each URL for quality, relevance, and crawl budget before deciding whether to index it. If a page has thin content, is blocked by `robots.txt`, or has a `noindex` meta tag, the sitemap won't override that. What a sitemap does is ensure Google knows the URL exists — which is especially valuable for new, orphaned, or deeply nested pages that crawlers might otherwise miss. Submitting a sitemap speeds up discovery, not indexation decisions.
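Because a sitemap can't override `noindex` or a `robots.txt` block, it makes sense to leave such URLs out when generating the file. Here is a simplified sketch of that filter. The PageRecord shape and function name are hypothetical, and the disallow check is plain prefix matching, which ignores the wildcard and `Allow` rules that real `robots.txt` matching supports.

```typescript
// Minimal page metadata for deciding sitemap inclusion.
interface PageRecord {
  path: string;
  noindex: boolean; // true if the page carries a noindex meta tag
}

// Keep only pages that are indexable and not under a disallowed path prefix.
function sitemapCandidates(pages: PageRecord[], disallowedPrefixes: string[]): string[] {
  return pages
    .filter((page) => !page.noindex)
    .filter((page) => !disallowedPrefixes.some((prefix) => page.path.startsWith(prefix)))
    .map((page) => page.path);
}
```

Given pages `/`, `/admin/users`, a `noindex` page `/draft`, and `/blog/my-post`, with `/admin/` disallowed, only `/` and `/blog/my-post` would be kept.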
