Sitemap for 91K Pages: Next.js + Supabase

Q: Do I submit each individual sitemap file to Google Search Console?

No. Submit only the sitemap index URL (e.g., https://yourdomain.com/sitemap.xml). Google reads the index and automatically discovers and processes all referenced sub-sitemaps. Submitting individual files is unnecessary and clutters your GSC dashboard.

Q: How often should sitemaps be regenerated for large dynamic sites?

For most content-heavy sites, hourly regeneration via ISR (revalidate = 3600) is a good default. If you publish content very frequently, pair it with on-demand revalidation triggered by database webhooks. Don't regenerate on every request -- that defeats caching and increases Supabase load unnecessarily.

Q: Does the priority value in sitemaps actually affect Google rankings?

Google has publicly stated they largely ignore priority and changefreq values. However, Bing and other search engines do use them. The lastmod field is the most important sitemap signal -- make sure it reflects actual content changes from your database, not the current timestamp.

Q: How do I handle Supabase's 1,000 row limit for sitemap queries?

Use Supabase's .range(offset, offset + batchSize - 1) method to paginate in batches of 1,000. Loop until you've fetched all rows for the current sitemap chunk. For count-only queries (used in the sitemap index), use .select('*', { count: 'exact', head: true }) which returns just the count without transferring any row data.

Last month we hit 91,000 pages on Deluxe Astrology. Celebrity birth charts, blog posts, localized content across six languages -- the site had grown way past what a single sitemap file could handle. Google's sitemap protocol caps you at 50,000 URLs per file and 50MB uncompressed. We needed a sitemap index with chunked sub-sitemaps, all generated dynamically from Supabase, cached with ISR on Vercel, and submitted to Google Search Console as a single index URL.

This is the exact implementation we shipped. Not a theoretical walkthrough -- actual production code that handles 91K URLs today and will scale to 500K without changes.

Building a Dynamic Sitemap for 91,000 Pages with Next.js and Supabase

Understanding Sitemap Limits and Architecture

Here are the hard limits you need to know:

Constraint	Limit	Source
URLs per sitemap file	50,000	sitemaps.org protocol
File size per sitemap	50MB uncompressed	sitemaps.org protocol
Sitemaps per sitemap index	50,000	sitemaps.org protocol
Supabase `.range()` max per query	1,000 rows (default)	Supabase PostgREST config
Vercel serverless function timeout (Pro)	60 seconds	Vercel docs 2025
Vercel response body size limit	10MB	Vercel edge caching

For 91,000 URLs, you need at minimum two sitemap files. But we don't just dump everything into two 50K-URL buckets. We split by content type -- celebrities, blog posts, static pages, localized pages -- because each type has different changefreq, priority, and update patterns. This gives us better control and makes debugging in GSC way easier when something goes wrong.

The Sitemap Structure for Deluxe Astrology

Here's what the final sitemap architecture looks like:

/sitemap.xml                    → Sitemap Index (points to all sub-sitemaps)
  /sitemap-pages.xml            → Static pages (~30 URLs)
  /sitemap-blog-0.xml           → Blog posts chunk 0 (up to 50K)
  /sitemap-blog-1.xml           → Blog posts chunk 1 (overflow)
  /sitemap-celebrities-0.xml    → Celebrity pages chunk 0 (up to 50K)
  /sitemap-celebrities-1.xml    → Celebrity pages chunk 1 (overflow)
  /sitemap-locale-es.xml        → Spanish localized pages
  /sitemap-locale-fr.xml        → French localized pages
  /sitemap-locale-de.xml        → German localized pages
  /sitemap-locale-pt.xml        → Portuguese localized pages
  /sitemap-locale-ja.xml        → Japanese localized pages

Each sub-sitemap is a Next.js App Router route handler that queries Supabase at runtime, generates XML, and caches via ISR with [revalidate = 3600](https://nextjs.org/docs/app/building-your-application/data-fetching/incremental-static-regeneration) (hourly). The sitemap index itself is also a route handler.

Setting Up Supabase Queries with Offset Pagination

Here's the critical piece most tutorials get wrong: you can't just do supabase.from('celebrities').select('*') and expect 91,000 rows back. Supabase's PostgREST layer defaults to returning 1,000 rows max. You need to paginate.

We use range-based offset pagination in batches of 1,000:

// lib/supabase-sitemap.ts
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY! // Use service role for server-side
);

const BATCH_SIZE = 1000;

export interface SitemapEntry {
  slug: string;
  updated_at: string;
}

export async function fetchAllSlugs(
  table: string,
  selectColumns: string = 'slug, updated_at'
): Promise<SitemapEntry[]> {
  const allRows: SitemapEntry[] = [];
  let offset = 0;
  let hasMore = true;

  while (hasMore) {
    const { data, error } = await supabase
      .from(table)
      .select(selectColumns)
      .order('updated_at', { ascending: false })
      .range(offset, offset + BATCH_SIZE - 1);

    if (error) {
      console.error(`Sitemap fetch error for ${table}:`, error.message);
      break;
    }

    if (data && data.length > 0) {
      allRows.push(...data);
      offset += BATCH_SIZE;
      hasMore = data.length === BATCH_SIZE;
    } else {
      hasMore = false;
    }
  }

  return allRows;
}

export async function fetchSlugsChunked(
  table: string,
  chunkIndex: number,
  chunkSize: number = 50000
): Promise<{ entries: SitemapEntry[]; totalCount: number }> {
  // First get total count for the sitemap index
  const { count } = await supabase
    .from(table)
    .select('*', { count: 'exact', head: true });

  const totalCount = count || 0;
  const startOffset = chunkIndex * chunkSize;
  const entries: SitemapEntry[] = [];
  let offset = startOffset;
  const endOffset = Math.min(startOffset + chunkSize, totalCount);

  while (offset < endOffset) {
    const batchEnd = Math.min(offset + BATCH_SIZE - 1, endOffset - 1);
    const { data, error } = await supabase
      .from(table)
      .select('slug, updated_at')
      .order('updated_at', { ascending: false })
      .range(offset, batchEnd);

    if (error || !data || data.length === 0) break;
    entries.push(...data);
    offset += data.length;
  }

  return { entries, totalCount };
}

export function getChunkCount(totalCount: number, chunkSize: number = 50000): number {
  return Math.ceil(totalCount / chunkSize);
}

A few things to note here. We use the SUPABASE_SERVICE_ROLE_KEY -- not the anon key -- because these route handlers run server-side and we don't want RLS policies slowing down our sitemap queries. The fetchSlugsChunked function only fetches the specific chunk needed for a given sitemap file, not the entire dataset. That matters when you're running on Vercel's 60-second function timeout.

Building a Dynamic Sitemap for 91,000 Pages with Next.js and Supabase - architecture

Building the Sitemap Index Route

The sitemap index is the single URL you submit to Google. It references all your sub-sitemaps.

// app/sitemap.xml/route.ts
import { NextResponse } from 'next/server';
import { createClient } from '@supabase/supabase-js';

export const revalidate = 3600; // ISR: regenerate every hour

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

const CHUNK_SIZE = 50000;
const SITE_URL = 'https://deluxeastrology.com';
const LOCALES = ['es', 'fr', 'de', 'pt', 'ja'];

async function getTableCount(table: string): Promise<number> {
  const { count } = await supabase
    .from(table)
    .select('*', { count: 'exact', head: true });
  return count || 0;
}

export async function GET() {
  const blogCount = await getTableCount('blog_posts');
  const celebrityCount = await getTableCount('celebrities');

  const blogChunks = Math.ceil(blogCount / CHUNK_SIZE);
  const celebrityChunks = Math.ceil(celebrityCount / CHUNK_SIZE);

  const now = new Date().toISOString();

  let sitemaps = '';

  // Static pages sitemap
  sitemaps += `
    <sitemap>
      <loc>${SITE_URL}/sitemap-pages.xml</loc>
      <lastmod>${now}</lastmod>
    </sitemap>`;

  // Blog sitemaps
  for (let i = 0; i < blogChunks; i++) {
    sitemaps += `
    <sitemap>
      <loc>${SITE_URL}/sitemap-blog-${i}.xml</loc>
      <lastmod>${now}</lastmod>
    </sitemap>`;
  }

  // Celebrity sitemaps
  for (let i = 0; i < celebrityChunks; i++) {
    sitemaps += `
    <sitemap>
      <loc>${SITE_URL}/sitemap-celebrities-${i}.xml</loc>
      <lastmod>${now}</lastmod>
    </sitemap>`;
  }

  // Locale sitemaps
  for (const locale of LOCALES) {
    sitemaps += `
    <sitemap>
      <loc>${SITE_URL}/sitemap-locale-${locale}.xml</loc>
      <lastmod>${now}</lastmod>
    </sitemap>`;
  }

  const xml = `<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${sitemaps}
</sitemapindex>`;

  return new NextResponse(xml, {
    headers: {
      'Content-Type': 'application/xml',
      'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate=600',
    },
  });
}

Notice we're only doing count queries here -- head: true means Supabase returns just the count without any row data. This makes the sitemap index generation nearly instant.

Building Individual Chunked Sitemaps

Here's the celebrity sitemap handler with full pagination:

// app/sitemap-celebrities-[chunk].xml/route.ts
import { NextResponse } from 'next/server';
import { fetchSlugsChunked } from '@/lib/supabase-sitemap';

export const revalidate = 3600;

const SITE_URL = 'https://deluxeastrology.com';

export async function GET(
  request: Request,
  { params }: { params: Promise<{ chunk: string }> }
) {
  const { chunk } = await params;
  const chunkIndex = parseInt(chunk, 10);

  if (isNaN(chunkIndex) || chunkIndex < 0) {
    return new NextResponse('Invalid chunk index', { status: 400 });
  }

  const { entries } = await fetchSlugsChunked('celebrities', chunkIndex);

  const urls = entries
    .map(
      (entry) => `
  <url>
    <loc>${SITE_URL}/celebrities/${entry.slug}</loc>
    <lastmod>${new Date(entry.updated_at).toISOString()}</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.6</priority>
  </url>`
    )
    .join('');

  const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urls}
</urlset>`;

  return new NextResponse(xml, {
    headers: {
      'Content-Type': 'application/xml',
      'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate=600',
    },
  });
}

The blog sitemap follows the same pattern but with different priority and changefreq:

// app/sitemap-blog-[chunk].xml/route.ts
import { NextResponse } from 'next/server';
import { fetchSlugsChunked } from '@/lib/supabase-sitemap';

export const revalidate = 3600;

const SITE_URL = 'https://deluxeastrology.com';

export async function GET(
  request: Request,
  { params }: { params: Promise<{ chunk: string }> }
) {
  const { chunk } = await params;
  const chunkIndex = parseInt(chunk, 10);

  const { entries } = await fetchSlugsChunked('blog_posts', chunkIndex);

  const urls = entries
    .map(
      (entry) => `
  <url>
    <loc>${SITE_URL}/blog/${entry.slug}</loc>
    <lastmod>${new Date(entry.updated_at).toISOString()}</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>`
    )
    .join('');

  const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urls}
</urlset>`;

  return new NextResponse(xml, {
    headers: {
      'Content-Type': 'application/xml',
      'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate=600',
    },
  });
}

You'll need to configure your Next.js routing to handle the dynamic segment. In App Router, the folder name uses brackets:

app/
  sitemap.xml/
    route.ts
  sitemap-pages.xml/
    route.ts
  sitemap-blog-[chunk].xml/
    route.ts
  sitemap-celebrities-[chunk].xml/
    route.ts
  sitemap-locale-[lang].xml/
    route.ts

If the bracket-in-folder-name approach gives you trouble with your file system or IDE (it sometimes does), use route rewrites in next.config.ts instead:

// next.config.ts
const nextConfig = {
  async rewrites() {
    return [
      {
        source: '/sitemap-blog-:chunk(\\d+).xml',
        destination: '/api/sitemap-blog/:chunk',
      },
      {
        source: '/sitemap-celebrities-:chunk(\\d+).xml',
        destination: '/api/sitemap-celebrities/:chunk',
      },
      {
        source: '/sitemap-locale-:lang.xml',
        destination: '/api/sitemap-locale/:lang',
      },
    ];
  },
};

export default nextConfig;

Static Pages Sitemap

For the static pages sitemap, we hardcode the URLs since they rarely change:

// app/sitemap-pages.xml/route.ts
import { NextResponse } from 'next/server';

export const revalidate = 86400; // Once per day is fine for static pages

const SITE_URL = 'https://deluxeastrology.com';

const staticPages = [
  { path: '/', priority: '1.0', changefreq: 'daily' },
  { path: '/about', priority: '0.7', changefreq: 'monthly' },
  { path: '/solutions/birth-chart', priority: '0.9', changefreq: 'weekly' },
  { path: '/solutions/compatibility', priority: '0.9', changefreq: 'weekly' },
  { path: '/solutions/transit-report', priority: '0.9', changefreq: 'weekly' },
  { path: '/blog', priority: '0.8', changefreq: 'daily' },
  { path: '/celebrities', priority: '0.8', changefreq: 'daily' },
  { path: '/contact', priority: '0.5', changefreq: 'yearly' },
  { path: '/pricing', priority: '0.7', changefreq: 'monthly' },
];

export async function GET() {
  const now = new Date().toISOString();

  const urls = staticPages
    .map(
      (page) => `
  <url>
    <loc>${SITE_URL}${page.path}</loc>
    <lastmod>${now}</lastmod>
    <changefreq>${page.changefreq}</changefreq>
    <priority>${page.priority}</priority>
  </url>`
    )
    .join('');

  const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urls}
</urlset>`;

  return new NextResponse(xml, {
    headers: {
      'Content-Type': 'application/xml',
      'Cache-Control': 'public, s-maxage=86400, stale-while-revalidate=3600',
    },
  });
}

Localized Sitemaps with Hreflang

This is where it gets interesting. For multilingual content, you need xhtml:link elements with hreflang attributes. Each localized sitemap references all alternate language versions of each page:

// app/sitemap-locale-[lang].xml/route.ts
import { NextResponse } from 'next/server';
import { fetchAllSlugs } from '@/lib/supabase-sitemap';

export const revalidate = 3600;

const SITE_URL = 'https://deluxeastrology.com';
const ALL_LOCALES = ['en', 'es', 'fr', 'de', 'pt', 'ja'];

export async function GET(
  request: Request,
  { params }: { params: Promise<{ lang: string }> }
) {
  const { lang } = await params;

  if (!ALL_LOCALES.includes(lang)) {
    return new NextResponse('Invalid locale', { status: 404 });
  }

  const entries = await fetchAllSlugs('localized_pages');
  // Filter to pages that have this locale
  const localeEntries = entries.filter((e: any) => e.locale === lang);

  const urls = localeEntries
    .map((entry: any) => {
      const alternates = ALL_LOCALES.map(
        (loc) =>
          `    <xhtml:link rel="alternate" hreflang="${loc}" href="${SITE_URL}/${loc}/${entry.slug}" />`
      ).join('\n');

      return `
  <url>
    <loc>${SITE_URL}/${lang}/${entry.slug}</loc>
    <lastmod>${new Date(entry.updated_at).toISOString()}</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.6</priority>
${alternates}
  </url>`;
    })
    .join('');

  const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">${urls}
</urlset>`;

  return new NextResponse(xml, {
    headers: {
      'Content-Type': 'application/xml',
      'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate=600',
    },
  });
}

ISR Revalidation Strategy

We set revalidate = 3600 on all sitemap routes. That means Vercel serves the cached XML for up to an hour, then regenerates it in the background on the next request. For 91K pages, this is the sweet spot -- frequent enough that new content shows up same-day, but not so aggressive that we're hammering Supabase.

For on-demand revalidation when content is published, add a revalidation endpoint:

// app/api/revalidate-sitemap/route.ts
import { revalidatePath } from 'next/cache';
import { NextResponse } from 'next/server';

export async function POST(request: Request) {
  const { secret, paths } = await request.json();

  if (secret !== process.env.REVALIDATION_SECRET) {
    return NextResponse.json({ error: 'Unauthorized' }, { status: 401 });
  }

  // Revalidate specific sitemap paths
  const targetPaths = paths || ['/sitemap.xml'];
  for (const path of targetPaths) {
    revalidatePath(path);
  }

  return NextResponse.json({ revalidated: true, paths: targetPaths });
}

Then set up a Supabase Database Webhook (or a Postgres trigger via pg_net) to call this endpoint whenever your celebrities or blog_posts tables are updated.

Priority and Change Frequency by Content Type

Here's the priority matrix we use. Google has said they mostly ignore priority and changefreq, but other crawlers (Bing, Yandex) still use them, and they don't hurt:

Content Type	Priority	Change Frequency	Rationale
Homepage	1.0	daily	Highest importance, frequently updated
Solutions/Features	0.9	weekly	Core product pages
Blog listing	0.8	daily	New posts regularly
Blog posts	0.8	weekly	Content updated occasionally
Celebrity pages	0.6	monthly	Rarely changes after creation
Localized pages	0.6	monthly	Translation updates are infrequent
Contact/Legal	0.5	yearly	Almost never changes

The lastmod value is critical and should always come from your database's updated_at column -- never hardcode it to new Date(). Google uses lastmod to prioritize recrawling, and if every page says it was modified right now, Google will eventually ignore your lastmod entirely.

Google Search Console Submission

Here's the straightforward part. In GSC:

Go to Sitemaps in the left sidebar
Enter https://yourdomain.com/sitemap.xml (the index URL only)
Click Submit

That's it. Don't submit individual sub-sitemaps. Google reads the index and discovers all the children automatically. You should see status "Success" within a few hours, and indexed URL counts will climb over the next 2-4 weeks.

For 91K URLs, expect Google to index 70-90% within the first month. The remaining pages typically have thin content, duplicate content issues, or are simply low-priority in Google's crawl budget allocation.

Also add your sitemap to robots.txt:

# robots.txt
User-agent: *
Allow: /

Sitemap: https://deluxeastrology.com/sitemap.xml

Debugging When Google Won't Index Your Pages

This is where most people get stuck. You've submitted 91K URLs but GSC shows only 40K indexed. Here's the systematic debugging checklist we follow:

Check for Accidental Noindex Tags

This is the #1 cause. Run a spot check:

curl -s https://deluxeastrology.com/celebrities/some-slug | grep -i 'noindex'

Also check your Next.js layout or page metadata. A common mistake is setting noindex in a layout that applies to thousands of pages:

// BAD: This noindexes all pages using this layout
export const metadata = {
  robots: { index: false, follow: true },
};

Verify robots.txt Isn't Blocking Crawling

Check https://yourdomain.com/robots.txt in a browser. Make sure you're not accidentally blocking your dynamic routes. On Vercel, also check for any middleware that might be returning 403s to Googlebot.

Inspect Crawl Errors in GSC

Go to Pages → Why pages aren't indexed. Common issues:

"Crawled - currently not indexed": Google saw the page but decided not to index it. Usually thin content.
"Discovered - currently not indexed": Google knows the URL exists but hasn't crawled it yet. Crawl budget issue.
"Excluded by noindex tag": Self-explanatory. Fix the tag.
"Duplicate without canonical": Add proper canonical tags.

Fix Orphan Pages with Internal Linking

This is huge for large sites. If your celebrity pages are only discoverable through the sitemap and have zero internal links pointing to them, Google will deprioritize crawling them. Add:

Category/listing pages that link to groups of celebrity pages
Related celebrity links on each celebrity page
"Trending" or "Recently Updated" sections on high-traffic pages
Breadcrumb navigation with structured data

Validate Individual URLs

Use GSC's URL Inspection tool on specific pages that aren't indexed. It shows you exactly what Google sees -- the rendered HTML, any errors, mobile usability issues, and the indexing status.

Check Sitemap Response Headers

Make sure your sitemap routes return proper headers:

curl -I https://deluxeastrology.com/sitemap.xml

You should see Content-Type: application/xml and a 200 status. If you're getting 304 Not Modified responses from stale caches, that can cause Google to skip re-reading your sitemap.

Performance and Cost Benchmarks

Here are real numbers from our production deployment as of early 2025:

Metric	Value
Total URLs in sitemap	91,247
Sitemap index generation time	~120ms (count queries only)
Individual sitemap generation (50K URLs)	~4.2 seconds
Supabase query cost per sitemap regeneration	~$0.01
Total sitemap XML size (all files combined)	~8.4MB uncompressed
Vercel bandwidth for sitemaps per month	~2.1GB (mostly Googlebot)
Vercel Pro plan cost	$20/user/month
Supabase Pro plan cost	$25/month
GSC indexing rate after 30 days	84% of submitted URLs
Time from content publish to sitemap update	≤1 hour (ISR) or ~5 seconds (on-demand)

The big takeaway: this whole setup costs basically nothing to run. Sitemap generation is a rounding error on your Vercel and Supabase bills.

If you're building a similar large-scale project and want help with the architecture, we've done this across multiple client sites. Check out our Next.js development capabilities or our headless CMS development work. For Astro-based sites with similar scale requirements, we've built comparable solutions using Astro's endpoint approach.

The full working code is available as a GitHub gist: all the route handlers, the Supabase query library, and the next.config.ts rewrites. If your project needs something more custom -- multi-tenant sitemaps, real-time revalidation, or sitemaps for 1M+ pages -- reach out to us and we'll scope it out.

FAQ

How many URLs can a single sitemap file contain?

The sitemap protocol allows a maximum of 50,000 URLs per file and 50MB uncompressed file size. For sites with more than 50K pages, you need a sitemap index that references multiple chunked sitemap files. In practice, most sitemap generators chunk at 45,000-50,000 URLs to leave a safety margin.

Should I use next-sitemap or build custom route handlers?

next-sitemap (v4+) is great for simpler setups and handles auto-chunking well. But for 91K+ dynamic pages with content-type-specific priorities, localized sitemaps with hreflang, and fine-grained ISR control, custom route handlers give you more control. We went custom because we needed different revalidation intervals per content type and wanted the sitemap structure to match our GSC debugging workflow.

Do I submit each individual sitemap file to Google Search Console?

No. Submit only the sitemap index URL (e.g., https://yourdomain.com/sitemap.xml). Google reads the index and automatically discovers and processes all referenced sub-sitemaps. Submitting individual files is unnecessary and clutters your GSC dashboard.

How often should sitemaps be regenerated for large dynamic sites?

For most content-heavy sites, hourly regeneration via ISR (revalidate = 3600) is a good default. If you publish content very frequently, pair it with on-demand revalidation triggered by database webhooks. Don't regenerate on every request -- that defeats caching and increases Supabase load unnecessarily.

Why isn't Google indexing all my sitemap URLs?

The most common causes are: accidental noindex meta tags, robots.txt blocking, thin/duplicate content, orphan pages with no internal links, and crawl budget limitations. Check GSC's "Pages" report under "Why pages aren't indexed" for specific reasons. For large sites, focus on improving internal linking to orphan pages -- this is often the single biggest lever.

Does the priority value in sitemaps actually affect Google rankings? Google has publicly stated they largely ignore priority and changefreq values. However, Bing and other search engines do use them. The lastmod field is the most important sitemap signal -- make sure it reflects actual content changes from your database, not the current timestamp.

How do I handle Supabase's 1,000 row limit for sitemap queries?

Use Supabase's .range(offset, offset + batchSize - 1) method to paginate in batches of 1,000. Loop until you've fetched all rows for the current sitemap chunk. For count-only queries (used in the sitemap index), use .select('*', { count: 'exact', head: true }) which returns just the count without transferring any row data.

Can this approach handle 500K or 1 million pages?

Yes, with minor adjustments. The chunked architecture scales linearly -- 1 million pages would produce about 20 sub-sitemaps. The main concern becomes Vercel's 60-second function timeout for generating individual 50K-URL sitemaps. If you hit that limit, reduce the chunk size to 25,000 or 10,000 URLs per file. The sitemap protocol allows up to 50,000 sitemaps in a single index, so you won't run into index-level limits anytime soon.

Key takeaway:

Split large sitemaps by content type -- each file stays under 50K URLs.