Building a Dynamic Sitemap for 91,000 Pages with Next.js and Supabase
Last month we hit 91,000 pages on Deluxe Astrology. Celebrity birth charts, blog posts, localized content across six languages -- the site had grown way past what a single sitemap file could handle. Google's sitemap protocol caps you at 50,000 URLs per file and 50MB uncompressed. We needed a sitemap index with chunked sub-sitemaps, all generated dynamically from Supabase, cached with ISR on Vercel, and submitted to Google Search Console as a single index URL.
This is the exact implementation we shipped. Not a theoretical walkthrough -- actual production code that handles 91K URLs today and will scale to 500K without changes.

Understanding Sitemap Limits and Architecture
Here are the hard limits you need to know:
| Constraint | Limit | Source |
|---|---|---|
| URLs per sitemap file | 50,000 | sitemaps.org protocol |
| File size per sitemap | 50MB uncompressed | sitemaps.org protocol |
| Sitemaps per sitemap index | 50,000 | sitemaps.org protocol |
Supabase .range() max per query |
1,000 rows (default) | Supabase PostgREST config |
| Vercel serverless function timeout (Pro) | 60 seconds | Vercel docs 2025 |
| Vercel response body size limit | 10MB | Vercel edge caching |
For 91,000 URLs, you need at minimum two sitemap files. But we don't just dump everything into two 50K-URL buckets. We split by content type -- celebrities, blog posts, static pages, localized pages -- because each type has different changefreq, priority, and update patterns. This gives us better control and makes debugging in GSC way easier when something goes wrong.
The Sitemap Structure for Deluxe Astrology
Here's what the final sitemap architecture looks like:
/sitemap.xml → Sitemap Index (points to all sub-sitemaps)
/sitemap-pages.xml → Static pages (~30 URLs)
/sitemap-blog-0.xml → Blog posts chunk 0 (up to 50K)
/sitemap-blog-1.xml → Blog posts chunk 1 (overflow)
/sitemap-celebrities-0.xml → Celebrity pages chunk 0 (up to 50K)
/sitemap-celebrities-1.xml → Celebrity pages chunk 1 (overflow)
/sitemap-locale-es.xml → Spanish localized pages
/sitemap-locale-fr.xml → French localized pages
/sitemap-locale-de.xml → German localized pages
/sitemap-locale-pt.xml → Portuguese localized pages
/sitemap-locale-ja.xml → Japanese localized pages
Each sub-sitemap is a Next.js App Router route handler that queries Supabase at runtime, generates XML, and caches via ISR with revalidate = 3600 (hourly). The sitemap index itself is also a route handler.
Setting Up Supabase Queries with Offset Pagination
Here's the critical piece most tutorials get wrong: you can't just do supabase.from('celebrities').select('*') and expect 91,000 rows back. Supabase's PostgREST layer defaults to returning 1,000 rows max. You need to paginate.
We use range-based offset pagination in batches of 1,000:
// lib/supabase-sitemap.ts
import { createClient } from '@supabase/supabase-js';
const supabase = createClient(
process.env.NEXT_PUBLIC_SUPABASE_URL!,
process.env.SUPABASE_SERVICE_ROLE_KEY! // Use service role for server-side
);
const BATCH_SIZE = 1000;
export interface SitemapEntry {
slug: string;
updated_at: string;
}
export async function fetchAllSlugs(
table: string,
selectColumns: string = 'slug, updated_at'
): Promise<SitemapEntry[]> {
const allRows: SitemapEntry[] = [];
let offset = 0;
let hasMore = true;
while (hasMore) {
const { data, error } = await supabase
.from(table)
.select(selectColumns)
.order('updated_at', { ascending: false })
.range(offset, offset + BATCH_SIZE - 1);
if (error) {
console.error(`Sitemap fetch error for ${table}:`, error.message);
break;
}
if (data && data.length > 0) {
allRows.push(...data);
offset += BATCH_SIZE;
hasMore = data.length === BATCH_SIZE;
} else {
hasMore = false;
}
}
return allRows;
}
export async function fetchSlugsChunked(
table: string,
chunkIndex: number,
chunkSize: number = 50000
): Promise<{ entries: SitemapEntry[]; totalCount: number }> {
// First get total count for the sitemap index
const { count } = await supabase
.from(table)
.select('*', { count: 'exact', head: true });
const totalCount = count || 0;
const startOffset = chunkIndex * chunkSize;
const entries: SitemapEntry[] = [];
let offset = startOffset;
const endOffset = Math.min(startOffset + chunkSize, totalCount);
while (offset < endOffset) {
const batchEnd = Math.min(offset + BATCH_SIZE - 1, endOffset - 1);
const { data, error } = await supabase
.from(table)
.select('slug, updated_at')
.order('updated_at', { ascending: false })
.range(offset, batchEnd);
if (error || !data || data.length === 0) break;
entries.push(...data);
offset += data.length;
}
return { entries, totalCount };
}
export function getChunkCount(totalCount: number, chunkSize: number = 50000): number {
return Math.ceil(totalCount / chunkSize);
}
A few things to note here. We use the SUPABASE_SERVICE_ROLE_KEY -- not the anon key -- because these route handlers run server-side and we don't want RLS policies slowing down our sitemap queries. The fetchSlugsChunked function only fetches the specific chunk needed for a given sitemap file, not the entire dataset. That matters when you're running on Vercel's 60-second function timeout.

Building the Sitemap Index Route
The sitemap index is the single URL you submit to Google. It references all your sub-sitemaps.
// app/sitemap.xml/route.ts
import { NextResponse } from 'next/server';
import { createClient } from '@supabase/supabase-js';
export const revalidate = 3600; // ISR: regenerate every hour
const supabase = createClient(
process.env.NEXT_PUBLIC_SUPABASE_URL!,
process.env.SUPABASE_SERVICE_ROLE_KEY!
);
const CHUNK_SIZE = 50000;
const SITE_URL = 'https://deluxeastrology.com';
const LOCALES = ['es', 'fr', 'de', 'pt', 'ja'];
async function getTableCount(table: string): Promise<number> {
const { count } = await supabase
.from(table)
.select('*', { count: 'exact', head: true });
return count || 0;
}
export async function GET() {
const blogCount = await getTableCount('blog_posts');
const celebrityCount = await getTableCount('celebrities');
const blogChunks = Math.ceil(blogCount / CHUNK_SIZE);
const celebrityChunks = Math.ceil(celebrityCount / CHUNK_SIZE);
const now = new Date().toISOString();
let sitemaps = '';
// Static pages sitemap
sitemaps += `
<sitemap>
<loc>${SITE_URL}/sitemap-pages.xml</loc>
<lastmod>${now}</lastmod>
</sitemap>`;
// Blog sitemaps
for (let i = 0; i < blogChunks; i++) {
sitemaps += `
<sitemap>
<loc>${SITE_URL}/sitemap-blog-${i}.xml</loc>
<lastmod>${now}</lastmod>
</sitemap>`;
}
// Celebrity sitemaps
for (let i = 0; i < celebrityChunks; i++) {
sitemaps += `
<sitemap>
<loc>${SITE_URL}/sitemap-celebrities-${i}.xml</loc>
<lastmod>${now}</lastmod>
</sitemap>`;
}
// Locale sitemaps
for (const locale of LOCALES) {
sitemaps += `
<sitemap>
<loc>${SITE_URL}/sitemap-locale-${locale}.xml</loc>
<lastmod>${now}</lastmod>
</sitemap>`;
}
const xml = `<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${sitemaps}
</sitemapindex>`;
return new NextResponse(xml, {
headers: {
'Content-Type': 'application/xml',
'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate=600',
},
});
}
Notice we're only doing count queries here -- head: true means Supabase returns just the count without any row data. This makes the sitemap index generation nearly instant.
Building Individual Chunked Sitemaps
Here's the celebrity sitemap handler with full pagination:
// app/sitemap-celebrities-[chunk].xml/route.ts
import { NextResponse } from 'next/server';
import { fetchSlugsChunked } from '@/lib/supabase-sitemap';
export const revalidate = 3600;
const SITE_URL = 'https://deluxeastrology.com';
export async function GET(
request: Request,
{ params }: { params: Promise<{ chunk: string }> }
) {
const { chunk } = await params;
const chunkIndex = parseInt(chunk, 10);
if (isNaN(chunkIndex) || chunkIndex < 0) {
return new NextResponse('Invalid chunk index', { status: 400 });
}
const { entries } = await fetchSlugsChunked('celebrities', chunkIndex);
const urls = entries
.map(
(entry) => `
<url>
<loc>${SITE_URL}/celebrities/${entry.slug}</loc>
<lastmod>${new Date(entry.updated_at).toISOString()}</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>`
)
.join('');
const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urls}
</urlset>`;
return new NextResponse(xml, {
headers: {
'Content-Type': 'application/xml',
'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate=600',
},
});
}
The blog sitemap follows the same pattern but with different priority and changefreq:
// app/sitemap-blog-[chunk].xml/route.ts
import { NextResponse } from 'next/server';
import { fetchSlugsChunked } from '@/lib/supabase-sitemap';
export const revalidate = 3600;
const SITE_URL = 'https://deluxeastrology.com';
export async function GET(
request: Request,
{ params }: { params: Promise<{ chunk: string }> }
) {
const { chunk } = await params;
const chunkIndex = parseInt(chunk, 10);
const { entries } = await fetchSlugsChunked('blog_posts', chunkIndex);
const urls = entries
.map(
(entry) => `
<url>
<loc>${SITE_URL}/blog/${entry.slug}</loc>
<lastmod>${new Date(entry.updated_at).toISOString()}</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>`
)
.join('');
const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urls}
</urlset>`;
return new NextResponse(xml, {
headers: {
'Content-Type': 'application/xml',
'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate=600',
},
});
}
You'll need to configure your Next.js routing to handle the dynamic segment. In App Router, the folder name uses brackets:
app/
sitemap.xml/
route.ts
sitemap-pages.xml/
route.ts
sitemap-blog-[chunk].xml/
route.ts
sitemap-celebrities-[chunk].xml/
route.ts
sitemap-locale-[lang].xml/
route.ts
If the bracket-in-folder-name approach gives you trouble with your file system or IDE (it sometimes does), use route rewrites in next.config.ts instead:
// next.config.ts
const nextConfig = {
async rewrites() {
return [
{
source: '/sitemap-blog-:chunk(\\d+).xml',
destination: '/api/sitemap-blog/:chunk',
},
{
source: '/sitemap-celebrities-:chunk(\\d+).xml',
destination: '/api/sitemap-celebrities/:chunk',
},
{
source: '/sitemap-locale-:lang.xml',
destination: '/api/sitemap-locale/:lang',
},
];
},
};
export default nextConfig;
Static Pages Sitemap
For the static pages sitemap, we hardcode the URLs since they rarely change:
// app/sitemap-pages.xml/route.ts
import { NextResponse } from 'next/server';
export const revalidate = 86400; // Once per day is fine for static pages
const SITE_URL = 'https://deluxeastrology.com';
const staticPages = [
{ path: '/', priority: '1.0', changefreq: 'daily' },
{ path: '/about', priority: '0.7', changefreq: 'monthly' },
{ path: '/solutions/birth-chart', priority: '0.9', changefreq: 'weekly' },
{ path: '/solutions/compatibility', priority: '0.9', changefreq: 'weekly' },
{ path: '/solutions/transit-report', priority: '0.9', changefreq: 'weekly' },
{ path: '/blog', priority: '0.8', changefreq: 'daily' },
{ path: '/celebrities', priority: '0.8', changefreq: 'daily' },
{ path: '/contact', priority: '0.5', changefreq: 'yearly' },
{ path: '/pricing', priority: '0.7', changefreq: 'monthly' },
];
export async function GET() {
const now = new Date().toISOString();
const urls = staticPages
.map(
(page) => `
<url>
<loc>${SITE_URL}${page.path}</loc>
<lastmod>${now}</lastmod>
<changefreq>${page.changefreq}</changefreq>
<priority>${page.priority}</priority>
</url>`
)
.join('');
const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urls}
</urlset>`;
return new NextResponse(xml, {
headers: {
'Content-Type': 'application/xml',
'Cache-Control': 'public, s-maxage=86400, stale-while-revalidate=3600',
},
});
}
Localized Sitemaps with Hreflang
This is where it gets interesting. For multilingual content, you need xhtml:link elements with hreflang attributes. Each localized sitemap references all alternate language versions of each page:
// app/sitemap-locale-[lang].xml/route.ts
import { NextResponse } from 'next/server';
import { fetchAllSlugs } from '@/lib/supabase-sitemap';
export const revalidate = 3600;
const SITE_URL = 'https://deluxeastrology.com';
const ALL_LOCALES = ['en', 'es', 'fr', 'de', 'pt', 'ja'];
export async function GET(
request: Request,
{ params }: { params: Promise<{ lang: string }> }
) {
const { lang } = await params;
if (!ALL_LOCALES.includes(lang)) {
return new NextResponse('Invalid locale', { status: 404 });
}
const entries = await fetchAllSlugs('localized_pages');
// Filter to pages that have this locale
const localeEntries = entries.filter((e: any) => e.locale === lang);
const urls = localeEntries
.map((entry: any) => {
const alternates = ALL_LOCALES.map(
(loc) =>
` <xhtml:link rel="alternate" hreflang="${loc}" href="${SITE_URL}/${loc}/${entry.slug}" />`
).join('\n');
return `
<url>
<loc>${SITE_URL}/${lang}/${entry.slug}</loc>
<lastmod>${new Date(entry.updated_at).toISOString()}</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
${alternates}
</url>`;
})
.join('');
const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml">${urls}
</urlset>`;
return new NextResponse(xml, {
headers: {
'Content-Type': 'application/xml',
'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate=600',
},
});
}
ISR Revalidation Strategy
We set revalidate = 3600 on all sitemap routes. That means Vercel serves the cached XML for up to an hour, then regenerates it in the background on the next request. For 91K pages, this is the sweet spot -- frequent enough that new content shows up same-day, but not so aggressive that we're hammering Supabase.
For on-demand revalidation when content is published, add a revalidation endpoint:
// app/api/revalidate-sitemap/route.ts
import { revalidatePath } from 'next/cache';
import { NextResponse } from 'next/server';
export async function POST(request: Request) {
const { secret, paths } = await request.json();
if (secret !== process.env.REVALIDATION_SECRET) {
return NextResponse.json({ error: 'Unauthorized' }, { status: 401 });
}
// Revalidate specific sitemap paths
const targetPaths = paths || ['/sitemap.xml'];
for (const path of targetPaths) {
revalidatePath(path);
}
return NextResponse.json({ revalidated: true, paths: targetPaths });
}
Then set up a Supabase Database Webhook (or a Postgres trigger via pg_net) to call this endpoint whenever your celebrities or blog_posts tables are updated.
Priority and Change Frequency by Content Type
Here's the priority matrix we use. Google has said they mostly ignore priority and changefreq, but other crawlers (Bing, Yandex) still use them, and they don't hurt:
| Content Type | Priority | Change Frequency | Rationale |
|---|---|---|---|
| Homepage | 1.0 | daily | Highest importance, frequently updated |
| Solutions/Features | 0.9 | weekly | Core product pages |
| Blog listing | 0.8 | daily | New posts regularly |
| Blog posts | 0.8 | weekly | Content updated occasionally |
| Celebrity pages | 0.6 | monthly | Rarely changes after creation |
| Localized pages | 0.6 | monthly | Translation updates are infrequent |
| Contact/Legal | 0.5 | yearly | Almost never changes |
The lastmod value is critical and should always come from your database's updated_at column -- never hardcode it to new Date(). Google uses lastmod to prioritize recrawling, and if every page says it was modified right now, Google will eventually ignore your lastmod entirely.
Google Search Console Submission
Here's the straightforward part. In GSC:
- Go to Sitemaps in the left sidebar
- Enter
https://yourdomain.com/sitemap.xml(the index URL only) - Click Submit
That's it. Don't submit individual sub-sitemaps. Google reads the index and discovers all the children automatically. You should see status "Success" within a few hours, and indexed URL counts will climb over the next 2-4 weeks.
For 91K URLs, expect Google to index 70-90% within the first month. The remaining pages typically have thin content, duplicate content issues, or are simply low-priority in Google's crawl budget allocation.
Also add your sitemap to robots.txt:
# robots.txt
User-agent: *
Allow: /
Sitemap: https://deluxeastrology.com/sitemap.xml
Debugging When Google Won't Index Your Pages
This is where most people get stuck. You've submitted 91K URLs but GSC shows only 40K indexed. Here's the systematic debugging checklist we follow:
Check for Accidental Noindex Tags
This is the #1 cause. Run a spot check:
curl -s https://deluxeastrology.com/celebrities/some-slug | grep -i 'noindex'
Also check your Next.js layout or page metadata. A common mistake is setting noindex in a layout that applies to thousands of pages:
// BAD: This noindexes all pages using this layout
export const metadata = {
robots: { index: false, follow: true },
};
Verify robots.txt Isn't Blocking Crawling
Check https://yourdomain.com/robots.txt in a browser. Make sure you're not accidentally blocking your dynamic routes. On Vercel, also check for any middleware that might be returning 403s to Googlebot.
Inspect Crawl Errors in GSC
Go to Pages → Why pages aren't indexed. Common issues:
- "Crawled - currently not indexed": Google saw the page but decided not to index it. Usually thin content.
- "Discovered - currently not indexed": Google knows the URL exists but hasn't crawled it yet. Crawl budget issue.
- "Excluded by noindex tag": Self-explanatory. Fix the tag.
- "Duplicate without canonical": Add proper canonical tags.
Fix Orphan Pages with Internal Linking
This is huge for large sites. If your celebrity pages are only discoverable through the sitemap and have zero internal links pointing to them, Google will deprioritize crawling them. Add:
- Category/listing pages that link to groups of celebrity pages
- Related celebrity links on each celebrity page
- "Trending" or "Recently Updated" sections on high-traffic pages
- Breadcrumb navigation with structured data
Validate Individual URLs
Use GSC's URL Inspection tool on specific pages that aren't indexed. It shows you exactly what Google sees -- the rendered HTML, any errors, mobile usability issues, and the indexing status.
Check Sitemap Response Headers
Make sure your sitemap routes return proper headers:
curl -I https://deluxeastrology.com/sitemap.xml
You should see Content-Type: application/xml and a 200 status. If you're getting 304 Not Modified responses from stale caches, that can cause Google to skip re-reading your sitemap.
Performance and Cost Benchmarks
Here are real numbers from our production deployment as of early 2025:
| Metric | Value |
|---|---|
| Total URLs in sitemap | 91,247 |
| Sitemap index generation time | ~120ms (count queries only) |
| Individual sitemap generation (50K URLs) | ~4.2 seconds |
| Supabase query cost per sitemap regeneration | ~$0.01 |
| Total sitemap XML size (all files combined) | ~8.4MB uncompressed |
| Vercel bandwidth for sitemaps per month | ~2.1GB (mostly Googlebot) |
| Vercel Pro plan cost | $20/user/month |
| Supabase Pro plan cost | $25/month |
| GSC indexing rate after 30 days | 84% of submitted URLs |
| Time from content publish to sitemap update | ≤1 hour (ISR) or ~5 seconds (on-demand) |
The big takeaway: this whole setup costs basically nothing to run. Sitemap generation is a rounding error on your Vercel and Supabase bills.
If you're building a similar large-scale project and want help with the architecture, we've done this across multiple client sites. Check out our Next.js development capabilities or our headless CMS development work. For Astro-based sites with similar scale requirements, we've built comparable solutions using Astro's endpoint approach.
The full working code is available as a GitHub gist: all the route handlers, the Supabase query library, and the next.config.ts rewrites. If your project needs something more custom -- multi-tenant sitemaps, real-time revalidation, or sitemaps for 1M+ pages -- reach out to us and we'll scope it out.
FAQ
How many URLs can a single sitemap file contain?
The sitemap protocol allows a maximum of 50,000 URLs per file and 50MB uncompressed file size. For sites with more than 50K pages, you need a sitemap index that references multiple chunked sitemap files. In practice, most sitemap generators chunk at 45,000-50,000 URLs to leave a safety margin.
Should I use next-sitemap or build custom route handlers?
next-sitemap (v4+) is great for simpler setups and handles auto-chunking well. But for 91K+ dynamic pages with content-type-specific priorities, localized sitemaps with hreflang, and fine-grained ISR control, custom route handlers give you more control. We went custom because we needed different revalidation intervals per content type and wanted the sitemap structure to match our GSC debugging workflow.
Do I submit each individual sitemap file to Google Search Console?
No. Submit only the sitemap index URL (e.g., https://yourdomain.com/sitemap.xml). Google reads the index and automatically discovers and processes all referenced sub-sitemaps. Submitting individual files is unnecessary and clutters your GSC dashboard.
How often should sitemaps be regenerated for large dynamic sites?
For most content-heavy sites, hourly regeneration via ISR (revalidate = 3600) is a good default. If you publish content very frequently, pair it with on-demand revalidation triggered by database webhooks. Don't regenerate on every request -- that defeats caching and increases Supabase load unnecessarily.
Why isn't Google indexing all my sitemap URLs?
The most common causes are: accidental noindex meta tags, robots.txt blocking, thin/duplicate content, orphan pages with no internal links, and crawl budget limitations. Check GSC's "Pages" report under "Why pages aren't indexed" for specific reasons. For large sites, focus on improving internal linking to orphan pages -- this is often the single biggest lever.
Does the priority value in sitemaps actually affect Google rankings?
Google has publicly stated they largely ignore priority and changefreq values. However, Bing and other search engines do use them. The lastmod field is the most important sitemap signal -- make sure it reflects actual content changes from your database, not the current timestamp.
How do I handle Supabase's 1,000 row limit for sitemap queries?
Use Supabase's .range(offset, offset + batchSize - 1) method to paginate in batches of 1,000. Loop until you've fetched all rows for the current sitemap chunk. For count-only queries (used in the sitemap index), use .select('*', { count: 'exact', head: true }) which returns just the count without transferring any row data.
Can this approach handle 500K or 1 million pages?
Yes, with minor adjustments. The chunked architecture scales linearly -- 1 million pages would produce about 20 sub-sitemaps. The main concern becomes Vercel's 60-second function timeout for generating individual 50K-URL sitemaps. If you hit that limit, reduce the chunk size to 25,000 or 10,000 URLs per file. The sitemap protocol allows up to 50,000 sitemaps in a single index, so you won't run into index-level limits anytime soon.