Programmatic SEO: How We Got 253K Pages Indexed with Next.js & Supabase
Last year, we crossed a milestone I genuinely didn't think was possible: 253,000 programmatic pages indexed across three production sites, all running on the same stack. Not a toy project. Not a demo. Real sites with real traffic and real revenue.
I'm going to walk you through exactly how we built Deluxe Astrology (91,000 pages), Not Another Sunday (137,000 venue listings), and HostList (25,000 hosting company profiles) -- including the Supabase queries, the Next.js page architecture, the data pipelines, and most importantly, what broke along the way. Because a lot broke.
Most programmatic SEO content online reads like someone skimmed the docs and called it a day. This isn't that. We've been shipping these pages for over a year, staring at Google Search Console graphs, cursing at crawl budget limits, and slowly figuring out what actually works at scale.

What Programmatic SEO Actually Means in 2025
Programmatic SEO is generating pages at scale from structured data using templates. That's the one-sentence version. The reality is much messier.
Google's stance in 2025 is clear but nuanced: they don't penalize programmatic content because it's programmatic. They penalize it when it's thin, duplicative, or unhelpful. The difference between Zapier's 70,000 indexed pages contributing to $140M ARR and a dev.to case study where 287,000 pages got near-zero indexation comes down to one thing -- whether each page genuinely answers a query that a human typed into a search bar.
Ahrefs data tells us 96.55% of all web pages get zero organic traffic. Programmatic SEO amplifies this problem if you're just spinning up variations of the same content. But it can also solve it spectacularly if your data is genuinely unique and your templates produce pages that are meaningfully different from each other.
Here's the mental model that works for us: every programmatic page should pass the "would I bookmark this?" test. If you landed on it from Google, would you stay? Would you find something you couldn't find elsewhere? If the answer is no, don't publish it.
The Three Projects: Production Numbers
Let me lay out what we actually built and what the numbers look like.
| Project | Pages | Content Types | Geographic Scope | Key Metric |
|---|---|---|---|---|
| Deluxe Astrology | 91,000 | Horoscopes, celebrity profiles, angel numbers, cosmic coins, gemstones, yoga poses, name lab, astrologer directory | 30 languages | 91K indexed pages |
| Not Another Sunday | 137,000 | Café & roaster venue listings with NRI score, photos, maps | USA, UK, Japan | 137K unique venue pages |
| HostList | 25,000 | Hosting company profiles with HostScore algorithm | 53 countries | 25K indexed profiles |
| Total | 253,000 |
Deluxe Astrology: 91K Pages Across 30 Languages
Deluxe Astrology started as a single-language horoscope site. The scale came from the intersection of content types and languages. Think about it: if you have 12 zodiac signs × 365 daily horoscopes × 30 languages, you're already at 131,000 potential pages from just one content type. We were selective -- not every combination gets a page -- but the combinatorial nature of astrology content is perfect for pSEO.
The celebrity profiles section alone has 28,840 records, each enriched via Claude to include natal chart analysis, personality breakdowns, and compatibility insights. More on that data pipeline later.
Not Another Sunday: 137K Venue Listings
Not Another Sunday is a specialty coffee discovery platform. Every café and roaster gets a unique page with a proprietary NRI (Neighbourhood Relevance Index) score, curated photos, an embedded map, opening hours, and reviews. We pull data from multiple APIs, user-generated content, and manual curation.
The key insight: no two venue pages look the same because no two venues are the same. The template is consistent, but the data fills it differently every time. A café in Shibuya with 4.8 NRI and latte art competitions looks nothing like a roaster in Brooklyn with 3.2 NRI and wholesale-only operations.
HostList: 25K Hosting Profiles Across 53 Countries
HostList catalogs hosting companies worldwide, each with a HostScore -- our algorithmic rating based on uptime data, pricing, support responsiveness, and user reviews. 25,000 profiles across 53 countries, each with unique performance data, pricing tables, and comparison widgets.
The Stack: Supabase, Next.js ISR, Vercel Edge
We standardized on the same stack across all three projects. Here's why each piece matters.
Supabase (PostgreSQL + pgvector): Our entire data layer lives in Supabase. PostgreSQL gives us the relational structure we need for complex queries (give me all Sagittarius celebrities born in December who are also musicians), and pgvector powers semantic search across content. Supabase's free tier handles 500MB; we're on Pro at $25/month per project for 8GB databases with unlimited API calls.
Next.js with ISR (Incremental Static Regeneration): Every page is statically generated at build time or on first request, then revalidated on a schedule. This means Google's crawler always hits a fast, pre-rendered HTML page -- not a loading spinner waiting for client-side JavaScript. We use the App Router with generateStaticParams for path generation.
Vercel Edge: Deployment, CDN, and edge middleware all in one. Vercel's Pro plan at $20/user/month gives us 1TB bandwidth, which handles the traffic from 253K pages comfortably. Edge Middleware handles geo-routing for Deluxe Astrology's 30-language setup.
The total infrastructure cost for all three projects runs about $150 -- 200/month. That's hosting 253,000 pages that get millions of monthly crawls. If you're building programmatic sites and considering our Next.js development capabilities or need help with headless CMS architecture, this is the stack we'd recommend.

Data Pipeline Architecture
The data is what makes or breaks programmatic SEO. Templates are easy. Getting genuinely unique, high-quality data for tens of thousands of pages? That's the hard part.
We use four data source types across our projects:
1. API Scraping
Not Another Sunday pulls venue data from Google Places API, Yelp Fusion API, and a handful of regional APIs for Japan. We run nightly sync jobs via Supabase Edge Functions that check for new venues, updated hours, and closed locations. Each API response gets normalized into our schema before insertion.
2. CSV Import with Validation
HostList's initial dataset came from a massive CSV of hosting companies compiled over two years. We built a validation pipeline that checks for duplicates, normalizes company names, and flags incomplete records. About 30% of the initial import got flagged and required manual review.
3. Claude AI Enrichment
This is where it gets interesting. For Deluxe Astrology, we had 28,840 celebrity records with basic biographical data -- name, birthday, birthplace. That's not enough for a useful page. We used Claude (Anthropic's API) to enrich each record with natal chart interpretation, personality analysis, career compatibility insights, and fun facts.
The key: we didn't use Claude to generate content from nothing. We used it to analyze and interpret real astronomical data. Each celebrity's natal chart is mathematically calculated from their birth data, then Claude provides the astrological interpretation. The underlying data is unique and verifiable. The AI layer adds depth, not fabrication.
Here's a simplified version of our enrichment pipeline:
import anthropic
from supabase import create_client
client = anthropic.Anthropic()
supabase = create_client(SUPABASE_URL, SUPABASE_KEY)
def enrich_celebrity(record):
natal_chart = calculate_natal_chart(
birth_date=record['birth_date'],
birth_place=record['birth_place']
)
prompt = f"""Given this natal chart data for {record['name']}:
Sun: {natal_chart['sun_sign']} in {natal_chart['sun_house']}
Moon: {natal_chart['moon_sign']} in {natal_chart['moon_house']}
Rising: {natal_chart['ascendant']}
Write a 300-word astrological personality profile focusing on
how these placements manifest in their career as a {record['profession']}.
Include specific aspect interpretations."""
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": prompt}]
)
supabase.table('celebrities').update({
'natal_chart': natal_chart,
'ai_profile': response.content[0].text,
'enriched_at': 'now()'
}).eq('id', record['id']).execute()
We processed all 28,840 records over about a week, batching requests to stay within rate limits. Cost was roughly $180 in API credits. Not bad for enriching almost 29K pages with unique content.
4. User-Generated Content
Not Another Sunday accepts reviews and photo submissions from users. This UGC makes pages increasingly unique over time and signals to Google that the content is fresh and community-driven.
Page Template Architecture That Google Doesn't Hate
Here's where most programmatic SEO projects fail. They create a template like:
<h1>{City} {Service} Directory</h1>
<p>Looking for {service} in {city}? Browse our directory of {count} providers.</p>
That's thin content. Google knows it. Users know it. Don't do this.
Our template architecture ensures every page has five unique elements:
Unique H1: Not just
{name}inserted into a pattern. The H1 structure varies by content type and includes contextual modifiers.Unique meta description: Generated from the actual page data, not a template with blanks filled in.
Unique body content: This is the big one. Each page has 400-2,000 words of content that's specific to that entity. For celebrities, it's their natal chart analysis. For venues, it's their NRI breakdown, neighborhood context, and menu highlights. For hosting companies, it's their HostScore breakdown with specific uptime percentages and pricing.
Structured data (schema.org): Every page gets JSON-LD markup appropriate to its type --
Personfor celebrities,LocalBusinessfor venues,Organizationfor hosting companies.Internal linking: Each page links to 5-15 related pages based on actual data relationships. A celebrity page links to other celebrities with the same sun sign, same profession, or same birth year. A venue page links to nearby venues and venues with similar NRI scores.
The internal linking piece turned out to be the single most important factor for indexation. More on that in the fixes section.
Real Code: From Supabase Query to Rendered Page
Let me show you the actual flow for a Not Another Sunday venue page. This is production code, simplified slightly for readability.
First, the Supabase query layer:
// lib/queries/venues.ts
import { createClient } from '@supabase/supabase-js'
const supabase = createClient(
process.env.NEXT_PUBLIC_SUPABASE_URL!,
process.env.SUPABASE_SERVICE_ROLE_KEY!
)
export async function getVenueBySlug(slug: string) {
const { data, error } = await supabase
.from('venues')
.select(`
id, name, slug, description, nri_score,
address, city, country, lat, lng,
opening_hours, photos, menu_highlights,
created_at, updated_at,
venue_reviews (
id, rating, body, author_name, created_at
),
venue_tags (
tag:tags ( name, slug )
)
`)
.eq('slug', slug)
.eq('status', 'published')
.single()
if (error) throw error
return data
}
export async function getRelatedVenues(venueId: string, city: string, nriScore: number) {
const { data } = await supabase
.rpc('get_related_venues', {
p_venue_id: venueId,
p_city: city,
p_nri_score: nriScore,
p_limit: 12
})
return data ?? []
}
The get_related_venues function is a PostgreSQL function in Supabase that returns nearby venues sorted by NRI score proximity:
CREATE OR REPLACE FUNCTION get_related_venues(
p_venue_id UUID,
p_city TEXT,
p_nri_score NUMERIC,
p_limit INT DEFAULT 12
)
RETURNS TABLE (
id UUID, name TEXT, slug TEXT,
nri_score NUMERIC, city TEXT, country TEXT
) AS $$
BEGIN
RETURN QUERY
SELECT v.id, v.name, v.slug, v.nri_score, v.city, v.country
FROM venues v
WHERE v.id != p_venue_id
AND v.status = 'published'
AND v.city = p_city
ORDER BY ABS(v.nri_score - p_nri_score) ASC
LIMIT p_limit;
END;
$$ LANGUAGE plpgsql;
Now the Next.js page component using App Router:
// app/venues/[country]/[city]/[slug]/page.tsx
import { getVenueBySlug, getRelatedVenues } from '@/lib/queries/venues'
import { VenueHeader } from '@/components/venue/VenueHeader'
import { NRIScoreCard } from '@/components/venue/NRIScoreCard'
import { VenueMap } from '@/components/venue/VenueMap'
import { ReviewSection } from '@/components/venue/ReviewSection'
import { RelatedVenues } from '@/components/venue/RelatedVenues'
import { venueJsonLd } from '@/lib/schema/venue'
import { notFound } from 'next/navigation'
export const revalidate = 3600 // ISR: revalidate every hour
export async function generateMetadata({ params }: Props) {
const venue = await getVenueBySlug(params.slug)
if (!venue) return {}
const reviewCount = venue.venue_reviews?.length ?? 0
const avgRating = reviewCount > 0
? (venue.venue_reviews.reduce((sum, r) => sum + r.rating, 0) / reviewCount).toFixed(1)
: null
return {
title: `${venue.name} -- Specialty Coffee in ${venue.city} | NRI ${venue.nri_score}`,
description: avgRating
? `${venue.name} in ${venue.city} scores ${venue.nri_score}/10 NRI. Rated ${avgRating}/5 from ${reviewCount} reviews. ${venue.description?.slice(0, 80)}...`
: `${venue.name} in ${venue.city} scores ${venue.nri_score}/10 on our Neighbourhood Relevance Index. ${venue.description?.slice(0, 100)}...`,
alternates: {
canonical: `/venues/${params.country}/${params.city}/${params.slug}`
}
}
}
export default async function VenuePage({ params }: Props) {
const venue = await getVenueBySlug(params.slug)
if (!venue) notFound()
const related = await getRelatedVenues(venue.id, venue.city, venue.nri_score)
return (
<>
<script
type="application/ld+json"
dangerouslySetInnerHTML={{ __html: JSON.stringify(venueJsonLd(venue)) }}
/>
<article>
<VenueHeader venue={venue} />
<NRIScoreCard score={venue.nri_score} breakdown={venue.nri_breakdown} />
<VenueMap lat={venue.lat} lng={venue.lng} />
<section className="venue-body">
<h2>About {venue.name}</h2>
<p>{venue.description}</p>
{venue.menu_highlights && (
<>
<h3>Menu Highlights</h3>
<ul>
{venue.menu_highlights.map(item => (
<li key={item}>{item}</li>
))}
</ul>
</>
)}
</section>
<ReviewSection reviews={venue.venue_reviews} />
<RelatedVenues venues={related} currentCity={venue.city} />
</article>
</>
)
}
Notice revalidate = 3600. This is ISR -- the page gets statically generated on the first request and cached for an hour. Google's crawler always gets fast HTML. Fresh data flows in on the next revalidation cycle. This matters enormously for crawl budget.
What Broke and How We Fixed It
Here's where most case studies get dishonest. They show the results without the months of debugging. We had three major issues.
Problem 1: Deluxe Astrology -- Crawl Budget Starvation
We launched with 91,000 pages and a flat sitemap structure. Google indexed about 12,000 pages in the first month and then... stopped. The GSC coverage report showed "Discovered -- currently not indexed" for tens of thousands of URLs.
The problem was twofold. First, our sitemap was a single file with 91,000 URLs. Google recommends max 50,000 per sitemap, but even within that limit, a single massive sitemap doesn't signal priority. Second, our internal linking was weak -- many pages were only reachable through the sitemap, not through on-page links.
The fix:
Sitemap restructuring: We broke the monolithic sitemap into category-based sitemaps.
sitemap-celebrities.xml,sitemap-horoscopes-en.xml,sitemap-horoscopes-es.xml, etc. Each under 10,000 URLs.Internal linking overhaul: We added contextual cross-links on every page. Celebrity pages now link to related celebrities (same zodiac, same profession, same birth year). Horoscope pages link to the celebrity profiles for that sign. Every page connects to at least 8 other pages.
Thin page removal: We killed about 4,000 pages that had less than 200 words of unique content. These were mostly auto-generated combination pages that didn't add value. Fewer pages, but higher quality.
After these changes, indexation climbed from 12K to 91K over about 10 weeks. The internal linking was the biggest lever.
Problem 2: HostList -- ISR Misconfiguration
HostList launched with export const dynamic = 'force-dynamic' on every page. This meant every single request -- including every Googlebot crawl -- hit Supabase in real-time. With Google crawling thousands of pages per day, our Supabase instance was getting hammered, response times spiked, and some pages timed out during crawls.
The fix: We switched to export const revalidate = 3600. Pages get statically cached and serve in under 100ms. Supabase only gets hit once per hour per page instead of once per request. Our p95 response time dropped from 2.8 seconds to 47 milliseconds. Googlebot started crawling 3x more pages per day because it wasn't waiting around.
Problem 3: Not Another Sunday -- Duplicate Content Across Countries
Some café chains operate in multiple countries. Starbucks Reserve in Tokyo and Starbucks Reserve in London initially had very similar page content because the template emphasized brand information over location-specific data.
The fix: We weighted location-specific content much higher. Neighborhood descriptions, nearby venue comparisons, local review sentiment, and country-specific pricing now make up 70%+ of each page. The brand info is a small section. Google stopped flagging these as near-duplicates.
Results: The Hockey Stick and the Honest Failures
The combined GSC data across all three projects shows the classic hockey stick curve -- flat for weeks, then exponential growth as Google's crawler gained confidence in our domains.
| Metric | Month 1 | Month 3 | Month 6 | Month 12 |
|---|---|---|---|---|
| Total indexed pages | 18,200 | 67,000 | 189,000 | 253,000 |
| Daily organic clicks | 340 | 2,100 | 8,400 | 19,600 |
| Avg. position (all queries) | 42 | 28 | 16 | 11 |
| Crawl requests/day (all sites) | 4,200 | 12,800 | 31,000 | 48,000 |
| Monthly Supabase cost | $75 | $75 | $125 | $150 |
| Monthly Vercel cost | $40 | $60 | $60 | $60 |
But let me be honest about the failures too. About 8% of our pages still sit in "Discovered -- currently not indexed" after 12 months. These tend to be the lowest-traffic-potential pages in the long tail -- specific angel number pages in low-search-volume languages, or hosting companies in small markets. We could probably force-index them with more internal links, but the ROI isn't there.
We also had a period around month 4 where Deluxe Astrology's traffic dropped 30% after a Google core update. It recovered over 6 weeks without any changes on our end, but those were stressful weeks. Programmatic sites seem more volatile during core updates because Google re-evaluates quality signals across the entire page corpus at once.
If you're considering building something at this scale, we've detailed our approach and pricing at our pricing page. For Astro-based static site generation -- which we've also experimented with for pure-static pSEO -- check our Astro development capabilities.
Programmatic SEO vs. Traditional Content: When to Use Which
Programmatic SEO isn't a replacement for editorial content. It's a different tool for a different job.
| Factor | Programmatic SEO | Traditional Content |
|---|---|---|
| Best for | Data-driven queries ("best cafes in Shibuya", "Leo horoscope today") | Intent-driven queries ("how to brew pour-over coffee") |
| Content uniqueness | Comes from unique data per page | Comes from unique perspective/research |
| Scaling speed | 1,000+ pages per week | 2-5 articles per week |
| Maintenance burden | Database updates, template fixes | Periodic content refresh |
| Google trust building | Slower (needs to prove quality at scale) | Faster (each piece judged individually) |
| Risk profile | Higher (thin content penalties affect entire site) | Lower (one bad article doesn't tank the domain) |
The sweet spot is combining both. Not Another Sunday has 137K programmatic venue pages and 200+ editorial guides about coffee culture, brewing methods, and city-specific café crawl routes. The editorial content builds E-E-A-T signals that lift the entire domain, which helps the programmatic pages index faster.
FAQ
How many pages can you realistically get indexed with programmatic SEO?
It depends entirely on domain authority and content quality. On established domains with strong backlink profiles, we've seen 90%+ indexation rates for 100K+ pages. New domains struggle -- the dev.to case study of 287K pages on a fresh domain getting near-zero indexation is the norm, not the exception. Start with 1,000-5,000 high-quality pages, build authority, then scale.
What's the minimum content per page to avoid thin content penalties?
We aim for at least 400 words of unique content per page, plus structured data, images, and internal links. But word count alone isn't the metric -- it's about whether the page answers the user's query better than what already exists. A 200-word page with unique data tables and a map can outperform a 2,000-word page of generic text.
Is programmatic SEO still safe after Google's 2025 helpful content updates?
Yes, but only if you're genuinely creating useful pages. Google's 2025 updates specifically target low-quality programmatic content that exists only to capture search traffic without providing value. Sites like Zapier (70K pages, $140M ARR) continue to thrive because their pages solve real problems. The sites getting penalized are the ones generating variations of "Best {service} in {city}" with no real data behind them.
How much does a programmatic SEO stack with Supabase and Vercel cost?
Our three-project stack runs about $150-200/month total. Supabase Pro is $25/month per project (we use three instances). Vercel Pro is $20/user/month. The AI enrichment via Claude's API was a one-time cost of about $180 for 28,840 records. For most projects under 50K pages, expect $50-100/month in infrastructure costs.
How long does it take for Google to index programmatic pages?
Expect 2-4 weeks for initial crawling of your sitemaps, but full indexation of large page sets takes 3-6 months. Our experience shows a hockey stick pattern: slow crawling for the first 6-8 weeks as Google evaluates quality, then rapid acceleration once it decides your content is worth indexing. Internal linking and sitemap structure dramatically affect this timeline.
Should I use Next.js SSR or ISR for programmatic SEO pages?
ISR, almost always. SSR (force-dynamic) means every crawler request hits your database, which creates performance problems at scale and wastes crawl budget on slow responses. ISR with revalidate = 3600 (or even 86400 for daily updates) gives you static-site performance with dynamic data freshness. We learned this the hard way with HostList -- switching from force-dynamic to ISR dropped our response time from 2.8s to 47ms.
How do you handle internal linking across 100K+ pages?
Database-driven related content queries. Every page runs a query that finds 8-15 related pages based on actual data relationships -- same category, similar scores, geographic proximity, shared attributes. Don't just randomly link to pages. The links need to make contextual sense to both users and Google. We use PostgreSQL functions in Supabase to compute these relationships efficiently.
What's the biggest mistake people make with programmatic SEO?
Focusing on page count instead of page quality. It's tempting to generate every possible combination of your data, but 10,000 excellent pages will outperform 100,000 mediocre ones every time. We killed 4,000 thin pages on Deluxe Astrology and saw indexation increase across the remaining pages. Google interprets thin pages as a signal that your entire site might be low quality. If you're ready to build programmatic pages the right way, reach out to our team -- we've learned these lessons so you don't have to.