How to Migrate a 30,000-Page Website Without Losing SEO
Last year, we migrated a 34,000-page e-commerce site from a monolithic WordPress installation to a headless architecture using Next.js and a headless CMS. The client's organic traffic accounted for 72% of their revenue. No pressure, right?
The migration took 14 weeks of planning and 6 weeks of execution. When we flipped the switch, organic traffic dipped 3.2% in week one, recovered by week three, and was up 11% by month two. That's not luck -- it's process.
I've seen migrations go catastrophically wrong. A competitor of that same client had migrated six months earlier and lost 40% of their organic traffic overnight. Eight months later, they still hadn't recovered. The difference between a successful large-scale migration and a disaster comes down to preparation, redirect management, and having a rollback plan you actually trust.
This article walks through everything we do when migrating sites with tens of thousands of pages. It's the same process whether you're moving from WordPress to Next.js, Drupal to Astro, or any other platform shift.
Table of Contents
- Why Large-Scale Migrations Fail
- Phase 1: Pre-Migration Audit and Crawl
- Phase 2: URL Mapping and Redirect Strategy
- Phase 3: Technical SEO Parity Checklist
- Phase 4: Content Migration and Validation
- Phase 5: Staging Environment Testing
- Phase 6: Launch Day Execution
- Phase 7: Post-Migration Monitoring
- Redirect Implementation at Scale
- Handling International and Multi-Language Sites
- Common Mistakes That Kill Rankings
- Tools and Stack We Use
- FAQ

Why Large-Scale Migrations Fail
Most migration failures share the same root causes. Understanding them upfront saves you from joining the graveyard of botched launches.
The Redirect Problem
On a 500-page site, you can manually map every URL. On a 30,000-page site, you can't. Teams end up writing regex-based redirect rules that cover 90% of URLs and assume the remaining 10% will sort itself out. That remaining 10% is 3,000 pages, and many of them are your highest-performing content.
A 2025 Ahrefs study found that sites losing more than 15% of their indexed pages during migration experienced an average organic traffic decline of 34%. And recovery took 4-8 months on average.
The Parity Problem
Google doesn't just care about content -- it cares about structure. Internal linking patterns, heading hierarchies, structured data, canonical tags, pagination handling, faceted navigation. Change too many of these simultaneously and Google essentially has to re-evaluate your entire site from scratch.
The Timing Problem
I've seen teams spend months perfecting the new site and then rush the actual migration because leadership is impatient. You don't migrate a 30,000-page site on a Friday afternoon. You don't migrate during your peak traffic season. And you definitely don't migrate without a tested rollback plan.
Phase 1: Pre-Migration Audit and Crawl
Before you touch anything, you need a complete picture of what exists today. This is your baseline, and you'll reference it constantly throughout the migration.
Full Site Crawl
Run a complete crawl using Screaming Frog, Sitebulb, or a cloud-based crawler like Lumar (formerly Deepcrawl). For 30,000+ pages, you'll want the cloud option -- desktop crawlers choke on sites this size, and you need the crawl data to be shareable across your team.
Capture everything:
- Every URL and its HTTP status code
- Title tags and meta descriptions
- H1 tags
- Canonical tags
- Hreflang tags (if applicable)
- Internal links (both inbound and outbound per page)
- Structured data types present
- Page load times
- Word count per page
- Images and alt text
Analytics Baseline
Export the last 12 months of Google Analytics data and Google Search Console data. You need:
- Top 1,000 landing pages by organic sessions
- Top 5,000 queries by clicks and impressions
- Crawl stats (pages crawled per day, response times)
- Core Web Vitals scores
- Index coverage report (indexed, excluded, errors)
Tag your top 500 organic landing pages. These are the pages that cannot break. Period. Every one of them gets individually verified during and after migration.
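As a rough illustration of the tagging step, the protected list can be derived directly from a landing-page export. This is a minimal sketch, assuming a CSV with `url` and `sessions` columns; those column names and the `top_landing_pages` helper are hypothetical, so adjust them to whatever your analytics export actually contains.

```python
import csv
import io


def top_landing_pages(csv_text, n=500):
    """Return the n URLs with the most organic sessions, highest first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # "url" and "sessions" are assumed column names, not a fixed export format
    rows = [(row["url"], int(row["sessions"])) for row in reader]
    rows.sort(key=lambda item: item[1], reverse=True)
    return [url for url, _ in rows[:n]]


sample = "url,sessions\n/a,120\n/b,950\n/c,40\n"
protected = top_landing_pages(sample, n=2)
print(protected)  # ['/b', '/a']
```

Keeping this list in a file next to the redirect map makes it easy to run the individual checks later against exactly the same set of pages.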
Backlink Audit
Pull backlink data from Ahrefs, Semrush, and Google Search Console. Cross-reference to find every URL that has external links pointing to it. These URLs need perfect 301 redirects -- losing backlink equity on high-authority pages is one of the fastest ways to tank rankings.
```bash
# Example: export and deduplicate backlinked URLs
# Assumes the linked URL is the first column of each export
cat ahrefs-export.csv semrush-export.csv gsc-export.csv \
  | awk -F',' '{print $1}' \
  | sort -u \
  > unique_backlinked_urls.txt
wc -l unique_backlinked_urls.txt
# Output: 8,247 unique URLs with backlinks
```
Phase 2: URL Mapping and Redirect Strategy
This is where migrations are won or lost. On a 30,000-page site, you need a systematic approach that combines automated mapping with manual verification for critical pages.
Building the Redirect Map
Start by categorizing your URLs into patterns. Most large sites have a relatively small number of URL patterns that account for the majority of pages:
| URL Pattern | Example | Page Count | Strategy |
|---|---|---|---|
| Product pages | /products/blue-widget-123 | 18,000 | Regex + ID mapping |
| Category pages | /category/widgets | 450 | Manual mapping |
| Blog posts | /blog/2024/03/post-title | 3,200 | Slug preservation |
| Tag/filter pages | /products?color=blue | 6,500 | Evaluate: redirect or noindex |
| Static pages | /about, /contact | 85 | Manual mapping |
| Paginated pages | /category/widgets/page/3 | 1,800 | Map to new pagination |
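One way to produce the page counts in a table like this is to classify the crawl export against a short list of patterns. The sketch below is illustrative only: the regexes are hypothetical and modeled on the example URLs in the table, and anything that lands in `unclassified` is a pattern you haven't accounted for yet.

```python
import re
from collections import Counter

# Hypothetical patterns based on the example URLs; the first match wins,
# so more specific patterns are listed before broader ones.
PATTERNS = [
    ("paginated", re.compile(r"^/category/[^/]+/page/\d+$")),
    ("filter", re.compile(r"^/products\?")),
    ("product", re.compile(r"^/products/[^/?]+$")),
    ("category", re.compile(r"^/category/[^/]+$")),
    ("blog", re.compile(r"^/blog/\d{4}/\d{2}/[^/]+$")),
]


def classify(url):
    """Return the name of the first pattern that matches, or 'unclassified'."""
    for name, pattern in PATTERNS:
        if pattern.search(url):
            return name
    return "unclassified"


def count_patterns(urls):
    """Count how many URLs fall into each pattern bucket."""
    return Counter(classify(url) for url in urls)
```

Running `count_patterns` over the full URL list gives a quick sanity check: if the unclassified bucket is large, the pattern analysis isn't finished.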
The Three-Tier Approach
Tier 1: Manual mapping (top 500 pages)
Your highest-traffic, highest-revenue pages get individually mapped. A human verifies each redirect. No exceptions.
Tier 2: Pattern-based mapping (next ~25,000 pages)
Write transformation rules that convert old URL patterns to new ones. Test these rules against your full URL list before deployment.
```python
# Example redirect rule generation
import csv
import re


def generate_redirect(old_url):
    # Product pages: /products/blue-widget-123 -> /shop/blue-widget
    product_match = re.match(r'/products/([a-z-]+)-(\d+)$', old_url)
    if product_match:
        slug = product_match.group(1)
        return f'/shop/{slug}', 301

    # Blog posts: /blog/2024/03/post-title -> /blog/post-title
    blog_match = re.match(r'/blog/\d{4}/\d{2}/(.+)$', old_url)
    if blog_match:
        slug = blog_match.group(1)
        return f'/blog/{slug}', 301

    # No rule matched: the caller decides what to do with this URL
    return None, None


# Process all URLs (one URL per row, in the first column)
unmapped = []
with open('all_urls.csv', newline='') as f:
    for row in csv.reader(f):
        if not row:
            continue  # skip blank lines
        old_url = row[0]
        new_url, status = generate_redirect(old_url)
        if new_url is None:
            unmapped.append(old_url)

print(f"Unmapped URLs: {len(unmapped)}")
```
Tier 3: Remaining unmapped pages (~4,500 pages)
These are your edge cases. Go through them manually. Some will be pages you're intentionally sunsetting (redirect to nearest relevant page). Some will be URLs you missed in your pattern analysis. Don't leave any 404s for pages that had traffic or backlinks.
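One thing worth testing in the Tier 2 rules is collisions. A rule that drops a numeric ID from the slug can send several distinct old URLs to the same new URL. The sketch below assumes the redirect map is already in memory as a dict of old path to new path; `find_collisions` is a hypothetical helper, and whether a given collision is a problem depends on your catalog.

```python
from collections import defaultdict


def find_collisions(redirect_map):
    """Return destinations that more than one old URL redirects to."""
    by_destination = defaultdict(list)
    for old_url, new_url in redirect_map.items():
        by_destination[new_url].append(old_url)
    return {
        destination: sorted(sources)
        for destination, sources in by_destination.items()
        if len(sources) > 1
    }


example_map = {
    "/products/blue-widget-123": "/shop/blue-widget",
    "/products/blue-widget-456": "/shop/blue-widget",
    "/products/red-widget-7": "/shop/red-widget",
}
collisions = find_collisions(example_map)
```

Here `collisions` contains a single entry for `/shop/blue-widget`, listing both old product URLs. Intentional consolidations are fine; unintentional ones usually mean two different products are about to share one page.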
Redirect Chains and Loops
If the old site already has redirects in place, your new redirects might create chains (A → B → C). Resolve these before launch. Every redirect should go directly from old URL to final destination in a single hop. Redirect chains bleed PageRank -- Google's John Mueller has confirmed multiple times that while they'll follow chains, a direct redirect is always preferable.
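Resolving chains can be automated once the old site's existing redirects and the new redirect map are merged into one dict. This is a minimal sketch under that assumption; `flatten_redirects` is a hypothetical helper, and it reports loops rather than trying to repair them.

```python
def flatten_redirects(redirect_map):
    """Collapse chains so every source points at its final destination.

    Returns (flattened, loops), where loops is the set of sources whose
    chain cycles back on itself and never reaches a final destination.
    """
    flattened = {}
    loops = set()
    for source in redirect_map:
        seen = {source}
        target = redirect_map[source]
        while target in redirect_map:
            if target in seen:
                loops.add(source)
                break
            seen.add(target)
            target = redirect_map[target]
        else:
            # The while loop ended without a break: target is a final URL
            flattened[source] = target
    return flattened, loops


merged = {"/a": "/b", "/b": "/c", "/x": "/y", "/y": "/x"}
flattened, loops = flatten_redirects(merged)
# flattened == {"/a": "/c", "/b": "/c"}; loops == {"/x", "/y"}
```

Anything that ends up in `loops` needs a human decision, since there is no correct destination to infer.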

Phase 3: Technical SEO Parity Checklist
The new site needs to maintain technical SEO parity with the old site -- and ideally improve on it. Here's what we check:
Critical Parity Items
- Title tags: Same or improved. Never leave them blank during migration.
- Meta descriptions: Carry them over, even if you plan to rewrite later.
- H1 structure: One H1 per page, matching the old site's keyword targeting.
- Canonical tags: Self-referencing canonicals on every page. If the old site had cross-domain canonicals, preserve them.
- Robots.txt: Don't accidentally block Googlebot on launch. I've seen this happen more than I'd like to admit.
- XML Sitemaps: Generate new sitemaps with all new URLs. Submit within hours of launch.
- Structured data: Migrate all schema markup. Product schema, FAQ schema, breadcrumb schema -- all of it.
- Internal linking: The new site's internal link graph should closely mirror the old site's.
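Several of these checks can be run mechanically against crawl exports of the old and new sites. The sketch below assumes each page has been reduced to a dict with `url`, `title`, `meta_description`, `h1`, and `canonical` keys; those key names and the `parity_issues` helper are hypothetical. It also treats any non-self-referencing canonical as something to review, which will flag cross-domain canonicals you kept on purpose.

```python
def parity_issues(old_page, new_page):
    """Compare one old/new page pair and return a list of parity problems."""
    issues = []
    for field in ("title", "meta_description", "h1"):
        old_value = (old_page.get(field) or "").strip()
        new_value = (new_page.get(field) or "").strip()
        if old_value and not new_value:
            issues.append(f"{field}: missing on new page")
        elif old_value != new_value:
            issues.append(f"{field}: changed")
    # The canonical is expected to change with the URL, so check that it
    # points at the new page itself rather than comparing it to the old one.
    if new_page.get("canonical") != new_page.get("url"):
        issues.append("canonical: not self-referencing")
    return issues


old = {"title": "Blue Widget", "meta_description": "Buy widgets", "h1": "Blue Widget"}
new = {
    "url": "https://example.com/shop/blue-widget",
    "title": "Blue Widget",
    "meta_description": "",
    "h1": "Blue Widgets",
    "canonical": "https://example.com/shop/blue-widget",
}
report = parity_issues(old, new)
# report == ["meta_description: missing on new page", "h1: changed"]
```

A "changed" result is not automatically a failure, since an improved title also shows up as changed. The point is that every difference gets looked at deliberately.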
Performance Requirements
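Google's Core Web Vitals (LCP, INP, and CLS) are ranking signals. TTFB is not a Core Web Vital itself, but it feeds directly into LCP, so we track it alongside them. Your new site should meet or beat the old site's performance: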
| Metric | Good Threshold | Target |
|---|---|---|
| LCP (Largest Contentful Paint) | ≤ 2.5s | ≤ 2.0s |
| INP (Interaction to Next Paint) | ≤ 200ms | ≤ 150ms |
| CLS (Cumulative Layout Shift) | ≤ 0.1 | ≤ 0.05 |
| TTFB (Time to First Byte) | ≤ 800ms | ≤ 400ms |
This is one area where migrating to a modern stack like Next.js or Astro actually gives you an advantage. Static generation and edge rendering can dramatically improve TTFB. We've seen TTFB drop from 1.2s to under 200ms when moving from traditional WordPress to Next.js with ISR or Astro with static output.
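The thresholds in the table can be encoded as a simple gate for staging or field measurements. This is a sketch only: the metric keys (`lcp_s`, `inp_ms`, `cls`, `ttfb_ms`) and the `grade` helper are hypothetical names, and the numbers are copied from the table above.

```python
# (good threshold, internal target) per metric; lower is better for all four
TARGETS = {
    "lcp_s": (2.5, 2.0),
    "inp_ms": (200, 150),
    "cls": (0.1, 0.05),
    "ttfb_ms": (800, 400),
}


def grade(metric, value):
    """Classify a measurement against the good threshold and the target."""
    good, target = TARGETS[metric]
    if value <= target:
        return "meets target"
    if value <= good:
        return "good, above target"
    return "needs work"


print(grade("ttfb_ms", 1200))  # needs work
print(grade("ttfb_ms", 190))   # meets target
```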
Phase 4: Content Migration and Validation
Automated Content Extraction
For 30,000 pages, you need automated content extraction. We typically build custom scrapers or use the CMS's export APIs to pull content into a structured format (usually JSON or CSV) before importing into the new headless CMS.
Key validations after import:
- Character encoding (watch for broken special characters)
- Image references (do all images resolve?)
- Internal links (are they updated to new URL patterns?)
- Embedded media (videos, iframes, widgets)
- Table formatting
- Code blocks
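The first and third validations lend themselves to a quick automated pass over each imported body. The sketch below is heuristic and makes assumptions: the mojibake regex only catches common signatures of UTF-8 text decoded as Latin-1 or Windows-1252 (and can produce false positives in languages that legitimately use those characters), and the legacy link patterns are hypothetical, based on the old URL structure used in the examples earlier.

```python
import re

# Typical debris when UTF-8 bytes are decoded as Latin-1/Windows-1252
MOJIBAKE = re.compile(r"Ã.|â€|Â\s")
# Hypothetical old-site URL patterns that should no longer appear in links
LEGACY_LINK = re.compile(r'href="(/products/[^"]+|/blog/\d{4}/\d{2}/[^"]+)"')


def validate_body(html):
    """Flag likely encoding damage and internal links still using old URLs."""
    return {
        "encoding_suspect": "\ufffd" in html or bool(MOJIBAKE.search(html)),
        "legacy_links": LEGACY_LINK.findall(html),
    }


broken = validate_body('<p>CafÃ© menu</p><a href="/products/blue-widget-123">Buy</a>')
clean = validate_body('<p>Café menu</p><a href="/shop/blue-widget">Buy</a>')
```

In this example `broken` is flagged on both checks and `clean` on neither. Pages that fail go into the manual review queue rather than being auto-corrected.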
Content Diff Testing
We run automated comparisons between old and new pages for our top 500 URLs. The script fetches both versions, strips HTML, and compares the text content. Any page with less than 95% text similarity gets flagged for manual review.
```javascript
// Simplified content comparison (uses the global fetch in Node 18+)
const diff = require('fast-diff'); // fast-diff exports the function directly
const cheerio = require('cheerio');

async function comparePages(oldUrl, newUrl) {
  const oldHtml = await fetch(oldUrl).then(r => r.text());
  const newHtml = await fetch(newUrl).then(r => r.text());

  // Compare only the main content area, ignoring navigation and footer
  const oldText = cheerio.load(oldHtml)('main').text().trim();
  const newText = cheerio.load(newHtml)('main').text().trim();

  // fast-diff returns [type, text] pairs; type 0 means unchanged text
  const changes = diff(oldText, newText);
  const unchanged = changes
    .filter(([type]) => type === 0)
    .reduce((sum, [, text]) => sum + text.length, 0);

  // Guard against dividing by zero when both pages have no main content
  const longest = Math.max(oldText.length, newText.length) || 1;
  const similarity = unchanged / longest;

  return {
    similarity: Math.round(similarity * 100),
    oldLength: oldText.length,
    newLength: newText.length,
    needsReview: similarity < 0.95
  };
}
```
Phase 5: Staging Environment Testing
Never launch a migration without thorough staging testing. Here's what we validate:
Redirect Testing
Test every single redirect. Yes, all 30,000. Use a script that requests each old URL and checks that it returns a single 301 hop straight to the mapped destination:
```bash
# Test redirects from mapping file (CSV rows: old_url,new_url as absolute URLs)
while IFS=, read -r old_url new_url; do
  # Without -L, curl reports only the first hop, so chains show up as failures
  response=$(curl -s -o /dev/null -w "%{http_code} %{redirect_url}" "$old_url")
  status=${response%% *}
  redirect=${response#* }
  if [ "$status" != "301" ] || [ "$redirect" != "$new_url" ]; then
    echo "FAIL: $old_url -> $status $redirect (expected 301 $new_url)"
  fi
done < redirect_map.csv
```
Rendering Validation
If you're using client-side rendering (CSR) or hydration-heavy approaches, verify that Googlebot can actually see your content. Use Google's Rich Results Test or the URL Inspection tool in Search Console to check rendered output.
This is a particularly common issue with React-based frameworks. If your content requires JavaScript to render and you haven't implemented SSR or SSG properly, Google might see a blank page. We always use server-side rendering or static generation for SEO-critical pages.
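A cheap pre-check before reaching for Search Console is to look at the raw HTML response, with no JavaScript executed, and confirm the content is already there. The sketch below uses only the standard library. `looks_prerendered` and its `min_chars` cutoff are hypothetical, and this does not replace checking Google's actual rendered output.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def server_rendered_text(html):
    """Return the text present in the HTML before any JavaScript runs."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def looks_prerendered(html, must_contain, min_chars=200):
    """True if the raw HTML already contains the expected phrase and enough text."""
    text = server_rendered_text(html)
    return must_contain in text and len(text) >= min_chars
```

Feed it the response body for each top landing page along with a phrase you expect to find, such as the product name. An empty app shell returns no text at all, which makes the failure obvious.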
Phase 6: Launch Day Execution
The Launch Checklist
- DNS TTL: Lower DNS TTL to 300 seconds at least 48 hours before migration
- Deploy redirects: Get all 301 redirects live on the old server/CDN
- Switch DNS: Point domain to new infrastructure
- Verify redirects: Run automated redirect tests against production
- Submit sitemaps: Submit new XML sitemaps in Google Search Console
- Request indexing: Use the URL Inspection tool to request indexing of your top 50 pages
- Monitor: Watch real-time analytics for anomalies
- Verify robots.txt: Confirm Googlebot isn't blocked
- Check CDN/caching: Ensure redirect headers aren't being cached incorrectly
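The robots.txt item is easy to automate with Python's standard `urllib.robotparser`. The sketch below takes the robots.txt body as a string so it can be run against a fetched production file or a file in the repository; `googlebot_allowed` is a hypothetical helper name, and the standard library parser is an approximation of Google's own matching rules.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt, url):
    """Return True if the given robots.txt content lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


staging_rules = "User-agent: *\nDisallow: /\n"
production_rules = "User-agent: *\nDisallow: /cart\n"

print(googlebot_allowed(staging_rules, "https://example.com/shop/blue-widget"))     # False
print(googlebot_allowed(production_rules, "https://example.com/shop/blue-widget"))  # True
```

Running this against a handful of top landing pages right after the DNS switch catches the leftover staging `Disallow: /` within minutes instead of days.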
Timing
Launch on a Tuesday or Wednesday morning. Never Friday. You want at least 3 full business days to monitor and fix issues before the weekend. Avoid launching during high-traffic periods or major shopping events.
We also make sure someone is monitoring through the night after launch. Google often crawls more aggressively during off-peak hours, and if your redirects have issues, you want to catch them fast.
Rollback Plan
Have a tested rollback plan that can be executed in under 15 minutes. This usually means keeping the old infrastructure running in parallel for at least 2 weeks post-migration. The cost of maintaining two environments temporarily is nothing compared to the cost of a failed migration.
Phase 7: Post-Migration Monitoring
Daily Monitoring (Weeks 1-2)
- Crawl errors: Check Google Search Console daily for new 404s and server errors
- Index coverage: Monitor the index coverage report for drops
- Organic traffic: Compare daily organic sessions to your baseline
- Rankings: Track your top 200 keywords daily
- Server logs: Analyze Googlebot's crawl patterns on the new site
- Core Web Vitals: Verify field data as it starts coming in
Weekly Monitoring (Weeks 3-8)
- Compare organic traffic week-over-week
- Monitor for ranking volatility
- Check for new crawl issues
- Verify redirect chains haven't been accidentally created
- Monitor backlink profile for lost links
Expected Traffic Patterns
A well-executed migration typically shows:
- Week 1: 5-15% traffic dip (Google is processing the changes)
- Week 2-3: Recovery to pre-migration levels
- Week 4-8: If the new site is technically superior, you'll often see a traffic increase
If you see a 30%+ drop that doesn't recover by week 3, something went wrong with your redirects or technical implementation. Dig into Search Console immediately.
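These thresholds can be turned into a simple daily check against the baseline. This is a sketch: the 15% and 30% cutoffs come from the ranges above, while the `traffic_change` and `assess` helpers and the wording of their labels are hypothetical.

```python
def traffic_change(baseline_sessions, current_sessions):
    """Percent change versus baseline; negative values are a drop."""
    return (current_sessions - baseline_sessions) / baseline_sessions * 100


def assess(week, change_pct):
    """Map a percent change and weeks since launch to a rough status label."""
    if change_pct <= -30 and week >= 3:
        return "likely redirect or technical problem"
    if change_pct < -15:
        return "larger than typical dip"
    return "within typical range"


change = traffic_change(10_000, 6_500)
print(assess(3, change))  # likely redirect or technical problem
```

Compare like with like: the same weekday and the same season in the baseline, otherwise normal weekly cycles will look like migration damage.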
Redirect Implementation at Scale
Where you implement redirects matters. For 30,000+ redirects, don't stuff them all into an .htaccess file or a Next.js redirects config array -- that kills performance.
Recommended Approaches
Edge-level redirects (best for performance)
Implement redirects at the CDN/edge level using Cloudflare Workers, Vercel Edge Middleware, or Netlify's _redirects file. Edge redirects execute before your application code, so they're extremely fast.
```typescript
// Vercel Edge Middleware example
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';
// Load redirect map (pre-built at deploy time)
import redirectMap from './redirects.json';

type RedirectEntry = { destination: string; permanent: boolean };

// Type the JSON import so path lookups are allowed by the compiler
const redirects = redirectMap as Record<string, RedirectEntry>;

export function middleware(request: NextRequest) {
  const path = request.nextUrl.pathname;
  const redirect = redirects[path];

  if (redirect) {
    return NextResponse.redirect(
      new URL(redirect.destination, request.url),
      redirect.permanent ? 301 : 302
    );
  }

  return NextResponse.next();
}
```
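The `redirects.json` file loaded by the middleware has to come from somewhere. One option is to generate it at build time from the redirect mapping CSV. The sketch below assumes rows of `old_path,new_path` and emits the `destination`/`permanent` shape that the middleware example reads; `build_redirect_json` is a hypothetical helper.

```python
import csv
import io
import json


def build_redirect_json(csv_text):
    """Convert 'old_path,new_path' rows into a JSON lookup keyed by old path."""
    mapping = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        old_path, new_path = row[0].strip(), row[1].strip()
        if not old_path or old_path == new_path:
            continue  # a self-redirect would loop forever
        mapping[old_path] = {"destination": new_path, "permanent": True}
    return json.dumps(mapping, indent=2, sort_keys=True)


rows = "/products/blue-widget-123,/shop/blue-widget\n/about,/about\n"
output = build_redirect_json(rows)
```

Sorting the keys keeps the generated file stable between builds, so changes to the redirect map show up as readable diffs in version control.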
Database-backed redirects (best for flexibility)
Store redirects in a database and look them up at request time. This lets you add, modify, and audit redirects without redeploying. Add aggressive caching (Redis or similar) so the database lookup doesn't add latency.
Hybrid approach (what we usually do)
Pattern-based redirects at the edge, individual redirects in a database. Best of both worlds.
Handling International and Multi-Language Sites
If your 30,000-page site includes multiple languages or regions, the complexity multiplies. Each language version needs its own redirect map. Hreflang tags need to be updated to reference new URLs. And you need to verify that the language/region targeting in Search Console still works correctly.
Common pitfalls:
- Forgetting to update hreflang annotations across all language versions simultaneously
- Breaking the hreflang reciprocal requirement (if page A points to page B, page B must point back to page A)
- Losing language-specific URL structures that Google uses as signals
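The reciprocal requirement in particular is easy to verify mechanically once hreflang annotations have been extracted from a crawl. The sketch below assumes a dict mapping each URL to the set of alternate URLs it declares; that data shape and the `missing_return_links` helper are hypothetical.

```python
def missing_return_links(hreflang):
    """Return (source, target) pairs where target does not link back to source."""
    missing = []
    for source, alternates in hreflang.items():
        for target in alternates:
            if target == source:
                continue  # self-references are allowed and need no return link
            if source not in hreflang.get(target, set()):
                missing.append((source, target))
    return sorted(missing)


annotations = {
    "/en/widget": {"/de/widget", "/fr/widget"},
    "/de/widget": {"/en/widget"},
    "/fr/widget": set(),
}
problems = missing_return_links(annotations)
# problems == [("/en/widget", "/fr/widget")]
```

Run it against the new site's annotations on staging, after URLs have been rewritten, since that is where one-sided references tend to appear.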
Common Mistakes That Kill Rankings
- Using 302 instead of 301: Temporary redirects don't pass full link equity. Triple-check your redirect status codes.
- Blocking the staging site and forgetting to unblock: Your `robots.txt` on staging says `Disallow: /`. You deploy staging to production. Googlebot can't crawl anything.
- Changing content and URLs simultaneously: Google sees a new URL with different content. Is it a new page? A moved page? Reduce ambiguity -- migrate URLs first, change content later.
- Redirecting everything to the homepage: Lazy redirect implementations that send all old URLs to the homepage destroy your long-tail rankings instantly.
- Ignoring JavaScript rendering: Your new React app looks great in Chrome. Googlebot sees an empty `<div id="root"></div>`.
- Not handling trailing slashes consistently: `/products/widget` and `/products/widget/` are different URLs. Pick one and redirect the other.
- Removing pages without redirects: If a page had traffic, it needs a redirect. Even if you're sunsetting that content, redirect to the nearest relevant page.
Tools and Stack We Use
| Tool | Purpose | Cost (2026) |
|---|---|---|
| Screaming Frog | Desktop crawling | $259/year |
| Lumar (Deepcrawl) | Cloud crawling for large sites | Custom pricing |
| Ahrefs | Backlink analysis, rank tracking | From $129/month |
| Google Search Console | Index monitoring, crawl stats | Free |
| Redirectchecker.com | Bulk redirect testing | Free tier available |
| ContentKing | Real-time SEO monitoring | From $99/month |
| Custom Python/Node scripts | Redirect mapping, content diffing | Your time |
For the actual site build, we typically use Next.js or Astro depending on the project's needs, paired with a headless CMS like Sanity, Contentful, or Storyblok. If you're planning a migration and want to discuss architecture, check our pricing or get in touch.
FAQ
How long does it take to migrate a 30,000-page website?
Expect 12-20 weeks total. The planning and URL mapping phase takes the longest -- usually 8-14 weeks. The actual technical migration and launch is typically 4-6 weeks. Rushing the planning phase is the single biggest predictor of migration failure.
Will I definitely lose some SEO traffic during migration?
A temporary dip of 5-15% is normal and expected, even with a perfect migration. Google needs time to process tens of thousands of redirects and re-crawl your new site. The dip typically resolves within 2-3 weeks. If you see a larger drop or it doesn't recover, investigate your redirects and technical implementation immediately.
Should I change my URL structure during migration?
Only if there's a strong reason to do so. Every URL change adds risk. If your current URL structure is functional and descriptive, keep it. If it's genuinely bad (e.g., URLs with query parameters instead of clean paths), the migration is a good opportunity to fix it -- but plan your redirect map accordingly.
Can I migrate my site in phases instead of all at once?
Yes, and for very large sites it's often the safer approach. You can migrate section by section -- blog first, then product pages, then category pages. This reduces risk but increases complexity because you're running two platforms simultaneously, usually behind a reverse proxy. We've done this successfully several times, but it requires careful routing configuration.
What happens to my Google Ads during migration?
Update your ad landing page URLs to the new URLs before or immediately after migration. If you have redirects in place, your ads will still work, but the redirect adds latency and Google Ads quality scores can be negatively affected by redirect chains. Updating the URLs directly is always better.
How do I handle pages I want to remove during migration?
If the page had organic traffic or backlinks, redirect it to the most relevant existing page on the new site. If it had neither, you can let it return a 404 or 410 (Gone) status. Don't redirect irrelevant pages to your homepage -- Google treats mass homepage redirects as soft 404s.
Should I use 301 or 308 redirects?
Use 301 for most cases. Both are permanent redirects, but 301 is universally understood by all bots and browsers. 308 preserves the HTTP method (POST stays POST), which matters for API endpoints but not for SEO-focused page redirects.
When should I remove the old redirects?
Keep them for at least one year, preferably indefinitely. Redirects are cheap to maintain, and removing them means any old bookmarks, external links, or cached search results will hit 404s. There's almost never a good reason to remove working 301 redirects.