What is Crawl Budget?
Crawl budget is the number of URLs Googlebot (or any search engine crawler) will request from your site within a specific time window. Google defines it as the intersection of two factors: crawl rate limit (the maximum fetching rate that won't degrade your server's performance) and crawl demand (how much Google actually wants to crawl, based on popularity, staleness, and URL type). Google's Gary Illyes formally explained the concept in a January 2017 post on the Google Webmaster Central blog, now the Google Search Central blog. For most sites under 10,000 pages, crawl budget isn't a practical concern: Google will generally crawl everything it discovers. It becomes critical for large e-commerce sites, programmatic SEO builds, and any domain with hundreds of thousands of URLs. We've seen crawl budget problems tank indexing on sites with 500k+ faceted filter pages that should never have been crawlable in the first place.
How it works
When Googlebot visits your site, it doesn't crawl every URL every time. It allocates a finite number of requests per crawl session based on:
- Server health: If your server responds slowly (TTFB consistently above 1–2 seconds) or returns 5xx or 429 errors, Google throttles its crawl rate. Faster, error-free responses mean more pages crawled per session.
- URL importance signals: Pages with more internal links, external backlinks, or fresher content get crawled more frequently.
- Crawl waste: Every URL that returns a soft 404, sits in a redirect chain, duplicates another page, or is a parameterized junk page eats into your budget without adding value.
You can observe crawl behavior in Google Search Console under Settings → Crawl Stats. This report shows requests per day, average response time, and host status.
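Crawl Stats is aggregated, so server logs are where you see exactly which URLs Googlebot spends its requests on. The sketch below assumes access logs in the common Apache/Nginx combined log format and sorts Googlebot hits into clean, parameterized, redirect, and error buckets. The bucket names and the `googlebot_crawl_mix` and `classify` helpers are illustrative, not from any specific tool. The sketch matches on the user-agent string only, which can be spoofed, so a real audit should also verify Googlebot requests with a reverse DNS lookup.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Captures the request path, status code, and user-agent from a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def classify(path: str, status: str) -> str:
    """Put one crawled URL into a hypothetical crawl-waste bucket."""
    if status.startswith("3"):
        return "redirect"
    if status[0] in "45":
        return "error"
    if urlsplit(path).query:
        return "parameterized"
    return "clean"


def googlebot_crawl_mix(lines) -> Counter:
    """Count Googlebot requests per bucket, skipping other agents and unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        counts[classify(match.group("path"), match.group("status"))] += 1
    return counts


if __name__ == "__main__":
    UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    sample = [
        f'66.249.66.1 - - [10/Mar/2024:13:55:36 +0000] "GET /products?color=red&sort=price HTTP/1.1" 200 5120 "-" "{UA}"',
        f'66.249.66.1 - - [10/Mar/2024:13:55:37 +0000] "GET /products/blue-shirt HTTP/1.1" 200 8042 "-" "{UA}"',
        f'66.249.66.1 - - [10/Mar/2024:13:55:38 +0000] "GET /old-category HTTP/1.1" 301 0 "-" "{UA}"',
        '203.0.113.7 - - [10/Mar/2024:13:55:39 +0000] "GET /products?size=m HTTP/1.1" 200 5120 "-" "curl/8.0"',
    ]
    mix = googlebot_crawl_mix(sample)
    total = sum(mix.values())
    for bucket, hits in mix.most_common():
        print(f"{bucket:>13}: {hits} ({hits / total:.0%})")
```

If the parameterized, redirect, and error buckets together outweigh the clean bucket, Googlebot is spending most of its requests on crawl waste.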
Here's a practical robots.txt pattern that protects crawl budget on a faceted e-commerce site:
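```
User-agent: Googlebot
# Block filter and sort parameters wherever they appear in the query string
Disallow: /*?color=
Disallow: /*&color=
Disallow: /*?size=
Disallow: /*&size=
Disallow: /*?sort=
Disallow: /*&sort=
```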
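This tells Googlebot to skip parameterized filter and sort URLs while still crawling clean product category and product detail pages, which remain crawlable by default. Don't expect <link rel="canonical"> tags or noindex meta tags to add a second layer of protection on those same URLs, though. Googlebot never fetches a disallowed URL, so it never sees either tag, and a blocked URL can still be indexed without its content if other pages link to it. Use canonical tags and noindex on filtered pages you leave crawlable, and reserve robots.txt for URL patterns you never want fetched.
The XML sitemap is the other lever: only include URLs you genuinely want indexed. A sitemap stuffed with low-value pages actively misleads the crawler about your site's priorities.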
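Wildcard rules are easy to get subtly wrong, so it helps to test URLs against them before deploying. Google documents two behaviors worth modeling. First, `*` matches any sequence of characters, and a trailing `$` anchors the end of the URL. Second, when several rules match, the rule with the longest path wins, and Allow wins a tie. Python's built-in `urllib.robotparser` uses plain prefix matching in file order, so it won't reproduce this. The sketch below is a simplified, hypothetical checker: `is_allowed`, `_to_regex`, and the sample rules are illustrative, and the sketch ignores user-agent groups and percent-encoding normalization.

```python
import re

# Hypothetical rule set: block filter/sort parameters anywhere in the query string.
RULES = [
    ("disallow", "/*?color="),
    ("disallow", "/*&color="),
    ("disallow", "/*?sort="),
    ("disallow", "/*&sort="),
]


def _to_regex(pattern: str):
    """Translate a robots.txt path pattern: '*' = any run of characters, trailing '$' = end of URL."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules) -> bool:
    """Longest matching pattern wins; on a tie, allow beats disallow; no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow value blocks nothing
        if _to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and directive == "allow"):
                best_len, allowed = length, directive == "allow"
    return allowed


for url in ["/products/shoes", "/products/shoes?color=red", "/products?size=m&sort=price"]:
    print(url, "->", "crawl" if is_allowed(url, RULES) else "blocked")
```

The same check exposes a common trap. If you add a broad `("allow", "/products/")` rule, `/products/shoes?color=red` becomes crawlable again, because the 10-character Allow path outranks the 9-character Disallow pattern under longest-match precedence.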
When to use it
Crawl budget optimization matters when your site is large or generates many low-value URLs. Here's the breakdown:
Optimize crawl budget when:
- Your site has 50,000+ indexable URLs
- You run faceted navigation (filters, sorts, pagination) generating millions of URL permutations
- Google Search Console shows a large or growing number of URLs under "Discovered – currently not indexed" (Google knows about them but hasn't crawled them yet)
- Your server logs show Googlebot spending most of its time on pages you don't care about
- You're doing programmatic SEO and spinning up thousands of pages at once
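The jump from a modest catalog to "millions of URL permutations" is plain combinatorics. The sketch below counts query-string variants for one category page with four hypothetical facets (10 colors, 8 sizes, 5 price ranges, 4 sort orders), where each facet is either unset or set to a single value. It can then optionally count every parameter ordering as a separate URL, which is what happens when a site doesn't emit parameters in a fixed order. The facet names, counts, and the 200-category multiplier are illustrative assumptions.

```python
from itertools import combinations
from math import factorial, prod

# Hypothetical facets for one category page: number of values each can take.
FACETS = {"color": 10, "size": 8, "price": 5, "sort": 4}


def url_variants(facets, count_param_order=False):
    """Count query-string variants for one category URL.

    Each facet is either unset or set to exactly one value. With
    count_param_order=True, every ordering of the chosen parameters
    (?color=red&size=m vs ?size=m&color=red) counts as a distinct URL.
    """
    names = list(facets)
    total = 0
    for k in range(len(names) + 1):
        for chosen in combinations(names, k):
            value_combos = prod(facets[name] for name in chosen)
            total += value_combos * (factorial(k) if count_param_order else 1)
    return total


per_category = url_variants(FACETS)
per_category_any_order = url_variants(FACETS, count_param_order=True)
print(per_category, per_category_any_order)  # 2970 45432
print(per_category * 200, per_category_any_order * 200)  # 594000 9086400
```

Multi-select filters (two colors at once, for example) push these numbers higher still.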
Don't worry about crawl budget when:
- Your site has fewer than 10,000 pages
- Your pages are well-linked and your server responds in under 500ms
- Search Console shows all important pages as indexed
For a 200-page marketing site, crawl budget work is a misallocation of effort. Focus on content quality instead.
Crawl Budget vs alternatives
Crawl budget is often confused with related but distinct concepts:
| Concept | What it is | Relationship to Crawl Budget |
|---|---|---|
| Crawl Rate | Max requests/second Googlebot will make | One half of the crawl budget equation |
| Crawl Demand | Google's desire to crawl your URLs | The other half — driven by popularity and freshness |
| Index Bloat | Too many low-value pages in Google's index | A consequence of wasted crawl budget |
| Robots.txt | Server-level crawl directive file | A primary tool for managing crawl budget |
| XML Sitemap | List of URLs you want crawled/indexed | Guides crawler priority, doesn't guarantee crawling |
| Render Budget | Resources Google spends rendering JS pages | Separate concern, but compounds crawl budget issues on JS-heavy sites |
Index bloat and crawl budget waste usually travel together. If Googlebot is crawling junk URLs, it's also likely indexing some of them.
Real-world example
We worked on a Next.js e-commerce site with ~120,000 product pages and another ~2.4 million parameterized filter URLs (color, size, price range, sort order combinations). Google Search Console showed Googlebot was averaging 8,000 requests/day, but server logs revealed 70% of those requests hit filter pages — not product or category pages. We added robots.txt disallow rules for all filter parameters, submitted a cleaned-up sitemap with only product and category URLs, and added noindex, follow to remaining filter pages as a fallback. Within six weeks, Googlebot's daily crawl of actual product pages increased from ~2,400 to ~7,100, and the "Discovered — currently not indexed" queue dropped by 40,000 URLs over three months.
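The sitemap cleanup in this example comes down to filtering a URL inventory down to canonical product and category pages. The minimal sketch below uses Python's standard `xml.etree.ElementTree`. It assumes hypothetical `/products/` and `/category/` path prefixes and treats any URL with a query string as non-indexable. `is_clean_listing` and `build_sitemap` are illustrative names, not part of any framework.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def is_clean_listing(url: str) -> bool:
    """Hypothetical rule: keep only parameter-free product and category URLs."""
    parts = urlsplit(url)
    return not parts.query and parts.path.startswith(("/products/", "/category/"))


def build_sitemap(urls, keep=is_clean_listing) -> bytes:
    """Return a UTF-8 sitemap listing each kept URL once, in input order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen or not keep(url):
            continue
        seen.add(url)
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


inventory = [
    "https://example.com/products/blue-shirt",
    "https://example.com/products/blue-shirt?color=red",
    "https://example.com/category/shirts",
    "https://example.com/category/shirts?sort=price",
    "https://example.com/cart",
]
print(build_sitemap(inventory).decode("utf-8"))
```

A catalog of ~120,000 products would need its output split across several files, because the sitemap protocol caps each file at 50,000 URLs (and 50 MB uncompressed). A sitemap index file would then list those files.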