You're deploying on a Friday afternoon (I know, I know), everything looks good, and then your monitoring lights up like a Christmas tree. Users are getting 429 errors. Your API is rejecting requests. Or maybe it's the other way around — you're calling a third-party API and they're rejecting you. Either way, the HTTP 429 Too Many Requests status code just became the most important thing in your day.

I've been on both sides of this. I've been the developer accidentally DDoS-ing a CMS API because of a misconfigured build process, and I've been the one implementing rate limiting to protect our own servers from runaway clients. Both experiences taught me things the docs don't cover. Let's walk through all of it.

HTTP 429 Too Many Requests: Causes, Fixes, and Rate Limiting

What Does HTTP 429 Actually Mean?

HTTP 429 is defined in RFC 6585, published in 2012. The spec is surprisingly short. Here's the gist: the user (or client) has sent too many requests in a given amount of time.

That's it. It's a rate limiting response. The server is saying, "I understood your request, it's probably valid, but you need to slow down."

This is different from a 403 Forbidden (you're not allowed) or a 503 Service Unavailable (the whole server is struggling). A 429 is targeted. It's about your request rate specifically.

The response SHOULD include a Retry-After header telling the client how long to wait before trying again. I said "should" because plenty of APIs don't bother, which makes everyone's life harder.

Where You'll See 429s in the Wild

  • Third-party APIs: Stripe, OpenAI, GitHub, Contentful, Sanity — they all have rate limits
  • CDNs and hosting platforms: Vercel, Cloudflare, and AWS will return 429s if you hit their edge rate limits
  • Your own APIs: If you've implemented rate limiting (and you should have)
  • Build processes: Static site generation that hits a CMS API for every page can easily trigger rate limits
  • Web scraping: If you're fetching data from external sources aggressively

Common Causes of 429 Errors

Let me break down the scenarios I've actually encountered in production, ranked roughly by how often they come up.

1. Static Site Builds Hammering a Headless CMS

This is the one that bites teams working with headless architectures the most. You've got a site with 2,000 pages, each needing data from your CMS. Your build process fires off all those requests in parallel, the CMS sees a massive spike, and starts returning 429s. Your build fails.

We see this regularly when working on headless CMS projects. The fix involves request queuing and concurrency limits, which I'll cover below.

2. Missing or Broken Caching

If every page load triggers a fresh API call because your caching layer isn't working, you'll hit rate limits fast — especially with traffic spikes. I once debugged a Next.js app where revalidate was accidentally set to 0, meaning ISR was effectively disabled. Every visitor triggered a new API call to Contentful. It took about 45 minutes of real traffic to start getting 429s.
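Whatever the framework layer, the underlying fix is the same: put a TTL cache between your render path and the API so repeated reads within the window cost one upstream call, not one per visitor. A minimal in-process sketch (names are mine, not from any framework):

```typescript
// Minimal in-memory TTL cache sketch; illustrative only, names are hypothetical.
type CacheEntry<T> = { value: T; expiresAt: number };

class TtlCache<T> {
  private store = new Map<string, CacheEntry<T>>();

  constructor(private ttlMs: number) {}

  get(key: string): T | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key); // expired: force a fresh fetch
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```

With `revalidate: 0`, Next.js was effectively running the no-cache path on every request; any positive TTL, even a short one, would have collapsed that traffic.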

3. Retry Loops Without Backoff

Your code gets an error, retries immediately, gets another error, retries immediately... congratulations, you've built a rate-limit-triggering machine. I've seen this pattern in webhook handlers, background jobs, and even client-side fetch calls.
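To make the failure concrete, here's a small simulation (hypothetical names, not from any real codebase): a fake endpoint with a fixed request budget, and a retry loop with no delay. The naive loop burns its entire retry budget in microseconds, so every retry lands inside the same rate limit window.

```typescript
// Fake endpoint: serves `allowed` requests, then returns 429 forever (hypothetical).
function makeFakeServer(allowed: number): () => number {
  let served = 0;
  return () => (served++ < allowed ? 200 : 429);
}

// ANTI-PATTERN: retry immediately, with no delay, no jitter, no Retry-After check.
function naiveRetry(
  request: () => number,
  maxRetries: number
): { status: number; attempts: number } {
  let attempts = 1;
  let status = request();
  while (status === 429 && attempts <= maxRetries) {
    status = request(); // fires instantly: guaranteed to hit the same window
    attempts++;
  }
  return { status, attempts };
}
```

The fix, exponential backoff with jitter, is covered in full later in this article.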

4. Multiple Services Sharing an API Key

Your staging environment, your production environment, your local dev setup, and your CI/CD pipeline are all using the same API key. Each one looks fine individually, but collectively they're burning through your rate limit budget.

5. Client-Side Fetch Without Debouncing

A search-as-you-type feature that fires an API call on every keystroke. A dashboard that polls every 500ms. An infinite scroll that triggers fetches faster than the user can scroll. These patterns can absolutely trigger 429s, especially when multiplied across all your users.
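A plain debounce helper fixes the search-as-you-type case: collapse a burst of keystrokes into one trailing call. A minimal sketch (the endpoint and the 300 ms wait are my own placeholder choices):

```typescript
// Debounce: delay the call until `waitMs` of silence; each new call resets the timer.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Usage sketch: one API call per pause in typing, not one per keystroke.
const search = debounce((query: string) => {
  fetch(`/api/search?q=${encodeURIComponent(query)}`); // hypothetical endpoint
}, 300);
```

The same idea, applied as throttling rather than debouncing, tames polling dashboards and infinite scroll.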

6. Actual Abuse or Attack

Sometimes a 429 is doing exactly what it should — protecting your server from someone sending an unreasonable number of requests. Bots, credential stuffing, scraping — rate limiting is your first line of defense.

The Retry-After Header Explained

The Retry-After header is the server's way of telling you exactly when to try again. It can come in two formats:

Seconds to wait:

HTTP/1.1 429 Too Many Requests
Retry-After: 60

Specific date/time:

HTTP/1.1 429 Too Many Requests
Retry-After: Thu, 01 Jan 2026 00:00:00 GMT

The seconds format is far more common. The date format uses HTTP-date as defined in RFC 7231 (since superseded by RFC 9110).

Here's what most tutorials won't tell you: many APIs don't send Retry-After at all, or they send it inconsistently. OpenAI's API generally includes it. GitHub's API includes it along with X-RateLimit-Reset. Plenty of smaller APIs just send a naked 429 and leave you guessing.

Some APIs also send additional rate limit headers:

  • X-RateLimit-Limit: max requests allowed per window (e.g., 100)
  • X-RateLimit-Remaining: requests remaining in the current window (e.g., 0)
  • X-RateLimit-Reset: Unix timestamp when the window resets (e.g., 1735689600)
  • Retry-After: seconds to wait before retrying (e.g., 30)

Always check for these headers. They let you implement smarter retry logic and even proactively slow down before you hit the limit.
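As a sketch of that proactive approach, here's a small helper (hypothetical, not any library's API) that turns parsed X-RateLimit-Remaining and X-RateLimit-Reset values into a pre-emptive delay: do nothing while there's headroom, pace the last few requests, and wait out the window once the budget is gone.

```typescript
// Compute a pre-emptive delay from rate limit headers (names and threshold are mine).
// remaining: parsed X-RateLimit-Remaining; resetUnixSec: parsed X-RateLimit-Reset.
function proactiveDelayMs(
  remaining: number,
  resetUnixSec: number,
  nowMs: number = Date.now()
): number {
  const windowMs = Math.max(resetUnixSec * 1000 - nowMs, 0);
  if (remaining <= 0) return windowMs;      // budget exhausted: wait for the reset
  if (remaining > 10) return 0;             // plenty of headroom (10 is an arbitrary threshold)
  return Math.ceil(windowMs / remaining);   // spread the last few requests across the window
}
```

Call this before each request and sleep for the returned duration; you'll often avoid the 429 entirely instead of reacting to it.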


How to Handle 429 Errors as a Client

When you're the one receiving 429 errors, here's how to handle them properly.

Exponential Backoff with Jitter

This is the gold standard. Don't just wait a fixed amount of time — increase the delay exponentially with each retry, and add some randomness (jitter) to prevent thundering herd problems.

async function fetchWithRetry(
  url: string,
  options: RequestInit = {},
  maxRetries: number = 5
): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);

    if (response.status !== 429) {
      return response;
    }

    if (attempt === maxRetries) {
      throw new Error(`Still getting 429 after ${maxRetries} retries`);
    }

    // Check for Retry-After header first
    const retryAfter = response.headers.get('Retry-After');
    let delay: number;

    if (retryAfter) {
      // Retry-After can be seconds or an HTTP date
      const parsed = parseInt(retryAfter, 10);
      if (!isNaN(parsed)) {
        delay = parsed * 1000;
      } else {
        const dateMs = new Date(retryAfter).getTime();
        // Clamp so a past or unparseable date never yields a negative delay
        delay = isNaN(dateMs) ? 1000 : Math.max(dateMs - Date.now(), 0);
      }
    } else {
      // Exponential backoff with jitter
      const baseDelay = Math.pow(2, attempt) * 1000;
      const jitter = Math.random() * 1000;
      delay = baseDelay + jitter;
    }

    console.log(`Rate limited. Waiting ${Math.round(delay / 1000)}s before retry ${attempt + 1}`);
    await new Promise(resolve => setTimeout(resolve, delay));
  }

  // TypeScript wants this, though we'll never reach it
  throw new Error('Unexpected end of retry loop');
}

Request Queuing for Build Processes

For static site generation where you need to make hundreds or thousands of API calls, use a queue with concurrency control:

import pLimit from 'p-limit';

// Limit to 5 concurrent requests
const limit = pLimit(5);

const pages = await getAllPageSlugs(); // Returns ['/', '/about', '/blog/post-1', ...]

const results = await Promise.all(
  pages.map(slug =>
    limit(() => fetchWithRetry(`https://api.cms.com/pages/${slug}`))
  )
);

The p-limit library (2.5M+ weekly npm downloads in 2025) is my go-to for this. You can also add a delay between requests:

const limit = pLimit(3);

const delay = (ms: number) => new Promise(r => setTimeout(r, ms));

const results = await Promise.all(
  pages.map((slug, i) =>
    limit(async () => {
      if (i > 0) await delay(200); // 200ms between requests
      return fetchWithRetry(`https://api.cms.com/pages/${slug}`);
    })
  )
);

Implementing Rate Limiting in Next.js API Routes

Now let's flip to the other side — you're building an API and need to protect it. If you're building with Next.js, here's how to add rate limiting to your API routes.

Simple In-Memory Rate Limiter

For a single-server deployment or during development, this works:

// lib/rate-limit.ts
type RateLimitEntry = {
  count: number;
  resetTime: number;
};

const rateLimitMap = new Map<string, RateLimitEntry>();

export function rateLimit({
  windowMs = 60 * 1000,
  maxRequests = 100,
}: {
  windowMs?: number;
  maxRequests?: number;
} = {}) {
  return function check(identifier: string): {
    allowed: boolean;
    remaining: number;
    resetIn: number;
  } {
    const now = Date.now();
    const entry = rateLimitMap.get(identifier);

    if (!entry || now > entry.resetTime) {
      rateLimitMap.set(identifier, {
        count: 1,
        resetTime: now + windowMs,
      });
      return { allowed: true, remaining: maxRequests - 1, resetIn: windowMs };
    }

    if (entry.count >= maxRequests) {
      return {
        allowed: false,
        remaining: 0,
        resetIn: entry.resetTime - now,
      };
    }

    entry.count++;
    return {
      allowed: true,
      remaining: maxRequests - entry.count,
      resetIn: entry.resetTime - now,
    };
  };
}
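One caveat with the sketch above: the Map only replaces entries when the same identifier comes back, so identifiers that never return (one-off crawlers, rotating IPs) accumulate forever. On a long-lived server it's worth sweeping expired entries periodically. A standalone sketch of that idea (it redeclares the same shapes so it runs on its own):

```typescript
// Same shapes as the in-memory limiter, redeclared so this sketch is self-contained.
type Entry = { count: number; resetTime: number };
const entries = new Map<string, Entry>();

// Delete every entry whose window has already expired; returns how many were removed.
function sweepExpired(now: number = Date.now()): number {
  let removed = 0;
  for (const [key, entry] of entries) {
    if (now > entry.resetTime) {
      entries.delete(key); // safe: Map iteration tolerates deleting the current key
      removed++;
    }
  }
  return removed;
}

// On a long-lived Node server you might run this on an interval, e.g.:
// setInterval(() => sweepExpired(), 60_000);
```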

Using it in a Next.js App Router API route:

// app/api/data/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { rateLimit } from '@/lib/rate-limit';

const limiter = rateLimit({ windowMs: 60_000, maxRequests: 30 });

export async function GET(request: NextRequest) {
  // x-forwarded-for can be a comma-separated proxy chain; the first entry is the client
  const ip = request.headers.get('x-forwarded-for')?.split(',')[0].trim() ?? 'anonymous';
  const { allowed, remaining, resetIn } = limiter(ip);

  if (!allowed) {
    return NextResponse.json(
      { error: 'Too many requests. Please slow down.' },
      {
        status: 429,
        headers: {
          'Retry-After': String(Math.ceil(resetIn / 1000)),
          'X-RateLimit-Limit': '30',
          'X-RateLimit-Remaining': '0',
        },
      }
    );
  }

  // Your actual route logic here
  return NextResponse.json(
    { data: 'Here you go' },
    {
      headers: {
        'X-RateLimit-Limit': '30',
        'X-RateLimit-Remaining': String(remaining),
      },
    }
  );
}

Production Rate Limiting with Upstash Redis

The in-memory approach breaks down when you're running on serverless platforms like Vercel, because each function invocation might hit a different instance. You need a shared store. Upstash Redis is the most popular choice for this in 2025.

npm install @upstash/ratelimit @upstash/redis

// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

export const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(30, '60 s'),
  analytics: true,
  prefix: 'api-ratelimit',
});

// app/api/data/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { ratelimit } from '@/lib/rate-limit';

export async function GET(request: NextRequest) {
  // Take the first entry of a possibly comma-separated x-forwarded-for chain
  const ip = request.headers.get('x-forwarded-for')?.split(',')[0].trim() ?? '127.0.0.1';
  const { success, limit, remaining, reset } = await ratelimit.limit(ip);

  if (!success) {
    const retryAfter = Math.ceil((reset - Date.now()) / 1000);
    return NextResponse.json(
      { error: 'Rate limit exceeded' },
      {
        status: 429,
        headers: {
          'Retry-After': String(retryAfter),
          'X-RateLimit-Limit': String(limit),
          'X-RateLimit-Remaining': '0',
          'X-RateLimit-Reset': String(reset),
        },
      }
    );
  }

  return NextResponse.json({ data: 'Success' }, {
    headers: {
      'X-RateLimit-Limit': String(limit),
      'X-RateLimit-Remaining': String(remaining),
    },
  });
}

Upstash's free tier gives you 10,000 requests/day, which is plenty for small projects. Their Pro plan starts at $10/month for 500K daily commands as of early 2025.

Middleware-Level Rate Limiting

If you want rate limiting across all your API routes, Next.js middleware is the place:

// middleware.ts
import { NextRequest, NextResponse } from 'next/server';
import { ratelimit } from '@/lib/rate-limit';

export async function middleware(request: NextRequest) {
  if (request.nextUrl.pathname.startsWith('/api/')) {
    const ip = request.headers.get('x-forwarded-for')?.split(',')[0].trim() ?? '127.0.0.1';
    const { success, reset } = await ratelimit.limit(ip);

    if (!success) {
      return NextResponse.json(
        { error: 'Too many requests' },
        {
          status: 429,
          headers: {
            'Retry-After': String(Math.ceil((reset - Date.now()) / 1000)),
          },
        }
      );
    }
  }

  return NextResponse.next();
}

export const config = {
  matcher: '/api/:path*',
};

Rate Limiting Strategies Compared

Not all rate limiting algorithms are equal. Here's how the main ones compare:

  • Fixed Window: counts requests in fixed time windows (e.g., per minute). Pros: simple to implement. Cons: bursts at window boundaries can allow 2x the limit. Best for: simple APIs, internal tools.
  • Sliding Window: counts requests over a rolling time period. Pros: smoother distribution. Cons: slightly more complex, more memory. Best for: most production APIs.
  • Token Bucket: tokens refill at a steady rate; each request costs a token. Pros: allows controlled bursts. Cons: more complex state management. Best for: APIs that need burst tolerance.
  • Leaky Bucket: requests enter a queue and are processed at a fixed rate. Pros: very smooth output rate. Cons: can add latency; requests may be dropped. Best for: webhook delivery, job processing.
  • Sliding Window Log: stores the timestamp of each request. Pros: most accurate. Cons: high memory usage at scale. Best for: low-volume, high-accuracy needs.

For most web applications, sliding window is the sweet spot. It's what Upstash uses by default, and it's what I'd recommend unless you have a specific reason to choose something else.
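If you do need burst tolerance, the token bucket is simple enough to sketch in a few lines (illustrative only; a production version would keep this state in Redis, not process memory). Tokens refill continuously, so short bursts up to the bucket's capacity pass while the long-run rate stays capped:

```typescript
// Token bucket: refills at `refillPerSec` tokens/second up to `capacity`.
// Each request spends one token; a request with no token available is rejected.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    nowMs: number = Date.now()
  ) {
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = nowMs;
  }

  tryConsume(nowMs: number = Date.now()): boolean {
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Note how capacity and refill rate are independent knobs: capacity sets how big a burst you tolerate, refill rate sets the sustained throughput.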

Rate Limiting in Astro and Other Frameworks

If you're building with Astro, rate limiting works differently because Astro is primarily a static-first framework. But with Astro's server endpoints (available in SSR mode), the concepts are the same:

// src/pages/api/data.ts
import type { APIRoute } from 'astro';
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(30, '60 s'),
});

export const GET: APIRoute = async ({ request }) => {
  const ip = request.headers.get('x-forwarded-for')?.split(',')[0].trim() ?? '127.0.0.1';
  const { success, reset } = await ratelimit.limit(ip);

  if (!success) {
    return new Response(JSON.stringify({ error: 'Rate limit exceeded' }), {
      status: 429,
      headers: {
        'Content-Type': 'application/json',
        'Retry-After': String(Math.ceil((reset - Date.now()) / 1000)),
      },
    });
  }

  return new Response(JSON.stringify({ data: 'Hello' }), {
    status: 200,
    headers: { 'Content-Type': 'application/json' },
  });
};

For edge-deployed applications on Cloudflare Workers, you might also consider Cloudflare's built-in Rate Limiting rules, which operate at the infrastructure level and can handle far more traffic than application-level solutions. Their Advanced Rate Limiting starts at $0.05 per 10,000 good requests on the Business plan.

Monitoring and Debugging 429 Errors in Production

You can't fix what you can't see. Here's my checklist for dealing with 429 errors in production:

When You're Receiving 429s

  1. Check which API is returning 429 — Look at the response URL, not just the status code
  2. Log the Retry-After header — If it's consistently very long, you may need a higher-tier plan
  3. Audit your request patterns — Are you making redundant calls? Can you batch requests?
  4. Implement caching — Use stale-while-revalidate, Redis caching, or Next.js ISR to reduce API calls
  5. Check if multiple environments share API keys — This is the most common "mystery" 429 cause
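Item 4 (caching) is often the highest-leverage fix, and stale-while-revalidate is the friendliest flavor: serve cached data immediately and refresh in the background, so both user-facing latency and upstream call volume drop. A minimal in-process sketch (a hypothetical helper of my own, not Next.js or any library API):

```typescript
// Minimal stale-while-revalidate sketch; helper and cache names are hypothetical.
type SwrEntry<T> = { value: T; fetchedAt: number };
const swrCache = new Map<string, SwrEntry<unknown>>();

async function swrFetch<T>(
  key: string,
  maxAgeMs: number,
  fetcher: () => Promise<T>
): Promise<T> {
  const entry = swrCache.get(key) as SwrEntry<T> | undefined;
  if (entry && Date.now() - entry.fetchedAt < maxAgeMs) {
    return entry.value; // fresh: no upstream call at all
  }
  if (entry) {
    // Stale: answer instantly from cache, refresh in the background.
    fetcher()
      .then(value => swrCache.set(key, { value, fetchedAt: Date.now() }))
      .catch(() => { /* keep serving stale data if the refresh fails */ });
    return entry.value;
  }
  // Cold cache: this request must pay for the upstream call.
  const value = await fetcher();
  swrCache.set(key, { value, fetchedAt: Date.now() });
  return value;
}
```

A useful side effect: because the refresh failure path keeps the stale value, an upstream 429 during revalidation degrades gracefully instead of surfacing to users.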

When You're Sending 429s

  1. Set up dashboards — Track 429 response rates over time
  2. Identify top offenders — Which IP addresses or API keys are hitting limits most?
  3. Review your limits — Are they too restrictive? Too loose? Check your server capacity and adjust
  4. Always send Retry-After — Be a good API citizen
  5. Include a helpful error message — Tell the client which limit they hit and when to retry

A well-crafted 429 response body looks like this:

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "You've exceeded 30 requests per minute. Please wait before retrying.",
    "retryAfter": 42,
    "documentation": "https://docs.yourapi.com/rate-limits"
  }
}

This is infinitely more helpful than just { "error": "Too many requests" }.

If you're dealing with persistent rate limiting issues on a headless architecture — whether it's during builds, at runtime, or both — it might be worth getting in touch to discuss your architecture. We've seen a lot of these problems across different CMS and framework combinations, and there's usually a pattern-level fix rather than just band-aiding the symptoms.

FAQ

What does HTTP 429 Too Many Requests mean?

HTTP 429 is a status code that means you've sent too many requests to a server within a given time period. The server is rate limiting you — it's asking you to slow down. It's not an authentication error or a server error; your requests are probably valid, there are just too many of them. The server should include a Retry-After header telling you when to try again.

How do I fix a 429 error?

If you're receiving 429 errors from an API, implement exponential backoff with jitter in your retry logic, reduce your request frequency, add caching to avoid redundant calls, and respect the Retry-After header. If you're hitting the limit during builds, use request queuing with concurrency limits. If it's happening consistently, you may need to upgrade to a higher API plan with more generous rate limits.

What is the Retry-After header?

The Retry-After header is sent with a 429 (or 503) response to tell the client how long to wait before making another request. It can be specified as a number of seconds (e.g., Retry-After: 60) or as an HTTP date (e.g., Retry-After: Thu, 01 Jan 2026 00:00:00 GMT). Not all APIs include this header, but the well-designed ones do.

How do I implement rate limiting in Next.js?

For development or single-server deployments, you can use an in-memory Map to track request counts per IP address. For production serverless deployments on platforms like Vercel, use Upstash Redis with the @upstash/ratelimit package. You can apply rate limiting at the individual route level or across all API routes using Next.js middleware.

What's the difference between 429 and 503 errors?

A 429 Too Many Requests is specifically about rate limiting — your client is sending too many requests. A 503 Service Unavailable means the server is overloaded or under maintenance and can't handle any requests from anyone. Both can include a Retry-After header, but they indicate very different problems. A 429 is targeted at you; a 503 affects everyone.

Can rate limiting prevent DDoS attacks?

Rate limiting is one layer of defense against DDoS attacks, but it's not sufficient on its own. Application-level rate limiting (like what you'd implement in Next.js) can handle moderate abuse, but a serious DDoS attack needs to be mitigated at the infrastructure level — using services like Cloudflare, AWS Shield, or your hosting provider's built-in protections. Think of app-level rate limiting as a bouncer, and infrastructure-level protection as the fortress walls.

What rate limit should I set for my API?

It depends entirely on your use case. A common starting point for public APIs is 60 requests per minute per IP, or 1,000 requests per hour per API key. For authenticated users, you might allow more. The key is to monitor actual usage patterns, set limits that accommodate legitimate use with some headroom, and adjust based on real data. Start more restrictive and loosen up — it's easier than tightening limits after users depend on higher rates.

Why am I getting 429 errors during my static site build?

Static site generators like Next.js and Astro fetch data for every page at build time. If you have hundreds or thousands of pages, that's hundreds or thousands of API calls in rapid succession. Most CMS APIs have rate limits between 5-20 requests per second. Use p-limit or similar libraries to cap concurrency at 3-5 simultaneous requests, add small delays between batches, and consider using incremental builds (ISR in Next.js, or Astro's incremental content collections) to avoid rebuilding everything at once.