MCP Server Development: Deploy on Vercel with Next.js for SaaS
If you've been building SaaS products in 2025-2026, you've probably noticed that every product manager now wants "AI features." Fair enough. But the real question isn't whether to add AI -- it's how to give AI models structured, secure access to your application's data and capabilities. That's exactly what the Model Context Protocol (MCP) solves, and deploying MCP servers on Vercel with Next.js has become one of the most practical patterns I've seen for SaaS teams that want to move fast without spinning up new infrastructure.
I've spent the last several months building MCP servers for clients -- some simple tool-serving setups, others with complex multi-tenant auth flows. This article covers everything I've learned about building, deploying, and scaling MCP servers on Vercel with Next.js.

Table of Contents
- What Is MCP and Why SaaS Teams Should Care
- Architecture Overview: MCP on Vercel
- Setting Up Your Next.js MCP Server
- Implementing MCP Tools and Resources
- Authentication and Multi-Tenancy
- Deploying to Vercel: Configuration and Gotchas
- Performance Optimization and Scaling
- Monitoring and Observability
- Cost Analysis: Running MCP Servers on Vercel
- FAQ
What Is MCP and Why SaaS Teams Should Care
The Model Context Protocol (MCP) is an open standard -- originally developed by Anthropic and now widely adopted -- that defines how AI models interact with external tools and data sources. Think of it as a USB-C port for AI: a standardized interface that any AI client can use to connect to any MCP-compatible server.
Before MCP, if you wanted Claude, GPT, or any other model to interact with your SaaS app, you'd build custom integrations for each AI provider. Function calling with OpenAI looked different from tool use with Anthropic. MCP changes that. You build one server, and any MCP-compatible client can use it.
For SaaS teams, this matters because:
- Your users expect AI integrations. As of early 2026, roughly 68% of B2B SaaS users report using AI assistants alongside their primary tools (Gartner, Q1 2026).
- MCP is becoming the default. Claude Desktop, Cursor, Windsurf, GitHub Copilot in VS Code, and dozens of other clients now support MCP natively.
- Building an MCP server is cheaper than building custom integrations for every AI provider.
MCP vs. Traditional API Integrations
| Aspect | Traditional API | MCP Server |
|---|---|---|
| Client compatibility | One-to-one per provider | Any MCP-compatible client |
| Discovery | Manual docs reading | Automatic tool/resource discovery |
| Auth flow | Custom per integration | Standardized OAuth 2.1 / API keys |
| Maintenance burden | High (N integrations) | Low (1 server) |
| Real-time data | Polling or webhooks | Server-sent events / streaming |
| Setup time | Days to weeks per client | Hours for the server, minutes per client |
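To make the "Discovery" row concrete: MCP messages are JSON-RPC 2.0, and a client calls `tools/list` to learn what a server offers before it ever calls `tools/call`. The sketch below is a toy dispatcher, not the SDK. The `get-projects` entry and its canned reply are placeholders, and only the message shapes are the point.

```typescript
// Toy JSON-RPC dispatcher showing the message shapes behind MCP discovery.
// This is not the SDK; the tool list and the canned reply are placeholders.

type JsonRpcRequest = {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
};

type ToolDefinition = {
  name: string;
  description: string;
  inputSchema: { type: 'object'; properties: Record<string, unknown> };
};

type JsonRpcResponse = {
  jsonrpc: '2.0';
  id: number;
  result?: {
    tools?: ToolDefinition[];
    content?: { type: 'text'; text: string }[];
  };
  error?: { code: number; message: string };
};

const tools: ToolDefinition[] = [
  {
    name: 'get-projects',
    description: 'List all projects for the authenticated user',
    inputSchema: { type: 'object', properties: { limit: { type: 'number' } } },
  },
];

function handleRequest(req: JsonRpcRequest): JsonRpcResponse {
  const base = { jsonrpc: '2.0' as const, id: req.id };
  if (req.method === 'tools/list') {
    // Discovery: the client learns names, descriptions, and input schemas.
    return { ...base, result: { tools } };
  }
  if (req.method === 'tools/call') {
    if (req.params?.name !== 'get-projects') {
      // -32602 is the JSON-RPC "Invalid params" code.
      return { ...base, error: { code: -32602, message: `Unknown tool: ${req.params?.name}` } };
    }
    return { ...base, result: { content: [{ type: 'text', text: '[]' }] } };
  }
  // -32601 is the JSON-RPC "Method not found" code.
  return { ...base, error: { code: -32601, message: `Method not found: ${req.method}` } };
}
```

Because every server answers `tools/list` in the same shape, a client needs no provider-specific glue to find out what your SaaS can do.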
Architecture Overview: MCP on Vercel
Here's the architecture I've landed on after iterating through several approaches:
┌─────────────────┐     ┌──────────────────────┐     ┌─────────────────┐
│   MCP Clients   │     │   Vercel (Next.js)   │     │    Your SaaS    │
│                 │     │                      │     │     Backend     │
│ - Claude        │────▶│ /api/mcp (HTTP+SSE)  │────▶│ - Database      │
│ - Cursor        │     │ /api/mcp/sse         │     │ - APIs          │
│ - Custom Apps   │◀────│ /api/auth/[...mcp]   │     │ - Services      │
└─────────────────┘     └──────────────────────┘     └─────────────────┘
The key insight: your MCP server doesn't replace your existing API. It sits in front of it as a translation layer. The MCP server exposes tools and resources that map to your existing SaaS functionality, but in a format AI models can discover and use.
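On Vercel, this runs as serverless functions. Since the 2025-03-26 revision, the MCP spec defines a Streamable HTTP transport: the client POSTs JSON-RPC messages to a single endpoint, and the server answers with either plain JSON or a Server-Sent Events (SSE) stream. That fits Vercel's streaming support in Next.js route handlers. It replaces the older standalone HTTP+SSE transport, which some clients still use.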
Why Next.js on Vercel?
You could build an MCP server with any framework -- Express, Fastify, Hono, whatever. But Next.js on Vercel gives you some real advantages for SaaS:
- Your marketing site, app, and MCP server live in one repo. Less infrastructure to manage.
- Edge middleware handles auth before requests hit your MCP endpoints.
- Vercel's streaming support works well with SSE-based MCP transport.
- Automatic scaling -- you don't think about servers.
- If you're already running Next.js (and statistically, you probably are), there's zero new infrastructure.
We do a lot of Next.js development at Social Animal, and this pattern has become one of our most requested architectures.

Setting Up Your Next.js MCP Server
Let's build this. I'm assuming you're on Next.js 15+ with the App Router.
Installing Dependencies
pnpm add @modelcontextprotocol/sdk zod
pnpm add -D @types/node
The @modelcontextprotocol/sdk package (v1.12+ as of early 2026) includes the HTTP-based transports you need. The stdio transport that most local MCP examples use doesn't work on serverless, because there's no long-lived process for the client to spawn.
Creating the MCP Route Handler
// app/api/mcp/route.ts
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
// httpTransport is the helper name used in this article; confirm the export
// name and path against the SDK version you install
import { httpTransport } from '@modelcontextprotocol/sdk/server/http';
import { z } from 'zod';
// Your own data-access function (the path is a placeholder)
import { fetchProjects } from '@/lib/projects';

const server = new McpServer({
  name: 'your-saas-mcp',
  version: '1.0.0',
  description: 'MCP server for YourSaaS platform',
});

// Register tools (we'll flesh these out next)
server.tool(
  'get-projects',
  'List all projects for the authenticated user',
  {
    status: z.enum(['active', 'archived', 'all']).optional().default('active'),
    limit: z.number().min(1).max(100).optional().default(20),
  },
  async ({ status, limit }, context) => {
    // Your actual business logic here
    const projects = await fetchProjects(context.auth.userId, { status, limit });
    return {
      content: [
        {
          type: 'text',
          text: JSON.stringify(projects, null, 2),
        },
      ],
    };
  }
);

const handler = httpTransport(server, {
  sessionManagement: true,
  cors: {
    origin: '*', // Lock this down in production
  },
});

export const GET = handler;
export const POST = handler;
export const DELETE = handler;
SSE Endpoint for Streaming
Some MCP clients still use the older standalone SSE transport, which also suits long-running operations:
// app/api/mcp/sse/route.ts
import { sseTransport } from '@modelcontextprotocol/sdk/server/sse';
import { server } from '../mcp-server'; // Extract server config to shared module

export const GET = sseTransport(server, {
  // Vercel has a 30-second timeout on Hobby, 300s on Pro
  // For long-running tools, you'll need Pro plan minimum
  keepAliveInterval: 15000,
});
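It helps to know what actually travels over an SSE connection, because that's what you'll see in a network tab when debugging. The sketch below formats frames by hand in the standard SSE wire format. Both function names are made up for illustration, and whatever the `keepAliveInterval` option above sends in practice, a comment frame like the second one is the usual way to do a heartbeat.

```typescript
// Format one Server-Sent Events frame: an optional "event:" line, a "data:"
// line, then a blank line that terminates the frame.
function formatSseEvent(data: unknown, event?: string): string {
  const lines: string[] = [];
  if (event) lines.push(`event: ${event}`);
  // JSON.stringify output contains no raw newlines, so one data line is enough.
  lines.push(`data: ${JSON.stringify(data)}`);
  return lines.join('\n') + '\n\n';
}

// Lines starting with ":" are comments. Clients ignore them, which makes them
// a cheap heartbeat for keeping an idle connection open.
function formatSseKeepAlive(): string {
  return ': keepalive\n\n';
}
```

A JSON-RPC response sent as `formatSseEvent(response, 'message')` arrives as a single `message` event, while the keep-alive frame never reaches application code on the client.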
Implementing MCP Tools and Resources
This is where the real work happens. MCP distinguishes between tools (actions the AI can take) and resources (data the AI can read). Getting this right makes the difference between an MCP server that AI clients love and one they struggle with.
Designing Good Tools
The biggest mistake I see: tools that are too granular or too broad. If you expose 50 tiny tools, AI models get overwhelmed. If you expose 3 mega-tools that each take 20 parameters, models make mistakes.
My rule of thumb: one tool per user intention. If a user would say "show me my recent invoices," that's one tool. Don't split it into list-invoices + filter-invoices + format-invoices.
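Here's a rough sketch of that rule with the invoice example, stripped of any MCP plumbing. The `Invoice` shape and the `showInvoices` name are invented for illustration. One handler takes a few optional parameters and does the listing, filtering, and formatting itself.

```typescript
type Invoice = {
  id: string;
  status: 'paid' | 'open' | 'overdue';
  issuedAt: string; // ISO date, e.g. '2026-01-31'
  total: number;
};

type InvoiceQuery = {
  status?: Invoice['status'];
  issuedAfter?: string; // ISO date
  limit?: number;
};

// One tool handler covers "show me my recent invoices" end to end.
function showInvoices(all: Invoice[], query: InvoiceQuery = {}): string {
  const { status, issuedAfter, limit = 20 } = query;
  const matches = all
    .filter((inv) => (status ? inv.status === status : true))
    // ISO dates compare correctly as plain strings.
    .filter((inv) => (issuedAfter ? inv.issuedAt > issuedAfter : true))
    .sort((a, b) => b.issuedAt.localeCompare(a.issuedAt)) // newest first
    .slice(0, limit);
  return JSON.stringify(matches, null, 2);
}
```

The model makes one call with two or three parameters instead of chaining three tools and carrying intermediate results through its context.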
// Good: clear intent, reasonable parameters
server.tool(
  'search-customers',
  'Search for customers by name, email, or account ID. Returns matching customer profiles with recent activity.',
  {
    query: z.string().describe('Search term - can be name, email, or account ID'),
    includeInactive: z.boolean().optional().default(false),
  },
  async ({ query, includeInactive }, context) => {
    const customers = await customerService.search({
      query,
      tenantId: context.auth.tenantId,
      includeInactive,
    });
    return {
      content: [{
        type: 'text',
        text: JSON.stringify(customers.map(c => ({
          id: c.id,
          name: c.name,
          email: c.email,
          plan: c.plan,
          mrr: c.mrr,
          lastActive: c.lastActiveAt,
        })), null, 2),
      }],
    };
  }
);
Exposing Resources
Resources are read-only data that AI clients can pull in as context. Think of them as files the model can reference:
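import fs from 'node:fs/promises';
import { ResourceTemplate } from '@modelcontextprotocol/sdk/server/mcp.js';

// Static resource: name, URI, metadata, then the read callback
server.resource(
  'api-docs',
  'docs://api-reference', // URI scheme is illustrative
  { description: 'Your SaaS API documentation', mimeType: 'text/markdown' },
  async (uri) => {
    const docs = await fs.readFile('./docs/api-reference.md', 'utf-8');
    // A read result is a `contents` array, each entry carrying its URI
    return { contents: [{ uri: uri.href, mimeType: 'text/markdown', text: docs }] };
  }
);

// Dynamic resource with URI template
server.resource(
  'project-analytics',
  new ResourceTemplate('project://{projectId}/analytics', { list: undefined }),
  { description: 'Analytics summary for a specific project', mimeType: 'application/json' },
  async (uri, { projectId }, context) => {
    const analytics = await analyticsService.getSummary(projectId, context.auth.tenantId);
    return {
      contents: [{ uri: uri.href, mimeType: 'application/json', text: JSON.stringify(analytics) }],
    };
  }
);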
Authentication and Multi-Tenancy
This is the part everyone gets wrong on the first try. MCP supports OAuth 2.1 for authentication, and if you're building a multi-tenant SaaS, you absolutely need this.
OAuth 2.1 Flow for MCP
// middleware.ts
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

export function middleware(request: NextRequest) {
  if (request.nextUrl.pathname.startsWith('/api/mcp')) {
    const authHeader = request.headers.get('authorization');
    if (!authHeader?.startsWith('Bearer ')) {
      return NextResponse.json(
        { error: 'Missing or invalid authorization header' },
        // WWW-Authenticate tells the client which auth scheme to use
        { status: 401, headers: { 'WWW-Authenticate': 'Bearer' } }
      );
    }
    // Validate the token and inject tenant context
    // This is where your existing auth system plugs in
  }
  // Let authenticated requests continue to the route handler
  return NextResponse.next();
}

export const config = {
  matcher: '/api/mcp/:path*',
};
For the OAuth discovery endpoint that MCP clients need:
// app/.well-known/oauth-authorization-server/route.ts
export function GET() {
  return Response.json({
    issuer: 'https://your-saas.com',
    authorization_endpoint: 'https://your-saas.com/oauth/authorize',
    token_endpoint: 'https://your-saas.com/api/oauth/token',
    registration_endpoint: 'https://your-saas.com/api/oauth/register',
    scopes_supported: ['mcp:read', 'mcp:write', 'mcp:admin'],
    response_types_supported: ['code'],
    code_challenge_methods_supported: ['S256'],
  });
}
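Once a token has been validated, each request still needs a scope check before a tool runs. A minimal sketch using the three scopes advertised above is shown below. The helper names are invented, and treating `mcp:admin` as a superset of the other scopes is an assumption of this sketch, not something the MCP spec defines.

```typescript
type McpScope = 'mcp:read' | 'mcp:write' | 'mcp:admin';

// Pull the token out of an Authorization header, or return null if the
// header is missing or isn't a Bearer header.
function parseBearerToken(header: string | null): string | null {
  if (!header || !header.startsWith('Bearer ')) return null;
  const token = header.slice('Bearer '.length).trim();
  return token.length > 0 ? token : null;
}

// OAuth scope strings are space-delimited.
// Assumption for this sketch: 'mcp:admin' implies every other scope.
function hasScope(grantedScope: string, required: McpScope): boolean {
  const granted = grantedScope.split(' ').filter(Boolean);
  return granted.includes(required) || granted.includes('mcp:admin');
}
```

A read-only tool such as a customer search would check `hasScope(scope, 'mcp:read')`, while anything that mutates data would require `mcp:write`.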
Multi-Tenant Isolation
Every MCP tool invocation must be scoped to the authenticated tenant. I use a pattern where the auth context is injected into every tool handler automatically:
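import { McpError, ErrorCode } from '@modelcontextprotocol/sdk/types.js';

// Wrap a tool handler so the tenant is always resolved from the auth token
const withTenant = (handler) => async (params, context) => {
  const tenant = await resolveTenant(context.auth.token);
  // McpError takes a JSON-RPC error code first, then the message
  if (!tenant) throw new McpError(ErrorCode.InvalidRequest, 'Invalid tenant');
  return handler(params, { ...context, tenant });
};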
Never trust tool parameters for tenant identification. Always derive it from the auth token.
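To see why this matters, here's a self-contained sketch of the same wrapper idea with the MCP plumbing removed. The token map stands in for a real lookup (database, JWT verification), and the tokens, tenant IDs, and `listProjects` handler are all made up. The handler even accepts a `tenantId` parameter to show that it has no effect.

```typescript
type Tenant = { id: string; name: string };
type ToolContext = { token: string };
type TenantContext = ToolContext & { tenant: Tenant };

// Stand-in for a real token lookup.
const tokensToTenants = new Map<string, Tenant>([
  ['token-acme', { id: 't_acme', name: 'Acme' }],
  ['token-globex', { id: 't_globex', name: 'Globex' }],
]);

function withTenant<P, R>(handler: (params: P, ctx: TenantContext) => R) {
  return (params: P, ctx: ToolContext): R => {
    const tenant = tokensToTenants.get(ctx.token);
    if (!tenant) throw new Error('Invalid tenant');
    return handler(params, { ...ctx, tenant });
  };
}

// The handler reads the tenant only from ctx, never from params, so a model
// (or a prompt injection) that passes another tenant's ID gains nothing.
const listProjects = withTenant(
  (params: { tenantId?: string; limit: number }, ctx) =>
    `projects for ${ctx.tenant.id} (limit ${params.limit})`
);
```

Calling `listProjects({ tenantId: 't_globex', limit: 5 }, { token: 'token-acme' })` still returns Acme's projects, because the parameter is never consulted.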
Deploying to Vercel: Configuration and Gotchas
vercel.json Configuration
{
  "functions": {
    "app/api/mcp/route.ts": {
      "maxDuration": 60
    },
    "app/api/mcp/sse/route.ts": {
      "maxDuration": 300
    }
  },
  "headers": [
    {
      "source": "/api/mcp(.*)",
      "headers": [
        { "key": "Cache-Control", "value": "no-store" }
      ]
    }
  ]
}
The Gotchas Nobody Tells You
1. Function timeouts. Vercel Hobby plan maxes at 30 seconds. Pro gets you 300 seconds. For MCP tools that call slow APIs or process data, you need Pro minimum. At $20/month per team member, it's not a big deal for most SaaS teams.
2. Cold starts. Serverless cold starts can add 200-800ms to the first request. MCP clients generally handle this fine -- they're not expecting sub-50ms responses. But if it bothers you, use Vercel's cron to keep functions warm.
3. SSE and streaming. Vercel supports streaming responses, but there are edge cases with their CDN layer. Set Cache-Control: no-store on all MCP routes. I learned this the hard way when cached SSE responses caused clients to receive stale tool lists.
4. Request body size. Vercel limits request bodies to 4.5MB on serverless functions. If your MCP tools handle file uploads or large payloads, you'll need to use signed upload URLs instead.
5. Environment variables. Don't forget to set your MCP server's public URL as an env var. During development, you'll use something like ngrok or Vercel's preview URLs, but in production, it needs to be your canonical domain.
# .env.production
MCP_SERVER_URL=https://your-saas.com/api/mcp
MCP_SERVER_NAME=your-saas-mcp
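A missing or malformed server URL is easier to catch at startup than through a failed OAuth redirect. The following is a small sketch of a startup check for the two variables above. The `readMcpEnv` name is invented, and insisting on `https` everywhere is an assumption that suits ngrok and preview URLs but would need loosening for plain `http://localhost`.

```typescript
type McpEnv = { serverUrl: string; serverName: string };

// Takes the environment as an argument (pass process.env in a real app)
// so the check is easy to test.
function readMcpEnv(env: Record<string, string | undefined>): McpEnv {
  const serverUrl = env['MCP_SERVER_URL'];
  const serverName = env['MCP_SERVER_NAME'];
  if (!serverUrl) throw new Error('MCP_SERVER_URL is not set');
  if (!serverName) throw new Error('MCP_SERVER_NAME is not set');
  // Assumption: the public URL must be https with a host and optional path.
  if (!/^https:\/\/[^\s\/]+(\/\S*)?$/.test(serverUrl)) {
    throw new Error(`MCP_SERVER_URL must be an https URL, got: ${serverUrl}`);
  }
  // Strip trailing slashes so URL comparisons and joins stay predictable.
  return { serverUrl: serverUrl.replace(/\/+$/, ''), serverName };
}
```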
Performance Optimization and Scaling
Caching Strategies
MCP tool responses can be cached when the data doesn't change frequently:
import { unstable_cache } from 'next/cache';

const getCachedAnalytics = unstable_cache(
  async (tenantId: string, projectId: string) => {
    // Argument order matches the getSummary(projectId, tenantId) call used earlier
    return analyticsService.getSummary(projectId, tenantId);
  },
  ['analytics-summary'],
  { revalidate: 300 } // 5 minutes
);
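The one thing to get right with any cache in a multi-tenant server is the key: it has to include the tenant, or one customer's data leaks to another. The toy in-memory TTL cache below makes that visible. It's synchronous for brevity and is not how `unstable_cache` is implemented; the `TtlCache` class and its method names are invented for this sketch.

```typescript
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // `now` is injectable so expiry can be tested without waiting.
  getOrLoad(keyParts: string[], load: () => T, now: number = Date.now()): T {
    // Every key part (tenant, project, ...) contributes to the cache key.
    const key = JSON.stringify(keyParts);
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > now) return hit.value;
    const value = load();
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    return value;
  }
}
```

With a five-minute TTL, `cache.getOrLoad([tenantId, projectId], loadSummary)` hits the backend at most once per tenant and project in each window, and two tenants asking about the same project ID never share an entry.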
Connection Pooling
If your MCP tools hit a database, use connection pooling. On Vercel, each function invocation gets its own execution context, so without pooling, you'll exhaust database connections fast.
import { Pool } from '@neondatabase/serverless';

// Neon's serverless driver handles pooling automatically
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
});
I'd recommend Neon or PlanetScale for database-backed MCP tools on Vercel. Both handle the serverless connection model well.
Benchmarks
Here's what we've measured across several production MCP deployments on Vercel Pro:
| Metric | Cold Start | Warm | P99 |
|---|---|---|---|
| Simple tool (no DB) | 420ms | 45ms | 180ms |
| DB-backed tool (Neon) | 680ms | 95ms | 320ms |
| Tool with external API | 850ms | 280ms | 1200ms |
| SSE connection setup | 520ms | 60ms | 250ms |
| Tool discovery (list) | 380ms | 30ms | 120ms |
These numbers are fine for AI client interactions. Models take seconds to generate responses anyway -- your MCP server won't be the bottleneck.
Monitoring and Observability
You need to know what's happening in production. MCP servers have unique observability needs because the "users" are AI models, not humans.
What to Track
- Tool invocation frequency -- which tools are models actually using?
- Error rates per tool -- is a specific tool failing more than others?
- Token/tenant distribution -- is one tenant hammering your server?
- Response payload sizes -- oversized responses waste model context windows
// Simple logging middleware for MCP tools
type ToolHandler = (params: any, context: any) => Promise<unknown>;

const withLogging = (toolName: string, handler: ToolHandler): ToolHandler => {
  return async (params, context) => {
    const start = performance.now();
    try {
      const result = await handler(params, context);
      const duration = performance.now() - start;
      console.log(JSON.stringify({
        type: 'mcp_tool_invocation',
        tool: toolName,
        tenant: context.auth?.tenantId,
        duration,
        success: true,
        responseSize: JSON.stringify(result).length,
      }));
      return result;
    } catch (error) {
      console.error(JSON.stringify({
        type: 'mcp_tool_error',
        tool: toolName,
        tenant: context.auth?.tenantId,
        duration: performance.now() - start,
        // Caught values are `unknown` under strict TypeScript
        error: error instanceof Error ? error.message : String(error),
      }));
      throw error;
    }
  };
};
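Raw log lines only become useful once they're aggregated. As a rough sketch of what a dashboard query computes, the function below rolls a batch of events up into per-tool call counts, error rates, average duration, and largest response. The `ToolEvent` shape is a simplified, hypothetical version of the log entries above (with `durationMs` in place of `duration`).

```typescript
type ToolEvent = {
  tool: string;
  tenant: string;
  success: boolean;
  durationMs: number;
  responseSize: number;
};

type ToolStats = {
  calls: number;
  errors: number;
  errorRate: number;
  avgDurationMs: number;
  maxResponseSize: number;
};

function summarizeByTool(events: ToolEvent[]): Map<string, ToolStats> {
  const stats = new Map<string, ToolStats>();
  for (const e of events) {
    const s = stats.get(e.tool) ?? {
      calls: 0,
      errors: 0,
      errorRate: 0,
      avgDurationMs: 0,
      maxResponseSize: 0,
    };
    // Running average avoids keeping every duration in memory.
    s.avgDurationMs = (s.avgDurationMs * s.calls + e.durationMs) / (s.calls + 1);
    s.calls += 1;
    if (!e.success) s.errors += 1;
    s.errorRate = s.errors / s.calls;
    s.maxResponseSize = Math.max(s.maxResponseSize, e.responseSize);
    stats.set(e.tool, s);
  }
  return stats;
}
```

Grouping the same events by `tenant` instead of `tool` answers the "is one tenant hammering your server?" question from the list above.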
Pipe these logs to Axiom (available as a Vercel integration), Datadog, or whatever you're already using. Vercel's built-in Log Drains make this straightforward.
Cost Analysis: Running MCP Servers on Vercel
Let's talk money. Here's a realistic cost breakdown for a mid-size SaaS running an MCP server on Vercel in 2026:
| Component | Hobby | Pro | Enterprise |
|---|---|---|---|
| Base plan | $0/mo | $20/mo per seat | Custom |
| Function invocations (included) | 100K | 1M | Custom |
| Additional invocations | N/A | $0.60 per 1M | Negotiable |
| Bandwidth (included) | 100GB | 1TB | Custom |
| Max function duration | 30s | 300s | 900s |
| Edge middleware | Included | Included | Included |
| Estimated monthly (10K MCP requests/day) | Not viable | ~$25-40 | Custom |
For most SaaS products, the Pro plan handles MCP traffic comfortably. At 10,000 MCP tool invocations per day (which is quite active), you're looking at ~300K function executions per month -- well within Pro's included allocation.
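The arithmetic is simple enough to sanity-check for your own traffic. This sketch uses the Pro figures from the table above (1M included invocations, $0.60 per additional million) as constants. Treat them as this article's numbers rather than a live price list, and note that the function names are invented.

```typescript
// Figures from the cost table; verify against current Vercel pricing.
const PRO_INCLUDED_INVOCATIONS = 1000000;
const PRO_OVERAGE_USD_PER_MILLION = 0.6;

function estimateMonthlyInvocations(perDay: number, daysInMonth: number = 30): number {
  return perDay * daysInMonth;
}

// Cost of invocations beyond the included allocation, excluding the seat price.
function estimateOverageUsd(monthlyInvocations: number): number {
  const extra = Math.max(0, monthlyInvocations - PRO_INCLUDED_INVOCATIONS);
  return (extra / 1000000) * PRO_OVERAGE_USD_PER_MILLION;
}
```

At 10,000 requests per day, `estimateMonthlyInvocations` gives 300,000 and the overage is zero. You'd need more than roughly 33,000 requests per day before invocation overage starts to apply at all.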
Compare this to running a dedicated MCP server on AWS: you'd need at minimum an EC2 instance ($30-50/mo), load balancer ($18/mo), and your time managing infrastructure. Vercel wins on operational simplicity.
If you're evaluating the right architecture for your SaaS, we can help scope this out. Check our pricing page or reach out directly.
FAQ
What is the Model Context Protocol (MCP) and how does it differ from function calling?
MCP is an open standard for connecting AI models to external tools and data. Unlike provider-specific function calling (OpenAI's function calling, Anthropic's tool use), MCP is universal. You build one MCP server, and any compatible client -- Claude, Cursor, custom apps -- can discover and use your tools automatically. Function calling requires you to define tools separately for each AI provider.
Can I deploy an MCP server on Vercel's free Hobby plan?
Technically yes, but I wouldn't recommend it for production. The 30-second function timeout is too restrictive for MCP tools that query databases or call external APIs. You also get limited invocations (100K/month). The Pro plan at $20/month per seat is the minimum I'd suggest for any real workload.
How do I handle authentication between MCP clients and my SaaS?
The MCP spec supports OAuth 2.1. You expose a .well-known/oauth-authorization-server endpoint that MCP clients discover automatically. When a user connects via an AI client like Claude, they're redirected to your standard OAuth flow, grant permissions, and the client receives a scoped access token. This token is sent with every MCP request.
What's the difference between MCP tools and MCP resources?
Tools are actions -- things the AI can do (create a project, send an email, run a query). Resources are data -- things the AI can read for context (documentation, configuration files, analytics summaries). Tools are invoked on demand; resources are loaded into the model's context window. Design tools for actions, resources for reference material.
How many MCP tools should my server expose?
From my experience, 5-15 tools is the sweet spot for most SaaS products. Fewer than 5 and your MCP server isn't very useful. More than 20 and AI models start making poor tool selection decisions. Group related operations into single tools with parameter options rather than exposing every CRUD operation separately.
Does this work with frameworks other than Next.js?
Absolutely. The @modelcontextprotocol/sdk works with any Node.js framework. You could use Hono, Express, or even Astro with SSR endpoints. Next.js on Vercel is just a particularly convenient combination because of the built-in streaming support, edge middleware, and zero-config deployment. If you're using a different stack, our headless CMS development team has built MCP servers across multiple frameworks.
How do I test my MCP server during development?
The MCP Inspector (part of the official MCP toolkit) is your best friend. It connects to your local server and lets you invoke tools, browse resources, and debug responses interactively. For automated testing, write integration tests that instantiate your MCP server in-process and call tools programmatically -- the SDK supports this without needing HTTP transport.
What happens when Vercel functions cold start during an MCP request?
MCP clients are designed to be tolerant of latency -- they're typically waiting for AI model responses that take seconds anyway. A 200-800ms cold start is imperceptible in practice. If you're concerned, Vercel's Pro plan lets you configure cron-based warming, and the SDK includes automatic retry logic for transient failures. In six months of production usage, cold starts have never been a user-reported issue for our clients.