If you've been paying attention to the AI tooling space in 2025, you've probably noticed that Model Context Protocol (MCP) has gone from "interesting spec from Anthropic" to "thing every serious SaaS product needs to support." And for good reason. MCP gives AI agents -- Claude, GPT-based assistants, custom agents -- a standardized way to discover and call your API. Think of it as OpenAPI for the agentic era, but with real-time bidirectional communication baked in.

I've spent the last few months building MCP servers for several SaaS products, and I want to share what actually works, what the docs don't tell you, and the architectural decisions that matter. This isn't a rehash of the spec. This is the guide I wish I'd had when I started.

What Is the Model Context Protocol (MCP)?

MCP is an open protocol originally created by Anthropic that defines how AI models communicate with external tools and data sources. It was released in November 2024, and its spec is versioned by revision date rather than semver (2024-11-05, then 2025-03-26, and so on). It's now supported by Claude Desktop, Cursor, Windsurf, the OpenAI Agents SDK, and dozens of other AI clients.

The core idea: instead of every AI integration being a custom plugin with a proprietary format, MCP provides a single protocol that any compliant client can use to discover what your service offers and interact with it.

Here's what MCP defines:

  • Tools -- Functions the AI can call (like create_ticket, search_users, generate_report)
  • Resources -- Data the AI can read (like documentation, database records, config files)
  • Prompts -- Reusable prompt templates your server can expose
  • Sampling -- The ability for your server to request LLM completions from the client

The transport layer uses JSON-RPC 2.0 over stdio for local servers, or over HTTP with Server-Sent Events (SSE) for remote servers. The newer Streamable HTTP transport, introduced in the 2025-03-26 spec revision, is what you'll want for any production SaaS deployment.
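To make the wire format concrete, here is a rough sketch of a tool call expressed as JSON-RPC 2.0 messages. The tools/call method and the content-block result shape follow the MCP spec; the helper functions, tool name, and arguments are made up for illustration.

```typescript
// Sketch of the JSON-RPC 2.0 envelope MCP uses for a tool call.
// buildToolCall/buildToolResult are illustrative helpers, not SDK functions.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: Record<string, unknown>;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: unknown;
  error?: { code: number; message: string };
}

// The client names the tool and passes its arguments under params
function buildToolCall(id: number, name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// A successful reply echoes the request id and carries the output as content blocks
function buildToolResult(id: number | string, text: string): JsonRpcResponse {
  return { jsonrpc: "2.0", id, result: { content: [{ type: "text", text }] } };
}

const request = buildToolCall(1, "create_ticket", { title: "Login page returns 500" });
const response = buildToolResult(request.id, JSON.stringify({ ticket_id: "T-1" }));

console.log(JSON.stringify(request));
console.log(JSON.stringify(response));
```

In practice the SDK builds and parses these envelopes for you; seeing them once mostly helps when you are reading network logs.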

Why Your SaaS Product Needs an MCP Server

Let me be direct: if your SaaS product has an API, you should be building an MCP server right now. Here's why.

AI agents are becoming the primary API consumers. In Q1 2025, Anthropic reported that Claude Desktop users invoke MCP tools over 2 million times per day. That number is growing fast. Your customers are already trying to use AI agents to interact with your product -- the question is whether you make that easy or frustrating.

It's a distribution channel. When someone installs Claude Desktop and types "help me manage my projects," the AI can discover and use MCP servers the user has configured. Your product becomes accessible through natural language. That's not a gimmick -- it's a genuine new surface area for your product.

Your competitors are doing it. Stripe, Linear, Notion, GitHub, Sentry, and Supabase all shipped MCP servers in the first half of 2025. If you're in B2B SaaS and you don't have one, you're falling behind.

| Factor | REST API Only | REST API + MCP Server |
|---|---|---|
| AI agent accessibility | Requires custom integration per agent | Any MCP client works automatically |
| Discovery | Developers read docs | AI discovers capabilities at runtime |
| Time to first integration | Hours to days | Minutes |
| Natural language interaction | Not possible without wrapper | Built-in |
| Maintenance burden | One codebase | Two codebases (but MCP wraps REST) |

MCP Architecture: How It Actually Works

Before we write code, let's get the architecture straight. An MCP deployment has three parts:

  1. MCP Client -- The AI application (Claude Desktop, Cursor, your custom agent). It discovers your server's capabilities and calls them.
  2. MCP Server -- Your code. It exposes tools, resources, and prompts via the MCP protocol. This is what we're building.
  3. Your SaaS API -- The actual backend your MCP server calls to get things done.

The flow looks like this:

User → AI Client → MCP Client → MCP Server → Your SaaS API
                                      ↓
                              Response flows back

Your MCP server is essentially a protocol adapter. It translates between the MCP protocol (which AI clients speak) and your existing REST/GraphQL API. This means you don't need to modify your existing API at all. The MCP server sits alongside it.
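The adapter idea can be sketched as a lookup from tool names to REST calls. This is a simplified illustration, not how the SDK routes requests: the route table, the /api/v1 paths, and the toRestCall helper are all assumptions for the example.

```typescript
// Hypothetical mapping from MCP tool names to REST calls on an existing API.
interface RestCall {
  method: "GET" | "POST" | "PATCH";
  path: string;
  body?: unknown;
}

type ToolArgs = Record<string, unknown>;

// A Map avoids accidental hits on inherited object keys such as "toString"
const routes = new Map<string, (args: ToolArgs) => RestCall>([
  ["list_projects", () => ({ method: "GET", path: "/api/v1/projects" })],
  [
    "get_project",
    (args) => ({
      method: "GET",
      path: `/api/v1/projects/${encodeURIComponent(String(args.project_id))}`,
    }),
  ],
  ["create_project", (args) => ({ method: "POST", path: "/api/v1/projects", body: args })],
]);

// Translate one tool invocation into the REST request the backend already understands
function toRestCall(tool: string, args: ToolArgs): RestCall {
  const build = routes.get(tool);
  if (!build) {
    throw new Error(`Unknown tool: ${tool}`);
  }
  return build(args);
}

console.log(toRestCall("get_project", { project_id: "abc-123" }));
```

The existing API stays untouched; only this translation layer knows about MCP.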

Transport Options

For SaaS products, you have two realistic transport options:

  • Streamable HTTP (recommended): Uses standard HTTP requests with optional SSE for streaming. Works behind load balancers, CDNs, and standard infrastructure. This is what you want for remote/hosted MCP servers.
  • SSE (legacy): The original remote transport. Still works but the spec recommends Streamable HTTP for new implementations.

Stdio transport is great for local tools but isn't applicable for a SaaS product's MCP server.

Setting Up Your MCP Server Project

Let's build this with TypeScript. The official @modelcontextprotocol/sdk package is well-maintained and is the right choice for production use. Python has mcp (the official Python SDK) if that's your stack, but I'll focus on TypeScript since most SaaS backends I work with use Node.js.

mkdir my-saas-mcp-server
cd my-saas-mcp-server
npm init -y
npm install @modelcontextprotocol/sdk zod express
npm install -D typescript @types/node @types/express tsx

Set up your tsconfig.json:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "Node16",
    "moduleResolution": "Node16",
    "outDir": "./dist",
    "rootDir": "./src",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  },
  "include": ["src/**/*"]
}

Now let's create the basic server structure:

// src/server.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import express from "express";

const app = express();
app.use(express.json());

// Build a fresh server per request so concurrent requests never share a transport
function createServer(): McpServer {
  const server = new McpServer({
    name: "my-saas-mcp",
    version: "1.0.0",
    description: "MCP server for My SaaS Product",
  });

  // We'll add tools, resources, and prompts here

  return server;
}

app.post("/mcp", async (req, res) => {
  const server = createServer();
  // Stateless mode: no session IDs, so any instance can serve any request
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined,
  });

  // Release the transport and server once the response is finished
  res.on("close", () => {
    transport.close();
    server.close();
  });

  await server.connect(transport);
  // express.json() has already consumed the body, so pass the parsed body explicitly
  await transport.handleRequest(req, res, req.body);
});

app.get("/mcp", (_req, res) => {
  // In stateless mode there is no standalone SSE stream to open;
  // streamed responses are delivered on the POST request instead
  res.status(405).json({
    jsonrpc: "2.0",
    error: { code: -32000, message: "Method not allowed." },
    id: null,
  });
});

const PORT = process.env.PORT || 3001;
app.listen(PORT, () => {
  console.log(`MCP server running on port ${PORT}`);
});

That's your skeleton. Let's fill it in.

Defining Tools, Resources, and Prompts

Tools

Tools are the most important primitive. Each tool is a function the AI can call with structured parameters. Here's how to define one:

import { z } from "zod";

server.tool(
  "create_project",
  "Create a new project in the workspace",
  {
    name: z.string().describe("The project name"),
    description: z.string().optional().describe("Project description"),
    team_id: z.string().describe("ID of the team that owns this project"),
  },
  async ({ name, description, team_id }) => {
    // Call your SaaS API here
    const project = await apiClient.createProject({ name, description, team_id });
    
    return {
      content: [
        {
          type: "text",
          text: JSON.stringify(project, null, 2),
        },
      ],
    };
  }
);

A few things I've learned about tool design:

Be specific with descriptions. The AI uses your tool description and parameter descriptions to decide when and how to call it. Vague descriptions lead to wrong tool selection. "Create a new project" is okay. "Create a new project in the user's workspace. Requires a team_id which can be obtained from the list_teams tool" is much better.

Keep tool count manageable. I've seen people expose 50+ tools and wonder why the AI gets confused. Start with 10-15 core operations. You can always add more.

Return structured data. Always return JSON in your text content. The AI parses it better than prose.

Resources

Resources expose readable data. Think of them as GET endpoints for AI consumption:

server.resource(
  "project-list",
  "projects://list",
  // Metadata is passed as an object, not a bare description string
  { description: "List of all projects in the workspace", mimeType: "application/json" },
  async (uri) => {
    const projects = await apiClient.listProjects();
    return {
      contents: [
        {
          uri: uri.href,
          mimeType: "application/json",
          text: JSON.stringify(projects, null, 2),
        },
      ],
    };
  }
);

Prompts

Prompts are reusable templates. They're underutilized but powerful:

server.prompt(
  "weekly-report",
  "Generate a weekly status report for a project",
  {
    project_id: z.string().describe("The project ID to report on"),
  },
  async ({ project_id }) => {
    const stats = await apiClient.getProjectStats(project_id);
    return {
      messages: [
        {
          role: "user",
          content: {
            type: "text",
            text: `Generate a weekly status report based on this data: ${JSON.stringify(stats)}. Include completed tasks, blockers, and upcoming deadlines.`,
          },
        },
      ],
    };
  }
);

Connecting to Your SaaS API

Your MCP server needs to talk to your existing API. I recommend creating a typed API client class:

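// src/api-client.ts

// Illustrative shapes; replace them with your API's real types
export interface Project {
  id: string;
  name: string;
  description?: string;
  team_id: string;
}

export type CreateProjectInput = Omit<Project, "id">;

export class SaaSApiClient {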
  private baseUrl: string;
  private apiKey: string;

  constructor(baseUrl: string, apiKey: string) {
    this.baseUrl = baseUrl;
    this.apiKey = apiKey;
  }

  private async request<T>(path: string, options?: RequestInit): Promise<T> {
    const response = await fetch(`${this.baseUrl}${path}`, {
      ...options,
      headers: {
        "Authorization": `Bearer ${this.apiKey}`,
        "Content-Type": "application/json",
        ...options?.headers,
      },
    });

    if (!response.ok) {
      const error = await response.text();
      throw new Error(`API error ${response.status}: ${error}`);
    }

    // Await before casting so the assertion applies to the parsed body, not the promise
    return (await response.json()) as T;
  }

  async createProject(data: CreateProjectInput): Promise<Project> {
    return this.request<Project>("/api/v1/projects", {
      method: "POST",
      body: JSON.stringify(data),
    });
  }

  async listProjects(): Promise<Project[]> {
    return this.request<Project[]>("/api/v1/projects");
  }

  // ... more methods
}

Keep this client thin. It's a pass-through, not a business logic layer.

Authentication and Multi-Tenancy

This is where things get interesting -- and where most tutorials gloss over the hard parts.

Your MCP server needs to authenticate requests in two ways:

  1. The MCP client authenticating to your MCP server ("is this a valid user?")
  2. Your MCP server authenticating to your SaaS API ("act on behalf of this user")

The MCP spec includes an authorization framework based on OAuth 2.1. For a SaaS product, this is the right approach:

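// Tell TypeScript about the user we attach to each request.
// The fields are illustrative; use whatever validateOAuthToken returns in your app.
declare global {
  namespace Express {
    interface Request {
      user?: { id: string; workspaceId: string };
    }
  }
}

// Middleware to extract and validate the OAuth token.
// Register it before the /mcp route handlers so it runs first.
// validateOAuthToken is your own function (for example, a JWT check or a token introspection call).
app.use("/mcp", async (req, res, next) => {
  const authHeader = req.headers.authorization;
  if (!authHeader?.startsWith("Bearer ")) {
    return res.status(401).json({ error: "Missing authorization" });
  }

  // Strip the "Bearer " prefix
  const token = authHeader.slice(7);

  try {
    const user = await validateOAuthToken(token);
    req.user = user;
    next();
  } catch {
    return res.status(401).json({ error: "Invalid token" });
  }
});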

Then pass the user context into your tool handlers. This is critical for multi-tenancy -- every API call should be scoped to the authenticated user's workspace.
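One way to pass that context along is to build handlers per request so each one closes over the authenticated user. The sketch below assumes hypothetical AuthenticatedUser and ProjectApi shapes and uses an in-memory stand-in for the real API client; adapt the names to your own code.

```typescript
// Hypothetical shapes for this sketch; adjust them to your auth layer and API client
interface AuthenticatedUser {
  id: string;
  workspaceId: string;
}

interface ProjectApi {
  listProjects(workspaceId: string): Promise<{ id: string; name: string }[]>;
}

interface ToolResult {
  isError?: boolean;
  content: { type: "text"; text: string }[];
}

// The handler closes over the user, so every call is scoped to that user's workspace
function makeListProjectsHandler(user: AuthenticatedUser, api: ProjectApi) {
  return async (): Promise<ToolResult> => {
    const projects = await api.listProjects(user.workspaceId);
    return { content: [{ type: "text", text: JSON.stringify(projects, null, 2) }] };
  };
}

// In-memory stand-in for the real API, holding projects from two tenants
const fakeApi: ProjectApi = {
  async listProjects(workspaceId) {
    const all = [
      { id: "p1", name: "Backend Rewrite", workspaceId: "ws-1" },
      { id: "p2", name: "Other Tenant Project", workspaceId: "ws-2" },
    ];
    return all
      .filter((p) => p.workspaceId === workspaceId)
      .map(({ id, name }) => ({ id, name }));
  },
};

const listProjects = makeListProjectsHandler({ id: "u1", workspaceId: "ws-1" }, fakeApi);
listProjects().then((result) => console.log(result.content[0].text));
```

Because the workspace ID comes from the validated token rather than from tool arguments, the model cannot ask for another tenant's data even if it tries.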

API Key Approach (Simpler, for Internal/Early Stage)

If you're early stage or this is internal-only, API keys work fine:

const apiKey = req.headers["x-api-key"] as string;
const client = new SaaSApiClient(process.env.API_BASE_URL!, apiKey);

The user provides their API key when configuring the MCP server in their client, and it gets passed through to your API.

Error Handling and Validation

AI agents need clear error messages. When a tool fails, the AI needs to understand why so it can either fix the input or explain the issue to the user.

server.tool(
  "get_project",
  "Get project details by ID",
  { project_id: z.string().uuid().describe("The project's UUID") },
  async ({ project_id }) => {
    try {
      const project = await apiClient.getProject(project_id);
      return {
        content: [{ type: "text", text: JSON.stringify(project, null, 2) }],
      };
    } catch (error) {
      const message = error instanceof Error ? error.message : "Unknown error";
      return {
        isError: true,
        content: [
          {
            type: "text",
            text: `Failed to get project: ${message}. Make sure the project_id is a valid UUID and the project exists in your workspace.`,
          },
        ],
      };
    }
  }
);

Notice the isError: true flag. This tells the AI client that the tool call failed, so it can handle it appropriately. Always include actionable guidance in error messages.
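If many tools need the same treatment, a small helper keeps the guidance consistent. This is a minimal sketch: hintForStatus and toToolError are hypothetical helpers, and the hint texts are only suggestions to adapt to your API's actual status codes.

```typescript
interface ToolResult {
  isError?: boolean;
  content: { type: "text"; text: string }[];
}

// Map an HTTP status from the backend to guidance the model can act on
function hintForStatus(status: number | undefined): string {
  switch (status) {
    case 401:
      return "The access token is missing or expired. Ask the user to reconnect their account.";
    case 403:
      return "The user lacks permission for this action.";
    case 404:
      return "Check that the ID is correct and belongs to the user's workspace.";
    case 429:
      return "Rate limit reached. Wait a moment before retrying.";
    default:
      return "This may be a temporary problem. Retry once, then report it to the user.";
  }
}

// Build a failed tool result that names the action, the cause, and a next step
function toToolError(action: string, message: string, status?: number): ToolResult {
  return {
    isError: true,
    content: [
      { type: "text", text: `Failed to ${action}: ${message}. ${hintForStatus(status)}` },
    ],
  };
}

console.log(toToolError("get project", "API error 404", 404).content[0].text);
```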

Testing Your MCP Server

Testing MCP servers has gotten a lot easier in 2025. Here are your options:

MCP Inspector

The official MCP Inspector is your best friend during development:

npx @modelcontextprotocol/inspector

This gives you a web UI where you can connect to your server, browse tools/resources, and invoke them interactively. Use it constantly.

Automated Testing

For CI/CD, test your tools as regular async functions:

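import { describe, it, expect } from "vitest";
import { z } from "zod";
// Hypothetical module: export the create_project handler and its Zod shape so tests can import them
import { createProjectHandler, createProjectShape } from "../src/tools/create-project.js";

describe("create_project tool", () => {
  it("should create a project with valid input", async () => {
    const result = await createProjectHandler({
      name: "Test Project",
      team_id: "team-123",
    });

    expect(result.isError).toBeUndefined();
    const data = JSON.parse(result.content[0].text);
    expect(data.name).toBe("Test Project");
  });

  it("should return error for missing team_id", () => {
    // Zod validation runs before the handler, so test the schema directly
    const parsed = z.object(createProjectShape).safeParse({ name: "Test Project" });
    expect(parsed.success).toBe(false);
  });
});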
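For a test like this to run without hitting a live API, the handler has to be reachable as a plain function. One possible arrangement, sketched under the assumption that you inject the API client, is shown below; makeCreateProjectHandler, ProjectClient, and the fake client are illustrative names, not part of the SDK.

```typescript
interface CreateProjectInput {
  name: string;
  description?: string;
  team_id: string;
}

interface Project extends CreateProjectInput {
  id: string;
}

// Only the slice of the API client this tool needs
interface ProjectClient {
  createProject(input: CreateProjectInput): Promise<Project>;
}

interface ToolResult {
  isError?: boolean;
  content: { type: "text"; text: string }[];
}

// The factory takes the client as a parameter, so tests can pass a fake
function makeCreateProjectHandler(client: ProjectClient) {
  return async (input: CreateProjectInput): Promise<ToolResult> => {
    try {
      const project = await client.createProject(input);
      return { content: [{ type: "text", text: JSON.stringify(project, null, 2) }] };
    } catch (error) {
      const message = error instanceof Error ? error.message : "Unknown error";
      return {
        isError: true,
        content: [{ type: "text", text: `Failed to create project: ${message}` }],
      };
    }
  };
}

// In-memory fake that echoes the input back with a fixed ID
const fakeClient: ProjectClient = {
  async createProject(input) {
    return { id: "proj-1", ...input };
  },
};

const createProjectHandler = makeCreateProjectHandler(fakeClient);
createProjectHandler({ name: "Test Project", team_id: "team-123" }).then((result) =>
  console.log(result.content[0].text)
);
```

The same factory can be called with the real client when you register the tool, so the tested code and the production code are the same function.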

Integration Testing with Claude Desktop

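Once your server is running, connect Claude Desktop to it. Claude Desktop's config file launches local stdio servers, so a common way to reach an HTTP endpoint during testing is a bridge such as the mcp-remote package (remote servers can also be added through the app's connector settings):

{
  "mcpServers": {
    "my-saas": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://localhost:3001/mcp",
        "--header",
        "Authorization: Bearer your-test-token"
      ]
    }
  }
}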

Then just talk to Claude and try to use your tools naturally. You'll quickly find edge cases the automated tests miss.

Deployment and Production Considerations

Where to Deploy

Your MCP server is just an Express app. Deploy it wherever you deploy Node.js services. Some good options:

| Platform | Cold Start | Cost (est.) | Best For |
|---|---|---|---|
| Railway | None | ~$5-20/mo | Small-medium SaaS |
| Fly.io | <500ms | ~$5-15/mo | Global distribution |
| AWS ECS/Fargate | None | ~$15-50/mo | Enterprise, existing AWS |
| Vercel (Edge) | <100ms | $0-20/mo | If you're already on Vercel |
| Cloudflare Workers | <5ms | $0-5/mo | Performance-critical |

Rate Limiting

AI agents can be chatty. A single user conversation might trigger 20-30 tool calls. Implement rate limiting that's generous enough for normal AI usage but prevents abuse:

import rateLimit from "express-rate-limit";

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 60, // 60 requests per minute per user
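  // Fall back to the IP, then a fixed key, so the generator always returns a string
  keyGenerator: (req) => req.user?.id ?? req.ip ?? "unknown",
});

// Register after the auth middleware so req.user is already populated
app.use("/mcp", limiter);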

Monitoring

Log every tool invocation with the user ID, tool name, and latency. You want visibility into which tools are used most, which fail most, and where the latency bottlenecks are. Datadog, Axiom, or even structured JSON logs to CloudWatch work fine.
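A thin wrapper around each handler is one way to get those logs without touching tool logic. The sketch below is an assumption about how you might structure it: withLogging, ToolLogEntry, and the sink parameter are hypothetical, and the default sink just prints structured JSON.

```typescript
interface ToolResult {
  isError?: boolean;
  content: { type: "text"; text: string }[];
}

type ToolHandler<A> = (args: A) => Promise<ToolResult>;

interface ToolLogEntry {
  tool: string;
  userId: string;
  durationMs: number;
  ok: boolean;
}

// Wrap a handler so every invocation emits one log entry, whether it succeeds or throws
function withLogging<A>(
  tool: string,
  userId: string,
  handler: ToolHandler<A>,
  sink: (entry: ToolLogEntry) => void = (entry) => console.log(JSON.stringify(entry))
): ToolHandler<A> {
  return async (args: A) => {
    const start = Date.now();
    let ok = false;
    try {
      const result = await handler(args);
      // A result flagged with isError counts as a failure even though nothing was thrown
      ok = result.isError !== true;
      return result;
    } finally {
      sink({ tool, userId, durationMs: Date.now() - start, ok });
    }
  };
}

// Usage: wrap a trivial handler and call it once
const loggedEcho = withLogging("echo", "user-1", async (args: { text: string }) => ({
  content: [{ type: "text", text: args.text }],
}));
loggedEcho({ text: "hello" });
```

Swapping the sink for a Datadog or Axiom client keeps the wrapper itself unchanged.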

Versioning

Your MCP server will evolve. Use the version field in your server metadata and consider running multiple versions behind a path prefix (/mcp/v1, /mcp/v2) during transitions.

Real-World Example: Building an MCP Server for a Project Management SaaS

Let me walk through a real example. Say you're building an MCP server for a project management tool (think a simplified Linear or Asana).

Here's the tool set I'd expose:

// Core CRUD tools
server.tool("list_projects", ...);
server.tool("get_project", ...);
server.tool("create_project", ...);
server.tool("update_project", ...);

// Task management
server.tool("list_tasks", ...);  // with filters for status, assignee, project
server.tool("create_task", ...);
server.tool("update_task", ...); // update status, assignee, priority
server.tool("add_comment", ...);

// Search and reporting
server.tool("search", ...);      // full-text search across projects and tasks
server.tool("get_project_stats", ...); // summary stats for a project

// Resources
server.resource("workspace-info", ...); // workspace config, team members

// Prompts
server.prompt("standup-report", ...);   // generate a standup from recent activity
server.prompt("sprint-planning", ...);  // help plan a sprint

That's 10 tools, 1 resource, and 2 prompts. Enough to be genuinely useful without overwhelming the AI's tool selection.

The user experience looks like this: someone opens Claude Desktop and says "What tasks are overdue in the Backend Rewrite project?" Claude calls list_tasks with a status filter and project name, gets the results, and presents them in natural language. The user says "Assign the auth migration task to Sarah and bump it to high priority." Claude calls update_task. It feels magical, and it's really just protocol plumbing.

If you're building something like this and want help with the Next.js frontend or the headless CMS layer that often accompanies these projects, that's something we do a lot at Social Animal. But the MCP server itself? That's something you can absolutely build in-house with this guide.

FAQ

What's the difference between MCP and function calling?

Function calling (like OpenAI's function calling or Claude's tool use) is how an AI model decides to invoke a function within a single API call. MCP is the protocol that lets an AI client discover what functions are available from external servers. They work together -- the AI client uses function calling internally to decide when to invoke an MCP tool. Think of MCP as the plumbing between systems, and function calling as the model's decision-making process.

How much does it cost to build and run an MCP server?

The server itself is lightweight. For a typical SaaS product with 10-20 tools, you're looking at a few hundred lines of TypeScript. Hosting costs $5-50/month depending on your traffic and platform. The real cost is developer time -- budget 2-4 weeks for a production-quality MCP server with auth, error handling, monitoring, and tests. If that feels like a lot, we've helped teams ship these faster. Check our pricing page for details.

Can I use Python instead of TypeScript?

Absolutely. The official Python SDK (pip install mcp) is excellent and arguably has better ergonomics for tool definitions. Use whatever your team knows. The protocol is language-agnostic. If your SaaS backend is Python (Django, FastAPI), building the MCP server in Python makes even more sense since you can share models and validation logic.

Do I need to modify my existing API?

No. Your MCP server is a separate service that calls your existing API. It's an adapter layer. That said, you might find yourself wanting to add a few API endpoints specifically for AI consumption -- like a search endpoint that returns more context than your UI needs. That's fine. But it's additive, not a modification.

How do I handle long-running operations?

Some tools might trigger operations that take minutes (like generating a report or processing a large import). Use MCP's progress notification feature to keep the client informed. Your tool handler can emit progress updates while waiting for the operation to complete. For very long operations (>30 seconds), consider returning immediately with a job ID and providing a separate check_job_status tool.
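The job ID pattern might look roughly like this. It is a sketch under simplifying assumptions: startJob and checkJobStatus are hypothetical helpers, and the in-memory Map stands in for whatever queue or database a production deployment would use.

```typescript
type JobStatus = "running" | "done" | "failed";

interface Job {
  id: string;
  status: JobStatus;
  result?: unknown;
  error?: string;
}

// In-memory store for illustration only; it is lost on restart and not shared across instances
const jobs = new Map<string, Job>();
let nextJobId = 1;

// Kick off the work and return immediately with a job record the AI can poll
function startJob(work: () => Promise<unknown>): Job {
  const job: Job = { id: `job-${nextJobId++}`, status: "running" };
  jobs.set(job.id, job);
  work().then(
    (result) => {
      job.status = "done";
      job.result = result;
    },
    (err) => {
      job.status = "failed";
      job.error = err instanceof Error ? err.message : String(err);
    }
  );
  return job;
}

// Backing logic for a check_job_status tool
function checkJobStatus(jobId: string): Job | undefined {
  return jobs.get(jobId);
}

// Usage: the first tool call returns the job ID, and a later call checks on it
const reportJob = startJob(async () => ({ rows: 342 }));
console.log(`started ${reportJob.id} with status ${reportJob.status}`);
setTimeout(() => console.log(checkJobStatus(reportJob.id)), 0);
```

The tool that starts the job should say in its description that the result must be fetched with check_job_status, so the model knows to follow up.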

Is MCP stable enough for production?

Yes, as of mid-2025. The spec is versioned by revision date rather than semver, and the 2025-03-26 revision (which added Streamable HTTP) is what major clients have adopted. Anthropic, Microsoft, Google, and OpenAI are all investing in the protocol. It's not going away. That said, keep an eye on spec updates -- there are active proposals around better auth flows and server-to-server communication.

What's the best way to handle pagination in MCP tools?

Return a reasonable page of results (20-50 items) by default, and accept cursor or page parameters. Include pagination metadata in your response so the AI knows there are more results. Something like: { results: [...], next_cursor: "abc123", total_count: 342 }. The AI will naturally ask for the next page if the user needs more data.
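Here is a minimal sketch of that response shape. It assumes the cursor is simply a stringified offset into an in-memory array; a real API would usually paginate in the database and hand out opaque cursors. The paginate helper and Page type are hypothetical names.

```typescript
interface Page<T> {
  results: T[];
  next_cursor: string | null;
  total_count: number;
}

// Slice one page out of a list; the cursor is a stringified offset (an assumption for this sketch)
function paginate<T>(items: T[], cursor: string | undefined, pageSize = 20): Page<T> {
  const parsed = cursor ? Number.parseInt(cursor, 10) : 0;
  // Treat a malformed or negative cursor as "start from the beginning"
  const start = Number.isInteger(parsed) && parsed >= 0 ? parsed : 0;
  const results = items.slice(start, start + pageSize);
  const end = start + results.length;
  return {
    results,
    // null tells the AI there is nothing left to fetch
    next_cursor: end < items.length ? String(end) : null,
    total_count: items.length,
  };
}

// Usage: 45 tasks, fetched 20 at a time
const tasks = Array.from({ length: 45 }, (_, i) => ({ id: i + 1 }));
const firstPage = paginate(tasks, undefined);
console.log(firstPage.results.length, firstPage.next_cursor, firstPage.total_count);
```

Returning total_count alongside next_cursor lets the model tell the user how much is left instead of paging blindly.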

Can one MCP server support multiple AI clients simultaneously?

Yes, and it should. Your MCP server is just an HTTP server handling concurrent requests. Each request includes the user's auth token, so you scope everything to the right tenant. There's no client-specific state to worry about if you run the Streamable HTTP transport in stateless mode (no session IDs). Treat it like any other stateless API server.