TL;DR

We run a headless web agency where Claude Code handles 60-70% of implementation work that used to need a full team. Our cost-per-MVP dropped from $35,000-$50,000 to $8,000-$15,000. Time-to-first-deploy went from 6-8 weeks to 10-18 days. But AI didn't replace everything--it replaced specific, well-scoped tasks. Here's what works, what doesn't, and what we still pay humans for.

Why We Rebuilt Our Agency Around Claude Code

We didn't plan this. In late 2024, we were a 4-person headless dev shop billing $150/hour for Next.js and headless CMS work. By March 2025, after integrating Claude Code--initially on Claude 3.5 Sonnet, now on Claude Sonnet 4--into every project, two of those roles had fundamentally changed. Not eliminated. Changed. One senior dev became a full-time AI-directed engineer. The other shifted entirely to code review and architecture.

The catalyst: a Sanity + Next.js 14 project where we used Claude Code to scaffold the entire schema layer, generate GROQ queries, build 14 page templates, and write the deployment pipeline. What would have been 120 billable hours came in at 34. We looked at each other and said: "We need to restructure everything."

That's the honest origin. Not a grand strategy. A project that finished too fast.

What Does a Claude Code Agency Workflow Actually Look Like?

Here's a typical week on an active client build:

Monday: Architecture + Kickoff

  • Me: 2 hours defining component architecture, data model, API contracts
  • Me: 1 hour writing CLAUDE.md project instructions (more on this below)
  • Claude Code: generates initial project scaffold, installs dependencies, configures TypeScript strict mode, sets up linting

Tuesday-Thursday: Build Sprint

  • Me: 1-2 hours per day reviewing Claude Code output, catching errors, redirecting
  • Claude Code: 6-8 tasks per day--page components, API routes, CMS schema definitions, utility functions, test files
  • Me: architecture pivots, complex state management decisions, client Slack threads

Friday: Integration + QA

  • Me: 3-4 hours of manual QA, accessibility audit, performance testing
  • Claude Code: fixing bugs identified in QA, writing missing tests, generating documentation
  • Me: client demo prep, deployment to staging

Total human hours per week on an active build: 18-24. Down from 35-45 in our pre-AI workflow.

What AI Handles in Our Projects

Here's the specific task inventory--things Claude Code does on real client projects every week:

Code Generation (70-80% automated)

  • React/Next.js components: Page layouts, UI components from Figma specs described in prompts, form handlers
  • CMS schemas: Sanity schema types, Contentful content models as migration scripts, Payload CMS collection configs
  • API routes: Next.js Route Handlers, tRPC procedures, webhook endpoints
  • Database operations: Prisma schema changes, migration files, seed scripts
  • TypeScript types: Generating types from API responses, Zod validation schemas, shared type packages

Code Audits (saves 4-6 hours/week)

  • Reviewing existing codebases before refactor projects
  • Identifying unused dependencies, dead code, type inconsistencies
  • Generating audit reports with specific file:line references

Content Drafts (saves 3-5 hours/week)

  • RFP responses and technical proposals
  • Project documentation and README files
  • Client-facing technical explanations
  • SOW first drafts (always human-reviewed and rewritten)

Testing (saves 5-8 hours/week)

  • Vitest unit tests for utility functions
  • Playwright e2e test scaffolds
  • Test data generation and fixtures
  • Edge case identification we might miss

What We Still Hire Humans For

| Task | Why AI Can't Do It (Yet) | Who We Hire | Typical Cost |
| --- | --- | --- | --- |
| Brand strategy | Requires understanding client's market position, competitors, customer psychology at a level AI hallucinates on | Contract brand strategist | $3,000-$8,000/project |
| Copy direction | Tone, voice, and persuasion architecture need human judgment | Freelance copywriter | $2,000-$5,000/project |
| Sales calls | Clients want to talk to a person who understands their business | We do this ourselves | Our time |
| Visual design | Figma work, art direction, design systems | Contract designer | $4,000-$12,000/project |
| Complex DevOps | Kubernetes configs, multi-region deployments, CI/CD for regulated industries | Contract DevOps engineer | $150-$200/hour |
| Legal review | Contracts, MSAs, IP clauses | Attorney | $350-$500/hour |
| Accessibility audits | Automated tools catch 30-40% of issues; real screen reader testing needs a human | A11y specialist | $1,500-$3,000/audit |
| User research | Talking to actual users, synthesizing feedback | UX researcher | $100-$150/hour |

That's 8 categories where humans are non-negotiable.

Real Numbers: Cost-Per-MVP and Time-to-Deploy

Here are actual numbers from our last 6 client projects (Q1-Q2 2025), anonymized:

| Project | Stack | Legacy Estimate | AI-Assisted Actual | Time-to-Deploy |
| --- | --- | --- | --- | --- |
| SaaS marketing site | Next.js 15 + Sanity v3 | $38,000 | $11,500 | 12 days |
| E-commerce storefront | Next.js 15 + Shopify Storefront API | $52,000 | $18,200 | 18 days |
| Portfolio/CMS for creative agency | Astro 5 + Payload CMS 3.0 | $28,000 | $8,400 | 10 days |
| SaaS dashboard MVP | Next.js 15 + Supabase + Prisma | $45,000 | $14,800 | 16 days |
| Nonprofit site redesign | Next.js 14 + Contentful | $32,000 | $9,200 | 11 days |
| Developer docs site | Astro 5 + MDX + Algolia | $22,000 | $7,600 | 8 days |

"Legacy estimate" is what we would have quoted in 2023 with our old team structure. "AI-assisted actual" is what the client paid in 2025.

Average cost reduction: 68%. Average time-to-first-deploy: 12.5 days.

These are all projects in our sweet spot--headless CMS sites and Next.js applications. Enterprise RBAC systems, real-time collaborative apps, or anything involving complex distributed systems would look different.
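The summary figures can be recomputed directly from the six rows in the table. The sketch below is one way to do that, assuming the cost reduction is averaged per project rather than taken over summed totals; the type and function names are illustrative, not part of any tooling we use.

```typescript
// Figures copied from the six-project table above (USD and calendar days).
interface Project {
  name: string;
  legacyEstimate: number; // what we would have quoted in 2023
  aiAssistedActual: number; // what the client paid in 2025
  daysToDeploy: number;
}

const projects: Project[] = [
  { name: "SaaS marketing site", legacyEstimate: 38000, aiAssistedActual: 11500, daysToDeploy: 12 },
  { name: "E-commerce storefront", legacyEstimate: 52000, aiAssistedActual: 18200, daysToDeploy: 18 },
  { name: "Portfolio/CMS for creative agency", legacyEstimate: 28000, aiAssistedActual: 8400, daysToDeploy: 10 },
  { name: "SaaS dashboard MVP", legacyEstimate: 45000, aiAssistedActual: 14800, daysToDeploy: 16 },
  { name: "Nonprofit site redesign", legacyEstimate: 32000, aiAssistedActual: 9200, daysToDeploy: 11 },
  { name: "Developer docs site", legacyEstimate: 22000, aiAssistedActual: 7600, daysToDeploy: 8 },
];

// Fraction of the legacy quote saved on a single project (0..1).
function costReduction(p: Project): number {
  return 1 - p.aiAssistedActual / p.legacyEstimate;
}

// Arithmetic mean of a non-empty list.
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

const meanReductionPct = mean(projects.map(costReduction)) * 100;
const meanDaysToDeploy = mean(projects.map((p) => p.daysToDeploy));

console.log(`Mean cost reduction: ${meanReductionPct.toFixed(1)}%`);
console.log(`Mean time-to-first-deploy: ${meanDaysToDeploy.toFixed(1)} days`);
```

Averaging per project weights a $22,000 docs site the same as a $52,000 storefront. Dividing total actuals by total legacy estimates is the other reasonable choice, and with these six rows the two methods land within a fraction of a percentage point of each other.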

Our Claude Code Project Setup

Every project starts with a CLAUDE.md file in the repo root. This is the single most impactful thing we've done to improve AI output quality. Here's our template structure:

# Project: [Client Name]

## Tech Stack
- Framework: Next.js 15.1 (App Router)
- CMS: Sanity v3.72
- Styling: Tailwind CSS v4.0
- Language: TypeScript 5.7 (strict mode)
- Package manager: pnpm 9.x
- Node: 22 LTS

## Architecture Decisions
- All data fetching in Server Components
- Client components only for interactivity
- GROQ queries co-located with page components
- No barrel exports
- Prefer named exports

## Code Conventions
- Use `cn()` utility for conditional classes (already in lib/utils.ts)
- Error boundaries at route segment level
- All images through next/image with explicit dimensions
- Forms use react-hook-form + zod

## File Structure
[tree output of src/ directory]

## Known Constraints
- Client requires WCAG 2.2 AA
- Must support IE-- just kidding. Chrome 120+, Safari 17+, Firefox 121+
- Deploy target: Vercel (Pro plan, us-east-1)

## Do NOT
- Install new dependencies without asking
- Create files outside src/
- Use default exports (except for Next.js pages/layouts)
- Write CSS outside of Tailwind classes

This file eliminates roughly 40% of the "Claude went off the rails" incidents. Without it, you get generic code that doesn't match your project's patterns. With it, Claude Code generates components that look like your team wrote them.
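If you maintain several client repos, the template can be rendered from a small typed object so every project gets the same sections. This is a hypothetical helper, not something Claude Code requires: the `ProjectBrief` shape and `renderClaudeMd` name are assumptions for illustration, and the File Structure section is left out because it comes from tree output rather than hand-written settings.

```typescript
// Hypothetical per-project settings; field names are illustrative only.
interface ProjectBrief {
  client: string;
  techStack: Array<{ label: string; value: string }>;
  architectureDecisions: string[];
  codeConventions: string[];
  knownConstraints: string[];
  doNot: string[];
}

// Format a list of strings as markdown bullets.
function bulletList(items: string[]): string {
  return items.map((item) => `- ${item}`).join("\n");
}

// Build the CLAUDE.md text, skipping any section that has no entries.
function renderClaudeMd(brief: ProjectBrief): string {
  const sections: Array<{ title: string; items: string[] }> = [
    { title: "Tech Stack", items: brief.techStack.map((t) => `${t.label}: ${t.value}`) },
    { title: "Architecture Decisions", items: brief.architectureDecisions },
    { title: "Code Conventions", items: brief.codeConventions },
    { title: "Known Constraints", items: brief.knownConstraints },
    { title: "Do NOT", items: brief.doNot },
  ];
  const body = sections
    .filter((section) => section.items.length > 0)
    .map((section) => `## ${section.title}\n${bulletList(section.items)}`)
    .join("\n\n");
  return `# Project: ${brief.client}\n\n${body}\n`;
}

// Example values taken from the template above; the client name is a placeholder.
const brief: ProjectBrief = {
  client: "[Client Name]",
  techStack: [
    { label: "Framework", value: "Next.js 15.1 (App Router)" },
    { label: "CMS", value: "Sanity v3.72" },
  ],
  architectureDecisions: ["All data fetching in Server Components", "No barrel exports"],
  codeConventions: ["Forms use react-hook-form + zod"],
  knownConstraints: ["Client requires WCAG 2.2 AA"],
  doNot: ["Install new dependencies without asking", "Create files outside src/"],
};

console.log(renderClaudeMd(brief));
```

The point of generating the file is consistency: the "Do NOT" list in particular tends to drift when it is copied by hand between repos.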

We also use claude --dangerously-skip-permissions during scaffolding phases (never in production branches) and switch to the interactive approval mode once we're past initial setup. Cost per project in API usage: typically $40-$120 for a full build, running on Claude 4 Sonnet.


Is the One-Person Billion-Dollar Company Real?

No. But it's a thought experiment that reveals something real about where we are.

Evartology's piece on Substack--"How to Run a Company Alone in 2026"--lays out an impressive stack: AI for engineering, marketing, sales, operations, even hiring. It's a well-organized playbook, and I agree with about 60% of it. The parts about using AI for content drafts, code generation, and operational docs match our experience. But the piece underestimates the irreducibility of trust. Clients don't buy code. They buy confidence that someone understands their problem. That's a human thing.

Henry's piece (henrythe9th on Substack) about a solo founder who "cloned himself" with AI agents is more grounded. The specific example of using AI to handle customer support triage and first-draft responses resonates--we do something similar with technical proposal drafts. But the framing of "cloning" oversells it. What actually happened is task delegation to AI. You didn't clone your judgment. You offloaded your typing.

Nate's executive briefing on one-person businesses touches on the Carta data showing a growing percentage of solo-founder startups. That's real. Carta's data from early 2025 showed solo incorporations trending upward. But a solo-incorporated company on Carta isn't the same as a solo-operated company. Most of those founders hire contractors, agencies (like us), and fractional roles. They're solo on the cap table, not solo in practice.

Our take: the realistic version of this isn't one person doing a billion dollars. It's one person (or a very small team) doing $1M-$5M in revenue with 70-80% margins, handling the work that used to require 8-12 people. That's not a fantasy. We're watching it happen. But it requires AI competence, domain expertise, and an existing professional network. Not just a ChatGPT subscription.

What Doesn't Work Yet

1. Complex Multi-File Refactors

Claude Code can refactor a single file brilliantly. But when you need coordinated changes across 15+ files--say, changing a data model that touches API routes, components, types, tests, and CMS schemas simultaneously--it loses coherence around file 8-10. We've had it introduce breaking circular dependencies, forget to update imports in files it touched earlier in the session, and silently skip files. Our workaround: break refactors into 3-4 file batches and verify between each.
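The batching workaround can be sketched as a small loop: split the affected files into groups, apply one group, verify, and stop at the first failure. This is a minimal sketch under stated assumptions. `applyBatch` and `verify` are hypothetical callbacks; in practice the first would be a Claude Code session scoped to those files and the second something like a type check plus the test suite. The file paths are made up.

```typescript
// Split a list into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

interface RefactorOutcome {
  completed: string[]; // files whose batch passed verification
  failedBatch: string[] | null; // the batch that broke the build, if any
}

// Apply one batch at a time and stop as soon as verification fails,
// so a breakage is always attributable to the last few files touched.
function refactorInBatches(
  files: string[],
  batchSize: number,
  applyBatch: (batch: string[]) => void,
  verify: () => boolean,
): RefactorOutcome {
  const completed: string[] = [];
  for (const batch of toBatches(files, batchSize)) {
    applyBatch(batch);
    if (!verify()) {
      return { completed, failedBatch: batch };
    }
    completed.push(...batch);
  }
  return { completed, failedBatch: null };
}

// Usage with placeholder callbacks and illustrative paths.
const files = ["types/post.ts", "lib/queries.ts", "app/api/posts/route.ts", "app/blog/page.tsx", "tests/post.test.ts"];
const outcome = refactorInBatches(files, 3, (batch) => console.log("refactor:", batch.join(", ")), () => true);
console.log(outcome.failedBatch === null ? "all batches verified" : "stopped early");
```

Stopping at the first failed check is the design choice that matters: it keeps the search space for a broken import or circular dependency down to one batch instead of the whole session.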

2. Design-to-Code from Figma

Despite the hype, generating production-quality components from Figma designs is still a 60% accuracy task at best. Claude Code (or any LLM) can't see your Figma file directly. You're describing layouts in words or pasting screenshots. The output gets the structure roughly right but misses spacing, responsive breakpoints, and interaction states. We still have a human translate designs to components, then use Claude Code to flesh out variants and states.

3. Performance Optimization

Claude Code will tell you to add React.memo() and call it a day. Real performance work--identifying unnecessary re-renders through React DevTools profiling, optimizing GROQ queries by analyzing Sanity's execution plans, reducing CLS by auditing third-party scripts--requires human observation of runtime behavior. AI can't profile your app.

4. Debugging Production Issues

When something breaks at 2 AM and the error is a cryptic Vercel Edge Runtime timeout, Claude Code can suggest possibilities. But it can't look at your Datadog dashboard, correlate the timing with a deploy, check if the CDN cache was purged, or realize that the issue is actually a DNS propagation delay from a domain transfer that happened 48 hours ago. Production debugging is context-heavy and AI context windows are still too narrow.

5. Anything Requiring Visual Judgment

Is this animation too fast? Does this color combination feel right for a luxury brand? Is the whitespace balanced? Claude Code has zero opinions here. Don't ask.

6. Long-Running Session Coherence

After about 45-60 minutes of continuous work in a single Claude Code session, we notice quality degradation. It starts repeating patterns from earlier in the session even when the context has changed. It forgets constraints from the CLAUDE.md. We restart sessions every 45 minutes as a rule. This is a real productivity tax--probably 20-30 minutes of re-orientation time per day.

How We Scope Client Projects Now

Our scoping process changed fundamentally. Here's the before and after:

Before (2023)

  1. Discovery call (1 hour)
  2. Internal architecture discussion (2 hours)
  3. Detailed SOW with hourly estimates per feature (4-6 hours)
  4. Client review cycle (1-2 weeks)
  5. Signed contract → kickoff

After (2025)

  1. Discovery call (45 minutes)
  2. Claude Code generates SOW first draft from call notes (15 minutes of prompting)
  3. I review and rewrite the SOW (1 hour)
  4. We build a throwaway proof-of-concept of the hardest technical challenge using Claude Code (2-3 hours)
  5. Scope is now based on actual implementation data, not guesses
  6. Client review (3-5 days)
  7. Signed contract → kickoff

Step 4 is the key difference. We used to estimate "Shopify Storefront API integration: 40 hours" based on experience. Now we actually build a rough version in 2-3 hours and know it's 22 hours with AI assistance. Our estimates now land within 15% of actuals. They used to be off by 30-40%.
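For anyone tracking the same metric, here is a minimal sketch of the accuracy check, assuming the error is measured relative to actual hours (the text does not say which denominator is used). The 22 and 40 hour estimates come from the example above; the 25 actual hours are hypothetical, for illustration only.

```typescript
// Signed estimation error as a fraction of actual hours.
// Positive means we over-estimated, negative means we under-estimated.
function estimateError(estimatedHours: number, actualHours: number): number {
  if (actualHours <= 0) {
    throw new RangeError("actualHours must be positive");
  }
  return (estimatedHours - actualHours) / actualHours;
}

// True when the estimate is within `tolerance` (e.g. 0.15 for 15%) of actuals.
function withinTolerance(estimatedHours: number, actualHours: number, tolerance: number): boolean {
  return Math.abs(estimateError(estimatedHours, actualHours)) <= tolerance;
}

// Hypothetical actual of 25 hours for the Shopify integration.
const hypotheticalActualHours = 25;
console.log(withinTolerance(22, hypotheticalActualHours, 0.15)); // proof-of-concept based estimate
console.log(withinTolerance(40, hypotheticalActualHours, 0.15)); // experience-based estimate
```

Keeping the sign in `estimateError` is deliberate: consistently negative errors mean you are under-quoting, which matters more to margins than the absolute size of the miss.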

This costs us 3-4 hours of unbilled pre-sales work per project. But our close rate went from ~35% to ~55% because clients see a working prototype before signing.

The Founder Math: Hours Per Week Breakdown

Here's how my week actually breaks down as an agency founder using Claude Code:

| Activity | Hours/Week | AI-Assisted? |
| --- | --- | --- |
| Client calls and Slack | 6 | No |
| Architecture and technical decisions | 5 | Partially (Claude Code for research) |
| Code review of AI output | 8 | No |
| Directing Claude Code sessions | 6 | N/A (this IS the AI work) |
| Business ops (invoicing, contracts, planning) | 3 | Partially (drafts) |
| Sales and proposals | 3 | Partially (first drafts) |
| Manual QA and testing | 3 | No |
| Learning and staying current | 2 | No |
| Total | 36 | |

36 hours a week. Not 80. Not 20. And that's running an agency doing $60K-$80K/month in revenue with 2 active client projects at any time.

Pre-AI, this same output required 3.5 FTEs and my 50-hour weeks. The math is real. But notice: 19 of those 36 hours (the rows marked "No") are still entirely human work. AI didn't eliminate work. It changed the ratio of thinking-to-typing.
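The weekly breakdown is easy to tally by category. The sketch below copies the rows from the table and groups hours by the "AI-Assisted?" column; the type and function names are illustrative.

```typescript
type AiAssist = "no" | "partially" | "n/a";

interface Activity {
  name: string;
  hoursPerWeek: number;
  aiAssisted: AiAssist;
}

// Rows copied from the hours-per-week table above.
const week: Activity[] = [
  { name: "Client calls and Slack", hoursPerWeek: 6, aiAssisted: "no" },
  { name: "Architecture and technical decisions", hoursPerWeek: 5, aiAssisted: "partially" },
  { name: "Code review of AI output", hoursPerWeek: 8, aiAssisted: "no" },
  { name: "Directing Claude Code sessions", hoursPerWeek: 6, aiAssisted: "n/a" },
  { name: "Business ops (invoicing, contracts, planning)", hoursPerWeek: 3, aiAssisted: "partially" },
  { name: "Sales and proposals", hoursPerWeek: 3, aiAssisted: "partially" },
  { name: "Manual QA and testing", hoursPerWeek: 3, aiAssisted: "no" },
  { name: "Learning and staying current", hoursPerWeek: 2, aiAssisted: "no" },
];

// Sum the hours of every activity with the given AI-assistance status.
function hoursBy(activities: Activity[], status: AiAssist): number {
  return activities
    .filter((a) => a.aiAssisted === status)
    .reduce((sum, a) => sum + a.hoursPerWeek, 0);
}

const totalHours = week.reduce((sum, a) => sum + a.hoursPerWeek, 0);
const humanOnlyHours = hoursBy(week, "no");
const partiallyAssistedHours = hoursBy(week, "partially");
const directingHours = hoursBy(week, "n/a");

console.log({ totalHours, humanOnlyHours, partiallyAssistedHours, directingHours });
```

The "N/A" row is kept as its own bucket on purpose. Directing Claude Code is human time spent on AI work, so whether you count it as human-only changes the headline ratio.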

FAQ

How much does Claude Code cost per month for agency work?

We spend approximately $180-$300/month on Claude API usage for Claude Code across all projects. This is on the Claude Sonnet 4 model. Individual project costs range from $40 to $120 depending on scope and session count.

Can Claude Code replace a junior developer?

It replaces the output of a junior developer but not the role. Someone still needs to direct, review, and correct the AI's work. That someone needs senior-level judgment. AI-generated code without expert review ships bugs faster.

What's the best CMS to pair with a Claude Code workflow?

Sanity v3, because its schema definitions are TypeScript files that Claude Code generates exceptionally well. Payload CMS 3.0 is a close second. Contentful works but its management API is more complex for AI to work with reliably.
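To make the "schemas are just TypeScript" point concrete, here is a sketch of a Sanity-style document schema written as a plain object. The `caseStudy` document and the local `FieldDef`/`DocumentDef` types are hypothetical, defined here only so the example compiles without the `sanity` package. In a real Sanity v3 project the same objects are typically wrapped in `defineType` and `defineField` from `sanity`.

```typescript
// Minimal local types standing in for Sanity's own schema typings (assumption).
interface FieldDef {
  name: string;
  title: string;
  type: string;
  of?: Array<{ type: string }>;
  options?: { source?: string };
}

interface DocumentDef {
  name: string;
  title: string;
  type: "document";
  fields: FieldDef[];
}

// Hypothetical content type, in the shape Sanity expects for a document schema.
const caseStudy: DocumentDef = {
  name: "caseStudy",
  title: "Case Study",
  type: "document",
  fields: [
    { name: "title", title: "Title", type: "string" },
    { name: "slug", title: "Slug", type: "slug", options: { source: "title" } },
    { name: "summary", title: "Summary", type: "text" },
    { name: "body", title: "Body", type: "array", of: [{ type: "block" }] },
  ],
};

// A cheap structural check a reviewer might run on generated schemas:
// returns every field name that appears more than once.
function duplicateFieldNames(doc: DocumentDef): string[] {
  const seen: Record<string, boolean> = {};
  const duplicates: string[] = [];
  for (const field of doc.fields) {
    if (seen[field.name] && duplicates.indexOf(field.name) === -1) {
      duplicates.push(field.name);
    }
    seen[field.name] = true;
  }
  return duplicates;
}

console.log(duplicateFieldNames(caseStudy)); // logs an empty array for this schema
```

Because the schema is ordinary typed data, mistakes in generated output tend to surface as compiler errors or simple lint-style checks like the one above, before anything reaches the Studio.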

Does Claude Code work for mobile app development?

We've used it for React Native (Expo SDK 52) projects with decent results for component generation and navigation setup. It struggles more with native module configuration and platform-specific debugging. Roughly 40-50% productivity gain vs. 60-70% for web projects.

How do you handle client IP concerns with AI-generated code?

Our MSA includes a clause stating all deliverables are original work product regardless of tooling used. Anthropic's terms (as of June 2025) grant users rights to outputs. We don't send client proprietary data to the API--only code patterns and generic implementations.

What happens when Claude Code generates incorrect code?

It happens on roughly 15-20% of tasks. Our workflow accounts for this with mandatory human code review on every PR. Common failure modes: incorrect TypeScript generics, stale API patterns from training data, and missing error handling for edge cases. We budget review time into every estimate.