Claude Code Agency Workflow: How We Run Projects in 2025
TL;DR
We run a headless web agency where Claude Code handles 60-70% of the implementation work that used to need a full team. Across our last six client projects, cost-per-MVP dropped from $22,000-$52,000 (what we would have quoted in 2023) to $7,600-$18,200, and time-to-first-deploy went from 6-8 weeks to 8-18 days. But AI didn't replace everything. It replaced specific, well-scoped tasks. Here's what works, what doesn't, and what we still pay humans for.
Table of Contents
- Why We Rebuilt Our Agency Around Claude Code
- What Does a Claude Code Agency Workflow Actually Look Like?
- What AI Handles in Our Projects
- What We Still Hire Humans For
- Real Numbers: Cost-Per-MVP and Time-to-Deploy
- Our Claude Code Project Setup
- Is the One-Person Billion-Dollar Company Real?
- What Doesn't Work Yet
- How We Scope Client Projects Now
- The Founder Math: Hours Per Week Breakdown
- FAQ
Why We Rebuilt Our Agency Around Claude Code
We didn't plan this. In late 2024, we were a 4-person headless dev shop billing $150/hour for Next.js and headless CMS work. By March 2025, after integrating Claude Code into every project (initially with Claude 3.5 Sonnet, now with Claude Sonnet 4), two of those roles had fundamentally changed. Not eliminated. Changed. One senior dev became a full-time AI-directed engineer. The other shifted entirely to code review and architecture.
The catalyst: a Sanity + Next.js 14 project where we used Claude Code to scaffold the entire schema layer, generate GROQ queries, build 14 page templates, and write the deployment pipeline. What would have been 120 billable hours came in at 34. We looked at each other and said: "We need to restructure everything."
That's the honest origin. Not a grand strategy. A project that finished too fast.
What Does a Claude Code Agency Workflow Actually Look Like?
Here's a typical week on an active client build:
Monday: Architecture + Kickoff
- Me: 2 hours defining component architecture, data model, API contracts
- Me: 1 hour writing CLAUDE.md project instructions (more on this below)
- Claude Code: generates initial project scaffold, installs dependencies, configures TypeScript strict mode, sets up linting
Tuesday-Thursday: Build Sprint
- Me: 1-2 hours per day reviewing Claude Code output, catching errors, redirecting
- Claude Code: 6-8 tasks per day--page components, API routes, CMS schema definitions, utility functions, test files
- Me: architecture pivots, complex state management decisions, client Slack threads
Friday: Integration + QA
- Me: 3-4 hours of manual QA, accessibility audit, performance testing
- Claude Code: fixing bugs identified in QA, writing missing tests, generating documentation
- Me: client demo prep, deployment to staging
Total human hours per week on an active build: 18-24. Down from 35-45 in our pre-AI workflow.
What AI Handles in Our Projects
Here's the specific task inventory--things Claude Code does on real client projects every week:
Code Generation (70-80% automated)
- React/Next.js components: Page layouts, UI components from Figma specs described in prompts, form handlers
- CMS schemas: Sanity schema types, Contentful content models as migration scripts, Payload CMS collection configs
- API routes: Next.js Route Handlers, tRPC procedures, webhook endpoints
- Database operations: Prisma schema changes, migration files, seed scripts
- TypeScript types: Generating types from API responses, Zod validation schemas, shared type packages
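To make the "types plus validation" item concrete, here is a minimal, dependency-free sketch of that pairing: a TypeScript type for an API response and a runtime guard that narrows `unknown` JSON to it. The list above pairs types with Zod schemas; the hand-written guard stands in for Zod only so the example runs without packages, and the `Product` shape is hypothetical.

```typescript
// Hypothetical shape of a product returned by a commerce API.
interface Product {
  id: string;
  title: string;
  priceCents: number;
  tags: string[];
}

// Is this a plain, non-null, non-array object we can read fields from?
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Runtime guard that narrows untrusted JSON to Product.
// A Zod schema would express the same checks declaratively.
function isProduct(value: unknown): value is Product {
  if (!isRecord(value)) return false;
  const { id, title, priceCents, tags } = value;
  return (
    typeof id === "string" &&
    typeof title === "string" &&
    typeof priceCents === "number" &&
    Number.isInteger(priceCents) &&
    priceCents >= 0 &&
    Array.isArray(tags) &&
    tags.every((tag) => typeof tag === "string")
  );
}

// Parse a JSON payload and fail loudly if it doesn't match the expected shape.
function parseProduct(json: string): Product {
  const data: unknown = JSON.parse(json);
  if (!isProduct(data)) {
    throw new Error("Response does not match the Product shape");
  }
  return data;
}

const product = parseProduct('{"id":"p1","title":"Mug","priceCents":1200,"tags":["kitchen"]}');
console.log(product.title);
```

The point of the pairing is that the compile-time type and the runtime check describe the same shape, so a malformed response fails at the boundary instead of deep inside a component.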
Code Audits (saves 4-6 hours/week)
- Reviewing existing codebases before refactor projects
- Identifying unused dependencies, dead code, type inconsistencies
- Generating audit reports with specific file:line references
Content Drafts (saves 3-5 hours/week)
- RFP responses and technical proposals
- Project documentation and README files
- Client-facing technical explanations
- SOW first drafts (always human-reviewed and rewritten)
Testing (saves 5-8 hours/week)
- Vitest unit tests for utility functions
- Playwright e2e test scaffolds
- Test data generation and fixtures
- Identifying edge cases we might miss
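Here is a sketch of the table-driven edge-case testing described above, for a hypothetical `slugify` helper. It uses plain assertions so it runs without Vitest; in a Vitest suite the same table would typically feed `it.each`. The helper and its cases are illustrative, not taken from a client project.

```typescript
// Hypothetical utility under test: turn a CMS title into a URL slug.
function slugify(input: string): string {
  return input
    .normalize("NFKD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-") // collapse any other run of characters into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading/trailing hyphens
}

// Edge cases worth asking for explicitly: empty input, punctuation only,
// accents, repeated separators, leading/trailing junk.
const cases: Array<[input: string, expected: string]> = [
  ["Hello World", "hello-world"],
  ["", ""],
  ["!!!", ""],
  ["Café Déjà Vu", "cafe-deja-vu"],
  ["  multiple   spaces  ", "multiple-spaces"],
  ["--already-slugged--", "already-slugged"],
  ["Next.js 15 + Sanity", "next-js-15-sanity"],
];

for (const [input, expected] of cases) {
  const actual = slugify(input);
  if (actual !== expected) {
    throw new Error(
      `slugify(${JSON.stringify(input)}) = ${JSON.stringify(actual)}, expected ${JSON.stringify(expected)}`,
    );
  }
}
console.log(`${cases.length} slugify cases passed`);
```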
What We Still Hire Humans For
| Task | Why AI Can't Do It (Yet) | Who We Hire | Typical Cost |
|---|---|---|---|
| Brand strategy | Requires understanding client's market position, competitors, customer psychology at a level AI hallucinates on | Contract brand strategist | $3,000-$8,000/project |
| Copy direction | Tone, voice, and persuasion architecture need human judgment | Freelance copywriter | $2,000-$5,000/project |
| Sales calls | Clients want to talk to a person who understands their business | We do this ourselves | Our time |
| Visual design | Figma work, art direction, design systems | Contract designer | $4,000-$12,000/project |
| Complex DevOps | Kubernetes configs, multi-region deployments, CI/CD for regulated industries | Contract DevOps engineer | $150-$200/hour |
| Legal review | Contracts, MSAs, IP clauses | Attorney | $350-$500/hour |
| Accessibility audits | Automated tools catch 30-40% of issues; real screen reader testing needs a human | A11y specialist | $1,500-$3,000/audit |
| User research | Talking to actual users, synthesizing feedback | UX researcher | $100-$150/hour |
That's 8 categories where humans are non-negotiable.
Real Numbers: Cost-Per-MVP and Time-to-Deploy
Here are actual numbers from our last 6 client projects (Q1-Q2 2025), anonymized:
| Project | Stack | Legacy Estimate | AI-Assisted Actual | Time-to-Deploy |
|---|---|---|---|---|
| SaaS marketing site | Next.js 15 + Sanity v3 | $38,000 | $11,500 | 12 days |
| E-commerce storefront | Next.js 15 + Shopify Storefront API | $52,000 | $18,200 | 18 days |
| Portfolio/CMS for creative agency | Astro 5 + Payload CMS 3.0 | $28,000 | $8,400 | 10 days |
| SaaS dashboard MVP | Next.js 15 + Supabase + Prisma | $45,000 | $14,800 | 16 days |
| Nonprofit site redesign | Next.js 14 + Contentful | $32,000 | $9,200 | 11 days |
| Developer docs site | Astro 5 + MDX + Algolia | $22,000 | $7,600 | 8 days |
"Legacy estimate" is what we would have quoted in 2023 with our old team structure. "AI-assisted actual" is what the client paid in 2025.
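Cost reduction: about 68%, whether measured in aggregate ($69,700 paid vs. $217,000 quoted) or as the average of per-project reductions. Average time-to-first-deploy: 12.5 days.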
These are all projects in our sweet spot--headless CMS sites and Next.js applications. Enterprise RBAC systems, real-time collaborative apps, or anything involving complex distributed systems would look different.
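The summary figures can be recomputed directly from the project table. This sketch hard-codes the six rows and reports two versions of the cost reduction, the aggregate (total paid vs. total quoted) and the mean of per-project reductions, plus the mean days to first deploy.

```typescript
// Rows from the project table: legacy estimate vs. AI-assisted actual (USD), days to first deploy.
interface ProjectRow {
  project: string;
  legacyEstimate: number;
  aiAssistedActual: number;
  daysToDeploy: number;
}

const projects: ProjectRow[] = [
  { project: "SaaS marketing site", legacyEstimate: 38_000, aiAssistedActual: 11_500, daysToDeploy: 12 },
  { project: "E-commerce storefront", legacyEstimate: 52_000, aiAssistedActual: 18_200, daysToDeploy: 18 },
  { project: "Portfolio/CMS for creative agency", legacyEstimate: 28_000, aiAssistedActual: 8_400, daysToDeploy: 10 },
  { project: "SaaS dashboard MVP", legacyEstimate: 45_000, aiAssistedActual: 14_800, daysToDeploy: 16 },
  { project: "Nonprofit site redesign", legacyEstimate: 32_000, aiAssistedActual: 9_200, daysToDeploy: 11 },
  { project: "Developer docs site", legacyEstimate: 22_000, aiAssistedActual: 7_600, daysToDeploy: 8 },
];

const sum = (xs: number[]): number => xs.reduce((acc, x) => acc + x, 0);

// Aggregate reduction: compares total spend, so larger projects weigh more.
function aggregateReduction(rows: ProjectRow[]): number {
  const legacy = sum(rows.map((r) => r.legacyEstimate));
  const actual = sum(rows.map((r) => r.aiAssistedActual));
  return 1 - actual / legacy;
}

// Mean of per-project reductions: every project counts equally.
function meanReduction(rows: ProjectRow[]): number {
  return sum(rows.map((r) => 1 - r.aiAssistedActual / r.legacyEstimate)) / rows.length;
}

function meanDaysToDeploy(rows: ProjectRow[]): number {
  return sum(rows.map((r) => r.daysToDeploy)) / rows.length;
}

console.log(`Aggregate reduction: ${(aggregateReduction(projects) * 100).toFixed(1)}%`);
console.log(`Mean per-project reduction: ${(meanReduction(projects) * 100).toFixed(1)}%`);
console.log(`Mean days to first deploy: ${meanDaysToDeploy(projects)}`);
```

The two cost figures differ slightly (about 67.9% vs. 68.1%) because the aggregate weights larger projects more heavily; both land at roughly 68%.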
Our Claude Code Project Setup
Every project starts with a CLAUDE.md file in the repo root. This is the single most impactful thing we've done to improve AI output quality. Here's our template structure:
```markdown
# Project: [Client Name]
## Tech Stack
- Framework: Next.js 15.1 (App Router)
- CMS: Sanity v3.72
- Styling: Tailwind CSS v4.0
- Language: TypeScript 5.7 (strict mode)
- Package manager: pnpm 9.x
- Node: 22 LTS
## Architecture Decisions
- All data fetching in Server Components
- Client components only for interactivity
- GROQ queries co-located with page components
- No barrel exports
- Prefer named exports
## Code Conventions
- Use `cn()` utility for conditional classes (already in lib/utils.ts)
- Error boundaries at route segment level
- All images through next/image with explicit dimensions
- Forms use react-hook-form + zod
## File Structure
[tree output of src/ directory]
## Known Constraints
- Client requires WCAG 2.2 AA
- Must support IE-- just kidding. Chrome 120+, Safari 17+, Firefox 121+
- Deploy target: Vercel (Pro plan, us-east-1)
## Do NOT
- Install new dependencies without asking
- Create files outside src/
- Use default exports (except for Next.js pages/layouts)
- Write CSS outside of Tailwind classes
```
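The template's first code convention assumes a `cn()` helper already exists in `lib/utils.ts`. In many Next.js + Tailwind projects (shadcn/ui's default setup among them) it is `twMerge(clsx(inputs))`, built on the `clsx` and `tailwind-merge` packages, which also resolves conflicting Tailwind classes. The sketch below is a dependency-free stand-in that only handles conditional joining; treat it as an illustration of the contract, not a drop-in replacement.

```typescript
// Inputs cn() accepts: strings, falsy values to skip, nested arrays,
// and objects mapping class names to booleans (clsx-style).
type ClassDictionary = Record<string, boolean | null | undefined>;
type ClassValue = string | null | undefined | false | ClassDictionary | ClassValue[];

// Minimal clsx-like joiner. Unlike twMerge(clsx(...)), it does NOT
// dedupe or resolve conflicting Tailwind utilities such as "p-2 p-4".
export function cn(...inputs: ClassValue[]): string {
  const classes: string[] = [];
  const visit = (value: ClassValue): void => {
    if (!value) return; // skip "", null, undefined, false
    if (typeof value === "string") {
      classes.push(value);
    } else if (Array.isArray(value)) {
      value.forEach(visit);
    } else {
      for (const [name, enabled] of Object.entries(value)) {
        if (enabled) classes.push(name);
      }
    }
  };
  inputs.forEach(visit);
  return classes.join(" ");
}

const isActive = true;
const disabled = false;
console.log(cn("rounded px-4", isActive && "bg-blue-600", { "opacity-50": disabled }));
// prints "rounded px-4 bg-blue-600"
```

Keeping the helper's contract in CLAUDE.md matters more than its implementation: it tells Claude Code to reach for `cn()` instead of hand-rolled template strings.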
This file eliminates roughly 40% of the "Claude went off the rails" incidents. Without it, you get generic code that doesn't match your project's patterns. With it, Claude Code generates components that look like your team wrote them.
We also run `claude --dangerously-skip-permissions` during scaffolding phases (never on production branches) and switch back to interactive approval mode once we're past initial setup. API usage typically costs $40-$120 per full build on Claude Sonnet 4.
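For a rough check on per-project API spend, the arithmetic is tokens times per-token price. The sketch below assumes Claude Sonnet 4 list prices of $3 per million input tokens and $15 per million output tokens at the time of writing, and ignores prompt-caching discounts and surcharges. The token counts in the usage line are hypothetical; substitute your own usage figures.

```typescript
// Per-million-token prices in USD. Assumed Claude Sonnet 4 list prices at
// time of writing; check Anthropic's current pricing before relying on them.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

const SONNET_4_ASSUMED: Pricing = { inputPerMillion: 3, outputPerMillion: 15 };

// Estimated USD cost for a token volume. Ignores prompt caching, which
// changes the effective input price in real Claude Code sessions.
function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  pricing: Pricing = SONNET_4_ASSUMED,
): number {
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMillion +
    (outputTokens / 1_000_000) * pricing.outputPerMillion
  );
}

// Hypothetical full build: 20M input tokens, 1.5M output tokens.
console.log(estimateCostUsd(20_000_000, 1_500_000).toFixed(2)); // prints "82.50"
```

Input tokens dominate volume in agentic sessions because the model rereads files and context on every turn, so long sessions cost more than their output alone suggests.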
Is the One-Person Billion-Dollar Company Real?
No. But it's a thought experiment that reveals something real about where we are.
Evartology's piece on Substack--"How to Run a Company Alone in 2026"--lays out an impressive stack: AI for engineering, marketing, sales, operations, even hiring. It's a well-organized playbook, and I agree with about 60% of it. The parts about using AI for content drafts, code generation, and operational docs match our experience. But the piece underestimates the irreducibility of trust. Clients don't buy code. They buy confidence that someone understands their problem. That's a human thing.
Henry's piece (henrythe9th on Substack) about a solo founder who "cloned himself" with AI agents is more grounded. The specific example of using AI to handle customer support triage and first-draft responses resonates--we do something similar with technical proposal drafts. But the framing of "cloning" oversells it. What actually happened is task delegation to AI. You didn't clone your judgment. You offloaded your typing.
Nate's executive briefing on one-person businesses touches on Carta data showing a growing share of solo-founder startups. That trend is real: Carta's early-2025 data showed solo incorporations rising. But a solo-incorporated company on Carta isn't the same as a solo-operated company. Most of those founders hire contractors, agencies (like us), and fractional roles. They're solo on the cap table, not solo in practice.
Our take: the realistic version of this isn't one person doing a billion dollars. It's one person (or a very small team) doing $1M-$5M in revenue with 70-80% margins, handling the work that used to require 8-12 people. That's not a fantasy. We're watching it happen. But it requires AI competence, domain expertise, and an existing professional network. Not just a ChatGPT subscription.
What Doesn't Work Yet
1. Complex Multi-File Refactors
Claude Code can refactor a single file brilliantly. But when you need coordinated changes across 15+ files--say, changing a data model that touches API routes, components, types, tests, and CMS schemas simultaneously--it loses coherence around file 8-10. We've had it introduce breaking circular dependencies, forget to update imports in files it touched earlier in the session, and silently skip files. Our workaround: break refactors into 3-4 file batches and verify between each.
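The batching workaround amounts to a small driver loop: split the affected files into groups of three or four, apply and verify each group, and stop at the first failure. `applyAndVerify` is a hypothetical callback standing in for "have Claude Code update these files, then run the typecheck and tests"; nothing here calls Claude Code itself, and the file names are illustrative.

```typescript
// Split a list into consecutive batches of at most `size` items.
function toBatches<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Result of applying one batch and running checks such as typecheck, lint, and tests.
interface BatchResult {
  ok: boolean;
  details?: string | undefined;
}

interface RefactorReport {
  completed: string[][];
  failedAt?: number | undefined; // index of the first failing batch, if any
  details?: string | undefined;
}

// Apply batches in order and stop at the first failed verification, so a
// broken import or type error surfaces while only a few files have changed.
function runBatchedRefactor(
  files: readonly string[],
  batchSize: number,
  applyAndVerify: (batch: string[]) => BatchResult,
): RefactorReport {
  const completed: string[][] = [];
  let index = 0;
  for (const batch of toBatches(files, batchSize)) {
    const result = applyAndVerify(batch);
    if (!result.ok) {
      return { completed, failedAt: index, details: result.details };
    }
    completed.push(batch);
    index += 1;
  }
  return { completed };
}

// Usage with hypothetical file names and a stub verifier that always passes.
const files = [
  "src/types/post.ts",
  "src/sanity/schemas/post.ts",
  "src/lib/queries.ts",
  "src/app/api/posts/route.ts",
  "src/components/PostCard.tsx",
  "src/app/blog/[slug]/page.tsx",
  "src/tests/post.test.ts",
];
const report = runBatchedRefactor(files, 3, (batch) => {
  console.log(`Updating: ${batch.join(", ")}`);
  return { ok: true };
});
console.log(`${report.completed.length} batches verified`);
```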
2. Design-to-Code from Figma
Despite the hype, generating production-quality components from Figma designs is still a 60% accuracy task at best. Claude Code (or any LLM) can't see your Figma file directly. You're describing layouts in words or pasting screenshots. The output gets the structure roughly right but misses spacing, responsive breakpoints, and interaction states. We still have a human translate designs to components, then use Claude Code to flesh out variants and states.
3. Performance Optimization
Claude Code will tell you to add React.memo() and call it a day. Real performance work--identifying unnecessary re-renders through React DevTools profiling, optimizing GROQ queries by analyzing Sanity's execution plans, reducing CLS by auditing third-party scripts--requires human observation of runtime behavior. AI can't profile your app.
4. Debugging Production Issues
When something breaks at 2 AM and the error is a cryptic Vercel Edge Runtime timeout, Claude Code can suggest possibilities. But it can't look at your Datadog dashboard, correlate the timing with a deploy, check if the CDN cache was purged, or realize that the issue is actually a DNS propagation delay from a domain transfer that happened 48 hours ago. Production debugging is context-heavy and AI context windows are still too narrow.
5. Anything Requiring Visual Judgment
Is this animation too fast? Does this color combination feel right for a luxury brand? Is the whitespace balanced? Claude Code has zero opinions here. Don't ask.
6. Long-Running Session Coherence
After about 45-60 minutes of continuous work in a single Claude Code session, we notice quality degradation. It starts repeating patterns from earlier in the session even when the context has changed. It forgets constraints from the CLAUDE.md. We restart sessions every 45 minutes as a rule. This is a real productivity tax--probably 20-30 minutes of re-orientation time per day.
How We Scope Client Projects Now
Our scoping process changed fundamentally. Here's the before and after:
Before (2023)
- Discovery call (1 hour)
- Internal architecture discussion (2 hours)
- Detailed SOW with hourly estimates per feature (4-6 hours)
- Client review cycle (1-2 weeks)
- Signed contract → kickoff
After (2025)
- Discovery call (45 minutes)
- Claude Code generates SOW first draft from call notes (15 minutes of prompting)
- I review and rewrite the SOW (1 hour)
- We build a throwaway proof-of-concept of the hardest technical challenge using Claude Code (2-3 hours)
- Scope is now based on actual implementation data, not guesses
- Client review (3-5 days)
- Signed contract → kickoff
The proof-of-concept step is the key difference. We used to estimate "Shopify Storefront API integration: 40 hours" from experience. Now we build a rough version in 2-3 hours and can see it's closer to 22 hours with AI assistance. Our estimates now land within 15% of actuals. They used to be off by 30-40%.
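"Within 15% of actuals" is a relative-error check. A minimal sketch, measuring error against actual hours (measuring against the estimate is an equally valid convention, just a different one). The 22-hour estimate comes from the example above; the actual-hour figures in the usage lines are hypothetical.

```typescript
// Relative error of an estimate, measured against the actual hours.
function estimateError(estimatedHours: number, actualHours: number): number {
  if (actualHours <= 0) {
    throw new RangeError("actualHours must be positive");
  }
  return Math.abs(estimatedHours - actualHours) / actualHours;
}

// True when the estimate lands within the given tolerance (default 15%).
function withinTolerance(estimatedHours: number, actualHours: number, tolerance = 0.15): boolean {
  return estimateError(estimatedHours, actualHours) <= tolerance;
}

// Share of line items whose estimates landed within tolerance.
function hitRate(
  items: ReadonlyArray<{ estimated: number; actual: number }>,
  tolerance = 0.15,
): number {
  if (items.length === 0) return 0;
  const hits = items.filter((item) => withinTolerance(item.estimated, item.actual, tolerance)).length;
  return hits / items.length;
}

// Hypothetical actuals: a 22-hour estimate that took 25 hours is 12% off (a hit);
// a 30-hour guess for work that took 22 hours is about 36% off (a miss).
console.log(withinTolerance(22, 25)); // true
console.log(withinTolerance(30, 22)); // false
console.log(hitRate([{ estimated: 22, actual: 25 }, { estimated: 30, actual: 22 }])); // 0.5
```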
This costs us 3-4 hours of unbilled pre-sales work per project. But our close rate went from ~35% to ~55% because clients see a working prototype before signing.
The Founder Math: Hours Per Week Breakdown
Here's how my week actually breaks down as an agency founder using Claude Code:
| Activity | Hours/Week | AI-Assisted? |
|---|---|---|
| Client calls and Slack | 6 | No |
| Architecture and technical decisions | 5 | Partially (Claude Code for research) |
| Code review of AI output | 8 | No |
| Directing Claude Code sessions | 6 | N/A (this IS the AI work) |
| Business ops (invoicing, contracts, planning) | 3 | Partially (drafts) |
| Sales and proposals | 3 | Partially (first drafts) |
| Manual QA and testing | 3 | No |
| Learning and staying current | 2 | No |
| Total | 36 | |
36 hours a week. Not 80. Not 20. And that's running an agency doing $60K-$80K/month in revenue with 2 active client projects at any time.
Pre-AI, this same output required 3.5 FTEs and my 50-hour weeks. The math is real. But notice: 19 of those 36 hours (every row marked "No") are still entirely human work. AI didn't eliminate work. It changed the ratio of thinking-to-typing.
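Tallying the table by its "AI-Assisted?" column is a quick way to check the split between human-only hours, partially assisted hours, and time spent directing Claude Code. The rows below are copied from the table.

```typescript
type AiAssisted = "No" | "Partially" | "N/A";

interface Activity {
  name: string;
  hours: number;
  aiAssisted: AiAssisted;
}

// Rows copied from the weekly breakdown table.
const week: Activity[] = [
  { name: "Client calls and Slack", hours: 6, aiAssisted: "No" },
  { name: "Architecture and technical decisions", hours: 5, aiAssisted: "Partially" },
  { name: "Code review of AI output", hours: 8, aiAssisted: "No" },
  { name: "Directing Claude Code sessions", hours: 6, aiAssisted: "N/A" },
  { name: "Business ops (invoicing, contracts, planning)", hours: 3, aiAssisted: "Partially" },
  { name: "Sales and proposals", hours: 3, aiAssisted: "Partially" },
  { name: "Manual QA and testing", hours: 3, aiAssisted: "No" },
  { name: "Learning and staying current", hours: 2, aiAssisted: "No" },
];

// Sum hours, optionally restricted to one value of the "AI-Assisted?" column.
function totalHours(rows: readonly Activity[], filter?: AiAssisted): number {
  return rows
    .filter((row) => filter === undefined || row.aiAssisted === filter)
    .reduce((acc, row) => acc + row.hours, 0);
}

console.log({
  total: totalHours(week),
  entirelyHuman: totalHours(week, "No"),
  partiallyAssisted: totalHours(week, "Partially"),
  directingClaudeCode: totalHours(week, "N/A"),
});
```

Run against the table, this reports 36 hours total: 19 entirely human, 11 partially assisted, and 6 spent directing Claude Code.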
FAQ
How much does Claude Code cost per month for agency work?
We spend approximately $180-$300/month on Claude API usage for Claude Code across all projects, running Claude Sonnet 4. Individual project costs range from $40 to $120 depending on scope and session count.
Can Claude Code replace a junior developer?
It replaces the output of a junior developer but not the role. Someone still needs to direct, review, and correct the AI's work. That someone needs senior-level judgment. AI-generated code without expert review ships bugs faster.
What's the best CMS to pair with a Claude Code workflow?
Sanity v3, because its schema definitions are TypeScript files that Claude Code generates exceptionally well. Payload CMS 3.0 is a close second. Contentful works, but its management API is more complex for AI to work with reliably.
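For readers who haven't used Sanity, "schema definitions are TypeScript files" means a document type is an object exported from a `.ts` file. The sketch below is a hypothetical `post` type plus a co-located GROQ query. In a real Sanity v3 project you would typically wrap it in `defineType`/`defineField` from the `sanity` package for type checking; that is omitted here so the snippet runs without dependencies.

```typescript
// Hypothetical Sanity v3 document type for blog posts, written as a plain object.
// With the `sanity` package installed, wrap it in defineType()/defineField().
export const postType = {
  name: "post",
  title: "Post",
  type: "document",
  fields: [
    { name: "title", title: "Title", type: "string" },
    { name: "slug", title: "Slug", type: "slug", options: { source: "title" } },
    { name: "publishedAt", title: "Published at", type: "datetime" },
    { name: "body", title: "Body", type: "array", of: [{ type: "block" }] },
  ],
};

// GROQ query co-located with the page that renders a single post.
// $slug is a query parameter supplied at fetch time.
export const postBySlugQuery = `*[_type == "post" && slug.current == $slug][0]{
  title,
  publishedAt,
  body
}`;
```

Because the schema is ordinary code with a regular shape, it is exactly the kind of file an AI assistant can extend reliably: adding a field is a one-line, pattern-following change.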
Does Claude Code work for mobile app development?
We've used it for React Native (Expo SDK 52) projects with decent results for component generation and navigation setup. It struggles more with native module configuration and platform-specific debugging. Roughly 40-50% productivity gain vs. 60-70% for web projects.
How do you handle client IP concerns with AI-generated code?
Our MSA includes a clause stating all deliverables are original work product regardless of tooling used. Anthropic's terms (as of June 2025) grant users rights to outputs. We don't send client proprietary data to the API--only code patterns and generic implementations.
What happens when Claude Code generates incorrect code?
It happens on roughly 15-20% of tasks. Our workflow accounts for this with mandatory human code review on every PR. Common failure modes: incorrect TypeScript generics, stale API patterns from training data, and missing error handling for edge cases. We budget review time into every estimate.