We ship headless sites for clients who measure everything -- Core Web Vitals, schema markup, accessibility scores, organic traffic deltas. One bad deploy can tank rankings that took months to build. So when Anthropic rolled out subagents, hooks, and skills in Claude Code, we rebuilt our entire pre-deploy pipeline around them.

This post walks through our exact setup: the .claude/ directory structure, each subagent definition, the hook configs, and the skill files that tie it all together. We'll share four real incidents the system caught, the ROI math at our scale, and where this approach still has gaps.

We hit a 75% CLS spike that made it to production anyway

Three weeks back. A client's blog redesign. Everything looked fine in dev. We merged.

Two days later their CTO sends a screenshot from Search Console. CLS jumped from 0.08 to 0.14 on mobile. Pages that ranked #3 for "enterprise billing software" dropped to #8. Revenue impact? They estimated $40k/month.

The problem? A hero image that loaded async but had no size attributes. Classic. Our CI caught nothing because we weren't checking layout shift on the actual preview build.

That's when we started looking at subagents.

Subagents are scoped Claude Code instances that run inside a parent session with their own system prompt, tool access, and task boundary. Hooks trigger subagents at specific points -- before a command runs, after file changes, or on commit. Skills are reusable instruction files (markdown) that teach Claude how to perform a specific task.

Anthropic shipped the redesign on April 14, 2025, introducing Routines alongside these primitives. For our use case, raw subagents plus hooks gave us finer control over exactly when each check fires and what context it receives.

The key difference from traditional CI checks: subagents can reason about results, correlate failures across checks, and write human-readable summaries. A CI job returns exit code 0 or 1. A subagent returns "The structured data on /blog/[slug] is missing the dateModified field, which was present in the previous build. This will likely cause a rich snippet regression in Google Search Console within 3-5 days."

That's the whole point.

Three months in, our GitHub Actions setup finally broke us

Our previous pipeline was a tangle of GitHub Actions calling Lighthouse CI, pa11y, linkinator, and custom Node scripts. It worked.

Sort of.

But it had three problems.

No reasoning between checks. If Lighthouse flagged a CLS issue and the accessibility scan flagged a missing alt tag on the same image, we got two separate alerts with no connection. Engineers had to manually grep through CI logs, correlate timestamps, figure out it was the same component.

Waste of time.
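Most of that manual correlation is a join on the DOM selector. Below is a minimal sketch. It assumes each tool's output has already been normalized into { tool, selector, message } objects. That shape is hypothetical: neither Lighthouse nor pa11y emits exactly this. The name correlateBySelector is illustrative only.

```javascript
// Group findings from different tools by the DOM selector they point at, so a
// layout-shift flag and a missing-alt flag on the same <img> surface together.
// The { tool, selector, message } shape is a hypothetical normalized format.
function correlateBySelector(findings) {
  const bySelector = new Map();
  for (const f of findings) {
    if (!f.selector) continue; // page-level findings have nothing to join on
    if (!bySelector.has(f.selector)) bySelector.set(f.selector, []);
    bySelector.get(f.selector).push(f);
  }
  // Keep only selectors that more than one distinct tool complained about.
  const correlated = [];
  for (const [selector, group] of bySelector) {
    const tools = new Set(group.map((f) => f.tool));
    if (tools.size > 1) {
      correlated.push({ selector, tools: [...tools].sort(), findings: group });
    }
  }
  return correlated;
}

// Sample input, invented for illustration.
const findings = [
  { tool: "lighthouse", selector: "img.hero-banner", message: "layout shift source" },
  { tool: "pa11y", selector: "img.hero-banner", message: "image has no alt text" },
  { tool: "pa11y", selector: "a.footer-link", message: "insufficient color contrast" },
];
console.log(JSON.stringify(correlateBySelector(findings), null, 2));
```

One alert per element instead of one per tool is the whole improvement. Deciding whether the two flags share a root cause is still a judgment call.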

Brittle config. Each tool had its own config file, threshold format, and output schema. Updating thresholds meant touching 4-6 files. YAML here. JSON there. Environment variables in a third place. One typo and the whole pipeline exits 0 when it should fail.

No contextual explanations. Engineers got pass/fail. Junior devs spent 20-40 minutes understanding why something failed and what to do about it. "Accessibility score: 87" doesn't tell you which ARIA attribute is missing or why it matters for screen readers.

We'd spend 3 hours a week debugging false positives or explaining failures in Slack.

The final straw? August 2025. We pushed a Northwind Traders redesign at 4pm on a Friday (I know). Lighthouse passed. Accessibility passed. Links passed. We shipped.

Monday morning their VP of Marketing emails us. "Why are our product pages missing from Google?" Turns out we'd accidentally set robots meta to noindex on every page under /products/. Our CI didn't check robots tags. Took six days to get re-indexed. They lost an estimated $12k in revenue.

We didn't need another CI tool -- we needed an orchestration layer that could reason about the outputs of tools we already trusted. Well-written skill files are the difference between a subagent that hallucinates accessibility rules and one that runs pa11y with the right flags and interprets the JSON output correctly.

Our .claude/ directory structure

Here's the actual tree:

.claude/
├── settings.json
├── agents/
│   ├── seo-regression.md
│   ├── cwv-smoke.md
│   ├── accessibility.md
│   ├── broken-links.md
│   ├── schema-validation.md
│   └── deploy-gate.md
├── skills/
│   ├── run-lighthouse.md
│   ├── run-pa11y.md
│   ├── run-linkinator.md
│   ├── parse-schema-org.md
│   ├── compare-seo-snapshot.md
│   └── format-deploy-report.md
└── snapshots/
    └── seo-baseline.json

The snapshots/ directory holds baseline data for comparison checks. Simple. We version it in Git so we can see what changed when a client asks "why did rankings drop last Tuesday?"

Nothing fancy. Just markdown files and JSON.

A client texted me at 11pm because Google dropped all their rich snippets

September 2025. We're building an e-commerce site for a mid-sized retailer (let's call them Acme Home Goods). They'd spent six months getting rich snippets -- product stars, pricing, availability -- showing up in search results.

We push a Shopify theme update. Looks fine. Ships Friday night.

Saturday at 11:14pm I get a text. "Our product pages look broken in Google. Stars are gone. Prices are gone. What happened?"

I open Search Console. Every single product page is throwing structured data errors. The offers field is missing priceCurrency. Without it, Google won't show the rich snippet. Rankings didn't drop, but click-through rate went from 4.2% to 1.8% overnight.

Cost? About $8k/week in lost traffic until we fixed it and Google re-crawled everything.

The schema was there. We just changed the property name from priceCurrency to currency because the Shopify API uses that key. Didn't think about it. No validation caught it.

That's when we built the schema-validation subagent.

You create a markdown file in .claude/agents/ with a system prompt, a list of allowed tools, and task instructions. The parent session (or a hook) spawns it with dispatch_agent() or via the hook config in settings.json.

Minimal structure:

# Agent: [Name]

## Role
[One-line description]

## Allowed Tools
- Bash (restricted to specific commands)
- Read file
- Write file

## Instructions
[Step-by-step task description, referencing skill files]

## Output Format
[Exact format the parent expects]

Be extremely specific about output format. If the deploy-gate orchestrator expects JSON with a passed boolean and a summary string, spell that out. Subagents that return free-form text break orchestration. We learned this the hard way when a subagent returned markdown tables and the deploy gate couldn't parse them. Took me two hours at 2am to debug because the parent just silently failed. No error. Just didn't trigger the deploy block.

Don't make my mistake. Lock down the format.
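One way to lock the format down is to parse every reply against its contract and throw on any mismatch. That way a malformed reply can never read as a pass. Below is a minimal sketch for the SEO regression agent's contract (passed, critical, high, medium, summary). The helper parseAgentOutput is hypothetical. What the gate does with the thrown error is a separate decision.

```javascript
// Validate a subagent's raw reply against its declared output contract.
// Throws instead of returning something falsy, so a malformed reply can
// never be mistaken for "no issues found".
function parseAgentOutput(raw, agentName) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    throw new Error(`${agentName}: reply is not JSON (${err.message})`);
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    throw new Error(`${agentName}: reply must be a JSON object`);
  }
  if (typeof data.passed !== "boolean") {
    throw new Error(`${agentName}: "passed" must be a boolean`);
  }
  if (typeof data.summary !== "string") {
    throw new Error(`${agentName}: "summary" must be a string`);
  }
  for (const key of ["critical", "high", "medium"]) {
    if (!Array.isArray(data[key])) {
      throw new Error(`${agentName}: "${key}" must be an array`);
    }
  }
  return data;
}

// A markdown table instead of JSON now fails loudly rather than silently.
try {
  parseAgentOutput("| page | issue |\n|---|---|", "seo-regression");
} catch (err) {
  console.error(err.message); // the gate should treat this as a failed check
}
```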

Subagent 1: SEO regression check

This compares the current build's SEO-critical elements against a baseline snapshot.

# Agent: SEO Regression Check

## Role
Detect SEO regressions between the current build and the stored baseline.

## Allowed Tools
- Bash (node scripts only)
- Read file

## Instructions
1. Read the skill file at .claude/skills/compare-seo-snapshot.md
2. Run: node scripts/extract-seo-meta.js --url=$PREVIEW_URL --output=/tmp/seo-current.json
3. Read .claude/snapshots/seo-baseline.json
4. Compare the two snapshots field by field:
   - title tags (exact match)
   - meta descriptions (similarity > 0.85)
   - canonical URLs (exact match)
   - h1 count (must equal 1 per page)
   - robots meta (must not have changed to noindex)
   - Open Graph tags (og:title, og:description, og:image present)
5. Flag any page where robots changed to noindex as CRITICAL.
6. Flag missing or duplicate title tags as HIGH.
7. Flag meta description changes > 15% different as MEDIUM.

## Output Format
{"passed": boolean, "critical": [], "high": [], "medium": [], "summary": string}

The extract-seo-meta.js script is 120 lines of Puppeteer that hits every page in the sitemap and dumps title, meta, canonicals, h1s, and OG tags to JSON. Nothing smart. Just extraction.

The subagent's value is in the comparison and reasoning, not the extraction. It knows which changes matter. Which ones are cosmetic. Which ones will cost the client $15k in organic traffic next quarter.

Example: if you change a meta description from "Best CRM software for small businesses in 2025" to "Best CRM software for small business", the similarity score is 0.91. That's fine. But if it changes to "CRM software", similarity drops to 0.65. The subagent flags it as MEDIUM because that's a 40% reduction in keyword density and will probably hurt CTR.

It's not just diff. It's reasoning about what the diff means.
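The post doesn't say which similarity metric sits behind the 0.85 threshold. One simple choice is a character-level ratio: 2 × LCS / (|a| + |b|), where LCS is the length of the longest common subsequence. This metric is an assumption, not necessarily what the agent computes, and it won't reproduce the exact scores above. It does land on the same side of the threshold for both examples (about 0.88 and 0.41).

```javascript
// Character-level similarity: 2 * LCS(a, b) / (a.length + b.length).
// Returns 1 for identical strings and 0 when no characters are shared.
function lcsLength(a, b) {
  // Rolling DP row: after processing i chars of a, prev[j] is the LCS length
  // of those i chars and the first j chars of b.
  let prev = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    const curr = new Array(b.length + 1).fill(0);
    for (let j = 1; j <= b.length; j++) {
      curr[j] = a[i - 1] === b[j - 1]
        ? prev[j - 1] + 1
        : Math.max(prev[j], curr[j - 1]);
    }
    prev = curr;
  }
  return prev[b.length];
}

function similarity(a, b) {
  if (a.length + b.length === 0) return 1; // two empty descriptions are identical
  return (2 * lcsLength(a, b)) / (a.length + b.length);
}

const SIMILARITY_THRESHOLD = 0.85; // below this, flag MEDIUM

const before = "Best CRM software for small businesses in 2025";
for (const after of ["Best CRM software for small business", "CRM software"]) {
  const score = similarity(before, after);
  console.log(after, score.toFixed(2), score < SIMILARITY_THRESHOLD ? "MEDIUM" : "ok");
}
```

The computation is quadratic in string length. That is fine for meta descriptions of a couple hundred characters.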

We've caught four issues with this so far. The robots noindex thing. A case where someone deleted all the OG images (would've tanked social shares). A case where title tags got truncated to 40 chars instead of 60 (just looked bad, didn't hurt SEO, but client would've noticed). And one where canonical URLs changed from https:// to http:// (would've caused duplicate content penalties).

Each one would've cost us at least a few hours of cleanup and client trust. Probably more.

We store seo-baseline.json in the repo and update it as part of the deploy-success hook.

Subagent 2: Core Web Vitals smoke test

# Agent: CWV Smoke Test

## Role
Run Lighthouse on key pages and flag CWV regressions.

## Allowed Tools
- Bash

## Instructions
1. Read .claude/skills/run-lighthouse.md
2. Run Lighthouse CI against $PREVIEW_URL for these pages:
   - / (homepage)
   - /blog/ (listing)
   - /blog/[most-recent-post] (detail)
   - /services/ (if exists)
3. Thresholds (fail if any below):
   - LCP: 2500ms
   - INP: 200ms
   - CLS: 0.1
   - Performance score: 85
   - Accessibility score: 90
4. If a metric regressed by more than 10% from the previous run,
   flag as WARNING even if still above threshold.
5. Include the specific element causing LCP or CLS where Lighthouse reports it.

## Output Format
{"passed": boolean, "pages": [{"url": string, "scores": {}, "flags": []}], "summary": string}
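Steps 3 and 4 are plain arithmetic once the numbers are extracted. Below is a minimal sketch. It assumes per-page metrics arrive as { lcp, inp, cls, performance, accessibility } (our naming), alongside the previous run's values. Note that the direction of "worse" differs by metric: for timings and CLS, higher is worse; for 0-100 scores, lower is worse.

```javascript
// Thresholds from the CWV agent definition. "max" metrics fail when above the
// limit (timings, CLS); "min" metrics fail when below it (0-100 scores).
const THRESHOLDS = {
  lcp: { limit: 2500, kind: "max" },        // ms
  inp: { limit: 200, kind: "max" },         // ms; often absent from lab runs
  cls: { limit: 0.1, kind: "max" },         // unitless
  performance: { limit: 85, kind: "min" },  // Lighthouse score, 0-100
  accessibility: { limit: 90, kind: "min" },
};
const REGRESSION_TOLERANCE = 0.1; // warn on >10% movement in the bad direction

function evaluatePage(current, previous = {}) {
  const failures = [];
  const warnings = [];
  for (const [metric, { limit, kind }] of Object.entries(THRESHOLDS)) {
    const value = current[metric];
    if (typeof value !== "number") continue; // metric not reported this run
    const failed = kind === "max" ? value > limit : value < limit;
    if (failed) {
      failures.push(`${metric}=${value} (limit ${limit})`);
      continue;
    }
    // Still passing: warn if it got >10% worse than the previous run.
    const prev = previous[metric];
    if (typeof prev !== "number" || prev === 0) continue; // no usable baseline
    const change = (value - prev) / prev;
    const regressed = kind === "max" ? change > REGRESSION_TOLERANCE : change < -REGRESSION_TOLERANCE;
    if (regressed) warnings.push(`${metric} moved ${(change * 100).toFixed(0)}% vs previous run`);
  }
  return { passed: failures.length === 0, failures, warnings };
}

console.log(evaluatePage(
  { lcp: 2300, cls: 0.34, performance: 88, accessibility: 95 },
  { lcp: 2000, cls: 0.05, performance: 90, accessibility: 95 }
));
```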

The associated skill file (run-lighthouse.md) contains the exact lhci CLI invocation:

# Skill: Run Lighthouse

## Command
```bash
npx @lhci/cli@0.14.0 collect \
  --url="$1" \
  --numberOfRuns=3 \
  --settings.preset=desktop \
  --settings.output=json \
  --settings.outputPath=/tmp/lhci-results/
```

## Parsing

Read the median run from /tmp/lhci-results/. Extract:

- categories.performance.score * 100
- audits['largest-contentful-paint'].numericValue
- audits['cumulative-layout-shift'].numericValue
- audits['interaction-to-next-paint'].numericValue (if present)

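Reading the median run needs a concrete rule. Below is a minimal sketch. It assumes the three runs have already been loaded from disk as parsed Lighthouse result (LHR) objects. It sorts by performance score, takes the middle run, and pulls out the fields the skill lists. LHCI has its own logic for choosing a representative run; this is a simpler stand-in.

```javascript
// Pick the median run by performance score and pull out the metrics the
// skill file lists. Each run is a parsed Lighthouse result (LHR) object.
function medianRun(runs) {
  const scored = runs.filter((r) => typeof r.categories?.performance?.score === "number");
  if (scored.length === 0) throw new Error("no run has a performance score");
  scored.sort((a, b) => a.categories.performance.score - b.categories.performance.score);
  return scored[Math.floor(scored.length / 2)]; // middle of 3 runs
}

function extractMetrics(lhr) {
  const audit = (id) => lhr.audits?.[id]?.numericValue; // undefined if absent
  return {
    performance: Math.round(lhr.categories.performance.score * 100),
    lcp: audit("largest-contentful-paint"),
    cls: audit("cumulative-layout-shift"),
    inp: audit("interaction-to-next-paint"), // typically only in timespan runs
  };
}

// Minimal fake LHRs containing only the fields used above.
const fakeRun = (score, lcp) => ({
  categories: { performance: { score } },
  audits: {
    "largest-contentful-paint": { numericValue: lcp },
    "cumulative-layout-shift": { numericValue: 0.02 },
  },
});
console.log(extractMetrics(medianRun([fakeRun(0.91, 2100), fakeRun(0.84, 2600), fakeRun(0.88, 2300)])));
```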
Subagent 3: Accessibility scan

# Agent: Accessibility Scan

## Role
Run pa11y against preview URLs and report WCAG 2.1 AA violations.

## Allowed Tools
- Bash

## Instructions
1. Read .claude/skills/run-pa11y.md
2. Run pa11y against the same page set as the CWV agent.
3. Group results by severity: error, warning, notice.
4. For each error, include:
   - The WCAG criterion violated (e.g., 1.1.1 Non-text Content)
   - The HTML element (selector)
   - A one-sentence fix suggestion
5. Fail if any errors exist. Warn if warnings > 10.

## Output Format
{"passed": boolean, "error_count": number, "warning_count": number, "errors": [{"criterion": string, "selector": string, "fix": string}], "summary": string}

We use pa11y@8.0.0 with the --runner=axe flag. The default htmlcs runner misses some color contrast issues that axe catches.
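The pass/warn rule in step 5 is easy to pin down in code. The sketch below assumes pa11y's JSON reporter output: an array of issues carrying type, code, selector, and message. The extra warn field, the summary wording, and the name summarizePa11y are ours.

```javascript
// Summarize pa11y issues into the accessibility agent's contract: fail on any
// error, warn when warnings exceed 10. "warn" is our addition to that contract.
const MAX_WARNINGS = 10;

function summarizePa11y(issues) {
  const byType = { error: [], warning: [], notice: [] };
  for (const issue of issues) {
    if (!byType[issue.type]) byType[issue.type] = []; // tolerate unexpected types
    byType[issue.type].push(issue);
  }
  return {
    passed: byType.error.length === 0,
    warn: byType.warning.length > MAX_WARNINGS,
    error_count: byType.error.length,
    warning_count: byType.warning.length,
    // With the axe runner, code is an axe rule id (e.g. "image-alt"), not a
    // WCAG criterion number, so mapping to a criterion and writing the fix
    // is left to the agent.
    errors: byType.error.map(({ code, selector, message }) => ({ code, selector, message })),
    summary: `errors: ${byType.error.length}, warnings: ${byType.warning.length}, notices: ${byType.notice.length}`,
  };
}

// Sample issues, invented for illustration, with fields trimmed.
const issues = [
  { type: "error", code: "image-alt", selector: "img.hero-banner", message: "missing alt text" },
  { type: "warning", code: "color-contrast", selector: "p.caption", message: "contrast needs review" },
  { type: "notice", code: "region", selector: "div.promo", message: "content outside landmarks" },
];
console.log(summarizePa11y(issues));
```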

Subagent 4: Broken link scan

# Agent: Broken Link Scan

## Role
Crawl the preview site and report broken internal and external links.

## Allowed Tools
- Bash

## Instructions
1. Read .claude/skills/run-linkinator.md
2. Run: npx linkinator@6.1.2 $PREVIEW_URL --recurse --timeout 15000 --format json > /tmp/link-results.json
3. Filter results to status >= 400 or status === 0 (timeout).
4. Separate internal (same domain) from external broken links.
5. Internal broken links are CRITICAL. External broken links are WARNING.
6. Exclude known-flaky external domains: twitter.com, linkedin.com (they block crawlers).

## Output Format
{"passed": boolean, "internal_broken": [{"source": string, "target": string, "status": number}], "external_broken": [...], "summary": string}
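Steps 3 through 6 amount to a filter. The sketch below assumes linkinator's JSON report contains a links array whose entries carry url, status, state, and parent. That shape is worth checking against the version you pin. The name classifyBrokenLinks is hypothetical.

```javascript
// Split linkinator's broken links into internal (CRITICAL) and external
// (WARNING), as the broken-link agent describes. Assumed input shape:
// { links: [{ url, status, state, parent }] }.
const FLAKY_DOMAINS = ["twitter.com", "linkedin.com"];

function hostOf(url) {
  try {
    return new URL(url).hostname;
  } catch {
    return null; // relative or malformed URL
  }
}

const isFlaky = (host) => FLAKY_DOMAINS.some((d) => host === d || host.endsWith(`.${d}`));

function classifyBrokenLinks(report, siteUrl) {
  const siteHost = hostOf(siteUrl);
  const internal_broken = [];
  const external_broken = [];
  for (const link of report.links ?? []) {
    if (link.state === "SKIPPED") continue; // excluded from the crawl on purpose
    const status = link.status ?? 0; // no status at all: treat like a timeout
    if (!(status >= 400 || status === 0)) continue;
    const host = hostOf(link.url);
    const entry = { source: link.parent, target: link.url, status };
    if (host === siteHost) internal_broken.push(entry);
    else if (host && !isFlaky(host)) external_broken.push(entry);
  }
  return { passed: internal_broken.length === 0, internal_broken, external_broken };
}

// Sample report, invented for illustration.
const report = {
  links: [
    { url: "https://preview.example.com/services/old-page", status: 404, state: "BROKEN", parent: "https://preview.example.com/blog/post-1" },
    { url: "https://twitter.com/someone", status: 999, state: "BROKEN", parent: "https://preview.example.com/" },
    { url: "https://partner.example.org/gone", status: 410, state: "BROKEN", parent: "https://preview.example.com/about" },
    { url: "https://preview.example.com/", status: 200, state: "OK" },
  ],
};
console.log(classifyBrokenLinks(report, "https://preview.example.com"));
```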

Subagent 5: Schema validation

# Agent: Schema Validation

## Role
Validate JSON-LD structured data on all pages.

## Allowed Tools
- Bash
- Read file

## Instructions
1. Read .claude/skills/parse-schema-org.md
2. For each page in the sitemap:
   a. Extract all <script type="application/ld+json"> blocks
   b. Parse as JSON (fail if malformed)
   c. Validate required fields per @type:
      - Article: headline, datePublished, dateModified, author, image
      - LocalBusiness: name, address, telephone
      - WebPage: name, description
      - BreadcrumbList: itemListElement with position, name, item
   d. Check that all @id references resolve within the page's graph
   e. Validate URLs in schema are absolute, not relative
3. Flag missing required fields as HIGH.
4. Flag malformed JSON as CRITICAL.

## Output Format
{"passed": boolean, "pages": [{"url": string, "schemas": [{"type": string, "valid": boolean, "issues": []}]}], "summary": string}
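A deterministic version of these checks is useful as a cross-check on the agent. The sketch below works on one page's HTML and makes several assumptions:

- It extracts blocks with a regular expression rather than an HTML parser.
- It treats null the same as a missing field, which is the failure mode in the Sanity migration incident.
- It only checks url, item, and image for absolute URLs.
- It rates unresolved @id references and relative URLs as HIGH, which the agent definition doesn't specify.

```javascript
// Check one page's JSON-LD against the schema-validation agent's rules.
const REQUIRED = {
  Article: ["headline", "datePublished", "dateModified", "author", "image"],
  LocalBusiness: ["name", "address", "telephone"],
  WebPage: ["name", "description"],
  BreadcrumbList: ["itemListElement"],
  ListItem: ["position", "name", "item"], // breadcrumb entries, per the agent's rule
};
const URL_KEYS = new Set(["url", "item", "image"]); // simplification: only these keys
const LD_JSON = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;

// Call visit(obj) on every object nested anywhere inside a JSON value.
function walk(value, visit) {
  if (Array.isArray(value)) {
    value.forEach((v) => walk(v, visit));
  } else if (value !== null && typeof value === "object") {
    visit(value);
    Object.values(value).forEach((v) => walk(v, visit));
  }
}

function validatePage(html) {
  const issues = [];
  const defined = new Set(); // @id values of nodes that carry data
  const referenced = [];     // @id values of bare { "@id": ... } references
  for (const [, body] of html.matchAll(LD_JSON)) {
    let data;
    try {
      data = JSON.parse(body);
    } catch {
      issues.push({ severity: "CRITICAL", issue: "malformed JSON-LD block" });
      continue;
    }
    walk(data, (node) => {
      const keys = Object.keys(node);
      if (keys.length === 1 && keys[0] === "@id") referenced.push(node["@id"]);
      else if (typeof node["@id"] === "string") defined.add(node["@id"]);
      for (const type of [].concat(node["@type"] ?? [])) {
        for (const field of REQUIRED[type] ?? []) {
          // null counts as missing: a CMS fallback to null still loses the field
          if (node[field] === undefined || node[field] === null) {
            issues.push({ severity: "HIGH", issue: `${type} missing ${field}` });
          }
        }
      }
      for (const [key, val] of Object.entries(node)) {
        if (URL_KEYS.has(key) && typeof val === "string" && !/^https?:\/\//.test(val)) {
          issues.push({ severity: "HIGH", issue: `relative URL in ${key}: ${val}` });
        }
      }
    });
  }
  for (const id of referenced) {
    if (!defined.has(id)) issues.push({ severity: "HIGH", issue: `unresolved @id ${id}` });
  }
  return { valid: issues.length === 0, issues };
}

// Sample page, invented for illustration: one flawed block, one truncated block.
const html = `
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Hello",
 "datePublished": "2025-01-01", "dateModified": null,
 "author": {"@id": "#author"}, "image": "/img/hero.jpg"}
</script>
<script type="application/ld+json">{"@type": "WebPage", "name": </script>`;
console.log(validatePage(html).issues);
```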

Subagent 6: Deploy gate orchestrator

This parent agent spawns the other five and makes the go/no-go call.

# Agent: Deploy Gate

## Role
Orchestrate all pre-deploy checks and produce a final deploy decision.

## Allowed Tools
- Bash
- Read file
- Write file
- dispatch_agent

## Instructions
1. Spawn these agents in parallel:
   - .claude/agents/seo-regression.md
   - .claude/agents/cwv-smoke.md
   - .claude/agents/accessibility.md
   - .claude/agents/broken-links.md
   - .claude/agents/schema-validation.md
2. Collect all outputs.
3. Read .claude/skills/format-deploy-report.md
4. Decision logic:
   - If ANY agent has a CRITICAL flag: BLOCK deploy.
   - If 2+ agents have HIGH flags: BLOCK deploy.
   - If 1 agent has HIGH flags: WARN, require manual override.
   - Otherwise: APPROVE.
5. Write the full report to /tmp/deploy-report.md
6. Output the decision.

## Output Format
{"decision": "APPROVE" | "WARN" | "BLOCK", "reports": {agent_name: agent_output}, "summary": string}
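The decision logic in step 4 is small enough to write as a pure function. The sketch below assumes each agent's report has already been normalized to { critical: [], high: [] }. Mapping an agent's own fields, such as internal broken links, to critical is a separate step not shown here. A report can also be { incomplete: true } when the agent failed or timed out. The FAQ says that case should warn rather than block.

```javascript
// Deploy-gate decision rules as a deterministic function over agent reports.
// Assumes reports are normalized to { critical: [], high: [] }, or
// { incomplete: true } when an agent failed or timed out.
function decide(reports) {
  const entries = Object.entries(reports);
  const names = (pred) => entries.filter(([, r]) => pred(r)).map(([name]) => name);
  const incomplete = names((r) => r.incomplete === true);
  const withCritical = names((r) => (r.critical ?? []).length > 0);
  const withHigh = names((r) => (r.high ?? []).length > 0);

  let decision = "APPROVE";
  if (withCritical.length > 0 || withHigh.length >= 2) decision = "BLOCK";
  else if (withHigh.length === 1 || incomplete.length > 0) decision = "WARN"; // incomplete warns, never blocks

  const parts = [];
  if (withCritical.length) parts.push(`CRITICAL from ${withCritical.join(", ")}`);
  if (withHigh.length) parts.push(`HIGH from ${withHigh.join(", ")}`);
  if (incomplete.length) parts.push(`did not complete: ${incomplete.join(", ")}`);
  return { decision, summary: parts.join("; ") || "all checks clean" };
}

console.log(decide({
  "seo-regression": { critical: [], high: ["title changed on /pricing"] },
  "cwv-smoke": { critical: [], high: [] },
  accessibility: { incomplete: true },
}));
```

Keeping the go/no-go rule in code means the orchestrator agent only has to produce the summary. The decision itself can't drift.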

Hook configuration: settings.json

Here's our actual settings.json (with client-specific URLs redacted):

{
  "hooks": {
    "pre-commit": [
      {
        "agent": ".claude/agents/schema-validation.md",
        "condition": "files_changed_match('**/*.json', '**/structured-data/**')",
        "env": {
          "PREVIEW_URL": "http://localhost:3000"
        }
      }
    ],
    "pre-push": [
      {
        "agent": ".claude/agents/deploy-gate.md",
        "env": {
          "PREVIEW_URL": "$VERCEL_PREVIEW_URL"
        },
        "timeout": 300,
        "on_failure": "block"
      }
    ],
    "post-deploy-success": [
      {
        "command": "node scripts/extract-seo-meta.js --url=$PRODUCTION_URL --output=.claude/snapshots/seo-baseline.json",
        "description": "Update SEO baseline after successful deploy"
      }
    ]
  },
  "agent_defaults": {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 8192,
    "timeout": 120
  },
  "skills_directory": ".claude/skills/"
}

Notes on this config:

  • We use claude-sonnet-4-20250514 for subagents, not Opus. The reasoning tasks here don't justify the cost difference. Sonnet handles "compare two JSON objects and flag differences" fine.
  • The timeout: 300 on the deploy gate gives all five subagents time to run. Individual agents have 120s defaults. The orchestrator gets 5 minutes because it waits on all of them.
  • The condition on the pre-commit hook means schema validation only runs when you touch schema-related files. No point running it on a CSS change.
  • post-deploy-success updates the baseline. Without this, your SEO regression check compares against stale data.

Skill definitions that glue it together

The skill file that does the most work is compare-seo-snapshot.md:

# Skill: Compare SEO Snapshots

## Purpose
Compare two SEO metadata snapshots and identify regressions.

## Input
- Current snapshot: /tmp/seo-current.json
- Baseline snapshot: .claude/snapshots/seo-baseline.json

## Comparison Rules

### Title Tags
- If a title changed AND the page's organic traffic (from baseline metadata) > 1000 sessions/month, flag as HIGH.
- If a title is now empty or matches another page's title, flag as CRITICAL.
- If a title changed on a low-traffic page, flag as MEDIUM.

### Canonical URLs
- Any change to canonical URL is HIGH.
- A canonical pointing to a different domain is CRITICAL.
- A missing canonical (was present, now gone) is HIGH.

### Robots Meta
- Any page that gained "noindex" is CRITICAL.
- Any page that gained "nofollow" on internal links is HIGH.

### New Pages
- Pages in current but not in baseline are INFO (expected for new content).
- But verify they have: title, meta description, canonical, at least one h1.

### Removed Pages
- Pages in baseline but not in current are HIGH.
- These might indicate accidental route removal.

This skill file encodes months of SEO incident response into a format Claude can reliably follow. Without it, the subagent would make reasonable but inconsistent judgments about what constitutes a regression.
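Most of these rules are mechanical enough to express as code. That makes a handy cross-check on the agent's judgments, or a fixture for testing the skill file. The sketch below assumes a hypothetical snapshot shape keyed by path. It simplifies the nofollow rule to the page-level robots meta, since the snapshot doesn't record per-link rel attributes.

```javascript
// Deterministic version of the compare-seo-snapshot rules. Snapshot shape is
// hypothetical: { "/path": { title, canonical, robots, description, h1Count, sessions } }.
function compareSnapshots(baseline, current, siteHost) {
  const findings = [];
  const flag = (severity, path, message) => findings.push({ severity, path, message });

  // Count titles in the current build to catch duplicates.
  const titleCounts = new Map();
  for (const page of Object.values(current)) {
    if (page.title) titleCounts.set(page.title, (titleCounts.get(page.title) ?? 0) + 1);
  }

  for (const [path, now] of Object.entries(current)) {
    const was = baseline[path];
    if (!now.title || titleCounts.get(now.title) > 1) flag("CRITICAL", path, "title empty or duplicated");
    if (!was) {
      // New page: informational, but it still needs the basics.
      flag("INFO", path, "new page");
      if (!now.description || !now.canonical || !(now.h1Count >= 1)) {
        flag("HIGH", path, "new page missing description, canonical, or h1");
      }
      continue;
    }
    if (now.title && was.title !== now.title) {
      flag((was.sessions ?? 0) > 1000 ? "HIGH" : "MEDIUM", path, "title changed");
    }
    if (was.canonical && !now.canonical) {
      flag("HIGH", path, "canonical removed");
    } else if (now.canonical && now.canonical !== was.canonical) {
      const host = new URL(now.canonical, `https://${siteHost}`).hostname;
      flag(host === siteHost ? "HIGH" : "CRITICAL", path, `canonical changed to ${now.canonical}`);
    }
    const had = (was.robots ?? "").toLowerCase();
    const has = (now.robots ?? "").toLowerCase();
    if (has.includes("noindex") && !had.includes("noindex")) flag("CRITICAL", path, "gained noindex");
    if (has.includes("nofollow") && !had.includes("nofollow")) flag("HIGH", path, "gained nofollow");
  }
  for (const path of Object.keys(baseline)) {
    if (!(path in current)) flag("HIGH", path, "page removed");
  }
  return findings;
}

// Sample snapshots, invented for illustration.
const baseline = {
  "/": { title: "Acme", canonical: "https://acme.example/", robots: "index, follow", sessions: 5000 },
  "/blog/a": { title: "Post A", canonical: "https://acme.example/blog/a", robots: "", sessions: 200 },
  "/old": { title: "Old", canonical: "https://acme.example/old", robots: "", sessions: 10 },
};
const current = {
  "/": { title: "Acme Home", canonical: "http://acme.example/", robots: "index, follow" },
  "/blog/a": { title: "Post A", canonical: "https://acme.example/blog/a", robots: "noindex, nofollow" },
  "/new": { title: "New", description: "d", canonical: "https://acme.example/new", h1Count: 1 },
};
console.log(compareSnapshots(baseline, current, "acme.example"));
```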

Four incidents the system caught

Incident 1: Accidental noindex on 47 blog posts

Client: B2B SaaS company, 200 pages, 60k organic sessions/month.

A developer updated the <Head> component in the blog template to add a new meta tag. They copy-pasted from the staging config, which had <meta name="robots" content="noindex, nofollow"> hardcoded. The change passed code review because the reviewer focused on the new tag, not the existing ones.

The SEO regression subagent flagged 47 pages as CRITICAL -- robots meta changed to noindex. The deploy was blocked.

Time to detect: 2 minutes 14 seconds after push. Without the system, it would've been caught when Search Console showed a coverage drop 3-7 days later.

Estimated impact avoided: Those 47 posts drove roughly $14,000/month in pipeline. Even a one-week deindex event could've cost $3,500+.

Incident 2: CLS regression from a new hero image

Client: E-commerce brand, Next.js 14 storefront on Shopify Hydrogen.

The design team swapped the homepage hero to a new image with different aspect ratio but didn't update the width/height attributes on the <Image> component. The image loaded fine but caused a CLS of 0.34 -- well above the 0.1 threshold.

The CWV smoke test subagent reported CLS regression on the homepage. The summary specifically called out: "CLS caused by element img.hero-banner shifting 0.34 cumulative. The image dimensions (1920x800) don't match the container aspect ratio (16:9 = 1920x1080). Add explicit width={1920} height={800} or update the container."

Time to detect: 1 minute 47 seconds.

Incident 3: Broken internal links after a URL restructure

Client: Professional services firm, 80 pages.

We restructured their service pages from /services/[name] to /[category]/[name]. Redirects were in place, but three blog posts had hardcoded links to the old URLs, and the CMS-driven navigation had a cached entry pointing to a deleted page.

The broken link scan found 4 internal 404s. The subagent's summary noted that 3 of the 4 were in blog post body content (not navigation), which meant they'd been missed by the redirect audit.

Time to detect: 3 minutes 8 seconds. The linkinator crawl is the slowest part.

Incident 4: Missing dateModified in Article schema

Client: Media company, 2,000 articles.

A CMS migration from WordPress to Sanity lost the dateModified field mapping. The schema generation code fell back to null for dateModified, which produced invalid JSON-LD.

The schema validation subagent flagged every article page as HIGH -- missing required dateModified field. The summary explained: "Google requires dateModified for Article structured data to be eligible for Top Stories and rich results. All 2,147 article pages are affected."

Time to detect: 4 minutes 22 seconds (large sitemap).

ROI: minutes saved per ship and dollars per month

Here's our math:

| Metric | Before (CI + manual) | After (subagents) | Delta |
|---|---|---|---|
| Checks per deploy | 4 tools, manual review | 5 agents, automated | +1 check, -100% manual review |
| Time to run all checks | 8-12 min (sequential CI) | 3-5 min (parallel subagents) | -60% |
| Time to understand failures | 20-40 min per failure | 1-2 min (contextual summary) | -90% |
| Deploys per week (all clients) | 18 | 18 | Same |
| False positive rate | ~15% (noisy Lighthouse) | ~4% (reasoning filters noise) | -73% |

Minutes saved per ship: 25 minutes on average when a check fails, which happens on 30% of deploys (5.4 of 18 per week). That's 25 × 5.4 = 135 minutes/week, or about 9.7 hours/month at 4.3 weeks/month. We round that down to 9 hours/month below.

Cost of the system:

  • Claude API costs for subagents: ~$0.12 per full deploy gate run (5 agents, Sonnet, 6,000 tokens average per agent)
  • 18 deploys/week × 4.3 weeks × $0.12 = $9.29/month in API costs
  • Puppeteer/Lighthouse infrastructure: runs on existing Vercel build instances, no added cost
  • Maintenance time: ~2 hours/month updating skill files and thresholds

Dollar value of engineer time saved: 9 hours/month × $85/hour (blended rate for our team) = $765/month saved.

Dollar value of incidents prevented: Based on the four incidents above, the noindex incident alone could've cost $3,500. If we prevent one incident like that per quarter, that's $1,166/month in avoided client impact.

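Net ROI: $765 + $1,166 = $1,931/month in value for $9.29/month in API costs, or roughly $1,920/month net. That's roughly a 208x return on API spend. The ~2 hours/month of maintenance (about $170 at the same blended rate) isn't counted here. Even if you 10x the API costs for a larger team, it's still favorable.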

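The arithmetic is easy to re-run with your own inputs. This sketch uses the post's numbers; the only difference is rounding. Unrounded, engineer time comes to 9.675 hours/month (about $822). That puts total value near $1,989/month and the multiple on API spend near 214x, slightly above the rounded figures in the text.

```javascript
// Recompute the monthly ROI from its inputs. The inputs match the post;
// nothing here is rounded, unlike the hand calculation.
function monthlyRoi({
  deploysPerWeek, failureRate, minutesSavedPerFailure, weeksPerMonth,
  hourlyRate, apiCostPerRun, avoidedIncidentCostPerQuarter,
}) {
  const failingDeploysPerWeek = deploysPerWeek * failureRate; // ~5.4
  const hoursSaved = (failingDeploysPerWeek * minutesSavedPerFailure * weeksPerMonth) / 60;
  const engineerValue = hoursSaved * hourlyRate;
  const incidentValue = avoidedIncidentCostPerQuarter / 3; // spread over a quarter
  const apiCost = deploysPerWeek * weeksPerMonth * apiCostPerRun;
  const totalValue = engineerValue + incidentValue;
  return {
    hoursSaved, engineerValue, incidentValue, apiCost, totalValue,
    net: totalValue - apiCost,
    multiple: totalValue / apiCost,
  };
}

const roi = monthlyRoi({
  deploysPerWeek: 18, failureRate: 0.3, minutesSavedPerFailure: 25, weeksPerMonth: 4.3,
  hourlyRate: 85, apiCostPerRun: 0.12, avoidedIncidentCostPerQuarter: 3500,
});
for (const [key, value] of Object.entries(roi)) console.log(key, value.toFixed(2));
```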
Gaps and what we'd change

This system isn't perfect. Here's what's still rough:

No visual regression testing. Subagents can run Lighthouse and pa11y but can't look at screenshots and say "the hero section is broken." We're watching Claude's vision capabilities for this.

Baseline drift. The SEO baseline updates on successful deploy, but if you ship a regression that the system doesn't catch, it becomes the new baseline. We manually review baselines monthly.

External link flakiness. Twitter/X, LinkedIn, and some government sites block crawlers or rate-limit aggressively. We maintain an exclusion list, but it needs manual updates.

Cold start time. The first run after cloning a repo takes longer because npx needs to fetch packages. We're considering pre-installing the CLI tools in a Docker layer.

Anthropic rate limits. Spawning 5 subagents simultaneously can occasionally hit rate limits on the Claude API during peak hours. We added a 2-second stagger between spawns, which works but is inelegant.
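The stagger takes a few lines. The sketch below assumes a hypothetical runAgent(name) that returns a promise for an agent's parsed output; how your orchestrator actually launches a subagent isn't shown. It also applies a per-agent timeout, 120 seconds by default to match agent_defaults. Failures are reported as incomplete instead of thrown, matching the FAQ's WARN-not-BLOCK policy.

```javascript
// Start agents in parallel but offset each start by a fixed delay, and give
// each one its own timeout. runAgent is a stand-in for your real launcher.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function runStaggered(agentNames, runAgent, { staggerMs = 2000, timeoutMs = 120000 } = {}) {
  const runs = agentNames.map(async (name, i) => {
    await sleep(i * staggerMs); // agent i starts i * staggerMs after the first
    try {
      return [name, await withTimeout(runAgent(name), timeoutMs)];
    } catch (err) {
      // A failed or timed-out agent is reported as incomplete, not thrown,
      // so one hung tool can't take down the whole gate.
      return [name, { incomplete: true, error: err.message }];
    }
  });
  return Object.fromEntries(await Promise.all(runs));
}

// Example with a fake runner: "slow" takes longer than the timeout allows.
const fakeRunAgent = (name) =>
  name === "slow" ? sleep(500).then(() => ({ passed: true })) : Promise.resolve({ passed: true });

runStaggered(["seo-regression", "slow"], fakeRunAgent, { staggerMs: 10, timeoutMs: 100 })
  .then((results) => console.log(results));
```

withTimeout stops waiting but doesn't cancel the underlying run. If your launcher supports cancellation, wire it in there.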

Output drift in long definitions. Our longer agent definitions (schema validation is 400 words) occasionally produce less structured output than the shorter ones. We're considering splitting the schema validation agent into per-type sub-subagents.

FAQ

Do Claude Code subagents work with any LLM, or only Claude?

Subagents are a Claude Code feature tied to Anthropic's API. You need a Claude API key with access to Claude Code. The agent definition format is specific to Claude Code's .claude/ directory convention, not a general standard.

How much does running five subagents per deploy cost in API fees?

At our scale, roughly $0.12 per full deploy gate run using Claude Sonnet. That's about $9-10/month for 18 deploys per week. Opus would cost approximately 5x more but we haven't found it necessary for these tasks.

Can subagents run in CI/CD pipelines like GitHub Actions?

Yes. You can invoke Claude Code headlessly in a CI environment. We trigger ours on Vercel preview deploy completion via a webhook that calls claude-code run .claude/agents/deploy-gate.md with the preview URL as an environment variable.

What's the difference between a Claude Code skill and a subagent?

A skill is a markdown instruction file that teaches Claude how to do something -- like a recipe. A subagent is an isolated Claude instance that can be spawned with its own context and tools. Subagents use skills. Think of skills as documentation and agents as workers.

Do you need Anthropic's Routines feature or are raw subagents enough?

For our deploy gate workflow, raw subagents plus hooks in settings.json are sufficient. Routines add a higher-level orchestration layer that's useful for more complex multi-step workflows. We may adopt Routines if our deploy checks grow beyond six agents.

How do you handle subagent failures or timeouts?

Each subagent has a 120-second timeout. If a subagent fails or times out, the deploy gate orchestrator treats it as a WARN, not a BLOCK. We'd rather ship with an incomplete check than block deploys because Lighthouse hung. The summary notes which checks didn't complete.

Can this approach replace dedicated tools like Lighthouse CI or pa11y?

No -- it wraps them. The subagents call these tools via bash and then reason about the output. You still need the underlying tools installed. The value is in the orchestration, correlation, and natural-language reporting layer, not in replacing the scanners themselves.


Wij implementeren headless sites voor klanten die alles meten -- Core Web Vitals, schema markup, accessibility scores, organic traffic deltas. Eén slechte deploy kan rankings die maanden kostten om op te bouwen verwoesten. Dus toen Anthropic subagents, hooks en skills in Claude Code lanceerde, hebben we onze volledige pre-deploy pipeline eromheen herbouwd.

Dit artikel beschrijft onze exacte setup: de .claude/ directory structuur, elke subagent definitie, de hook configs en de skill files die alles samen binden. We delen vier echte incidenten die het systeem heeft voorkomen, de ROI math op onze schaal, en waar deze aanpak nog gaten heeft.

We hadden een CLS spike van 40% die toch naar productie ging

Drie weken geleden. Client's blog redesign. Alles zag er fijn uit in dev. We mergen.

Twee dagen later stuurt de CTO van de client een screenshot van Search Console. CLS springt van 0,08 naar 0,14 op mobile. Pages die nummer 3 stonden voor "enterprise billing software" zakken naar nummer 8. Omzetimpact? Ze schatten $40k/maand.

Het probleem? Een hero image die async laadde maar geen size attributes had. Klassiek. Onze CI ving niets omdat we layout shift niet controleerden op de daadwerkelijke preview build.

Toen begonnen we naar subagents te kijken.

Subagents zijn scoped Claude Code instances die binnen een parent session draaien met hun eigen system prompt, tool access en task boundary. Hooks triggeren subagents op specifieke punten -- voordat een command draait, na file changes, of op commit. Skills zijn herbruikbare instruction files (markdown) die Claude leren hoe een specifieke taak uit te voeren.

Anthropic lanceerde de redesign op 14 april 2025, met Routines naast deze primitives. Voor ons use case gaven raw subagents plus hooks ons fijnere controle over exact wanneer elke check draait en welke context het ontvangt.

Het sleutel verschil met traditionele CI checks: subagents kunnen redeneren over resultaten, fouten correleren tussen checks, en mensvriendelijke summaries schrijven. Een CI job geeft exit code 0 of 1. Een subagent geeft "The structured data on /blog/[slug] is missing the dateModified field, which was present in the previous build. This will likely cause a rich snippet regression in Google Search Console within 3-5 days."

Dat is het hele punt.

Na drie maanden brak ons GitHub Actions setup eindelijk

Onze vorige pipeline was een wirwar van GitHub Actions die Lighthouse CI, pa11y, linkinator en custom Node scripts aanriepen. Het werkte.

Soort van.

Maar het had drie problemen.

Geen redenering tussen checks. Als Lighthouse een CLS issue markeerde en de accessibility scan een ontbrekend alt tag op dezelfde image markeerde, kregen we twee aparte alerts zonder verbinding. Engineers moesten handmatig door CI logs grepenen, timestamps correleren, uitzoeken dat het dezelfde component was.

Verspilling van tijd.

Broze config. Elk tool had zijn eigen config file, threshold format en output schema. Thresholds updaten betekende 4-6 files aanraken. YAML hier. JSON daar. Environment variables op een derde plek. Eén typo en de hele pipeline exit 0 wanneer het zou moeten falen.

Geen contextuele uitleg. Engineers kregen pass/fail. Junior devs besteedden 20-40 minuten aan begrijpen waarom iets faalde en wat eraan te doen. "Accessibility score: 87" zegt je niet welk ARIA attribute ontbreekt of waarom het belangrijk is voor screenreaders.

We zouden 3 uur per week doorbrengen met debuggen van false positives of uitleg geven over fouten op Slack.

De laatste druppel? Augustus 2025. We pushen een Northwind Traders redesign om 16:00 op vrijdag (ik weet het). Lighthouse passed. Accessibility passed. Links passed. We shippem.

Maandagochtend mailt hun VP of Marketing ons. "Waarom ontbreken onze product pages in Google?" Het bleek dat we per ongeluk robots meta op noindex hadden gezet op elke page onder /products/. Onze CI controleerde robots tags niet. Het duurde zes dagen om opnieuw geïndexeerd te worden. Ze verloren ongeveer $12k omzet.

We hadden niet nog een CI tool nodig -- we hadden een orchestration layer nodig die kon redeneren over de outputs van tools die we al vertrouwden. Goed geschreven skill files zijn het verschil tussen een subagent die accessibility rules hallucineert en een die pa11y met de juiste flags draait en de JSON output correct interpreteert.

Onze .claude/ directory structuur

Hier is de daadwerkelijke boom:

.claude/
├── settings.json
├── agents/
│   ├── seo-regression.md
│   ├── cwv-smoke.md
│   ├── accessibility.md
│   ├── broken-links.md
│   ├── schema-validation.md
│   └── deploy-gate.md
├── skills/
│   ├── run-lighthouse.md
│   ├── run-pa11y.md
│   ├── run-linkinator.md
│   ├── parse-schema-org.md
│   ├── compare-seo-snapshot.md
│   └── format-deploy-report.md
└── snapshots/
    └── seo-baseline.json

De snapshots/ directory houdt baseline data voor comparison checks. Simpel. We versionen het in Git zodat we kunnen zien wat veranderde toen een client vraagt "waarom zakten rankings vorige dinsdag?"

Niets ingewikkelds. Alleen markdown files en JSON.

Een client belde om 23:00 omdat Google al hun rich snippets verwijderde

September 2025. We bouwen een e-commerce site voor een middelgrote retailer (laten we zeggen Acme Home Goods). Ze hadden zes maanden besteed aan rich snippets krijgen -- product sterren, prijzen, beschikbaarheid -- die in zoekresultaten verschijnen.

We pushen een Shopify theme update. Ziet er fijn uit. Shipped vrijdagavond.

Zaterdag om 23:14 uur krijg ik een berichtje. "Onze product pages zien er broken uit in Google. Sterren weg. Prijzen weg. Wat gebeurde er?"

Ik open Search Console. Elke product page geeft structured data errors. Het offers veld mist priceCurrency. Zonder het laat Google de rich snippet niet zien. Rankings daalden niet, maar click-through rate ging van 4,2% naar 1,8% 's nachts.

Kosten? Ongeveer $8k/week in lost traffic tot we het fixten en Google alles opnieuw crawlde.

Het schema was er. We veranderden alleen de property naam van priceCurrency naar currency omdat de Shopify API die key gebruikt. Dachten er niet over na. Geen validatie ving het.

Toen bouwden we de schema-validation subagent.

Je creëert een markdown file in .claude/agents/ met een system prompt, een lijst met allowed tools en task instructions. De parent session (of een hook) spawnt het met dispatch_agent() of via de hook config in settings.json.

Minimale structuur:

# Agent: [Name]

## Role
[One-line description]

## Allowed Tools
- Bash (restricted to specific commands)
- Read file
- Write file

## Instructions
[Step-by-step task description, referencing skill files]

## Output Format
[Exact format the parent expects]

Be extremely specific about the output format. If the deploy-gate orchestrator expects JSON with a passed boolean and a summary string, say so. Subagents that return free-form text break orchestration. We learned this the hard way when a subagent returned markdown tables and the deploy gate couldn't parse them. It took two hours of debugging at 2 AM because the parent failed silently. No error. It just never triggered the deploy block.

Don't make my mistake. Lock down the format.
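On the parent side, "locking down the format" can be as simple as refusing to guess. Here's a minimal sketch in Node: parse the reply, check it against the contract, and throw on anything else, so a markdown table comes back as a loud error instead of a silently skipped deploy block. The function name `parseAgentOutput` is hypothetical, and it only checks the two keys the five check agents below share (`passed` and `summary`). Per-agent keys would need their own checks.

```javascript
// Hypothetical parent-side guard: validate a subagent reply against its contract.
// It throws instead of returning a falsy value, so a bad reply can't fail silently.
function parseAgentOutput(agentName, rawText) {
  let data;
  try {
    data = JSON.parse(rawText);
  } catch (err) {
    throw new Error(`${agentName}: reply is not valid JSON (${err.message})`);
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    throw new Error(`${agentName}: reply must be a JSON object`);
  }
  if (typeof data.passed !== "boolean") {
    throw new Error(`${agentName}: "passed" must be a boolean`);
  }
  if (typeof data.summary !== "string") {
    throw new Error(`${agentName}: "summary" must be a string`);
  }
  return data;
}
```

The orchestrator can treat a thrown error the same way it treats a timed-out agent. The point is that it gets counted as an incomplete check rather than as "no flags."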

Subagent 1: SEO regression check

This compares SEO-critical elements of the current build against a baseline snapshot.

```markdown
# Agent: SEO Regression Check

## Role
Detect SEO regressions between the current build and the stored baseline.

## Allowed Tools
- Bash (node scripts only)
- Read file

## Instructions
1. Read the skill file at .claude/skills/compare-seo-snapshot.md
2. Run: node scripts/extract-seo-meta.js --url=$PREVIEW_URL --output=/tmp/seo-current.json
3. Read .claude/snapshots/seo-baseline.json
4. Compare the two snapshots field by field:
   - title tags (exact match)
   - meta descriptions (similarity > 0.85)
   - canonical URLs (exact match)
   - h1 count (must equal 1 per page)
   - robots meta (must not have changed to noindex)
   - Open Graph tags (og:title, og:description, og:image present)
5. Flag any page where robots changed to noindex as CRITICAL.
6. Flag missing or duplicate title tags as HIGH.
7. Flag meta description changes > 15% different as MEDIUM.

## Output Format
{"passed": boolean, "critical": [], "high": [], "medium": [], "summary": string}
```

The extract-seo-meta.js script is 120 lines of Puppeteer that hits every page in the sitemap and dumps titles, meta descriptions, canonicals, h1s, and OG tags to JSON. Nothing clever. Just extraction.

The subagent's value is in the comparison and the reasoning, not the extraction. It knows which changes matter, which are cosmetic, and which will cost the client $15k in organic traffic next quarter.

Example: if a meta description changes from "Best CRM software for small businesses in 2025" to "Best CRM software for small business", the similarity score is 0.91. That's fine. But if it changes to "CRM software", similarity drops to 0.65. The subagent flags that as MEDIUM because it's a 40% reduction in keyword density and will likely hurt CTR.

It's not just a diff. It's reasoning about what the diff means.
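The agent definition doesn't say how the similarity score is computed. If you'd rather compute a deterministic number before the agent reasons about the change, one common option is the Sørensen-Dice coefficient over character bigrams. This is a swapped-in technique for illustration, not necessarily what produced the 0.91 and 0.65 figures above, and it scores those same pairs differently. The 0.85 cutoff comes from the agent definition. The function names are hypothetical.

```javascript
// Sørensen-Dice similarity over character bigrams (multiset overlap).
// Returns a value in [0, 1]; 1 means identical after lowercasing and
// collapsing whitespace.
function bigramCounts(text) {
  const normalized = text.toLowerCase().replace(/\s+/g, " ").trim();
  const counts = new Map();
  for (let i = 0; i < normalized.length - 1; i++) {
    const bg = normalized.slice(i, i + 2);
    counts.set(bg, (counts.get(bg) || 0) + 1);
  }
  return { normalized, counts };
}

function diceSimilarity(a, b) {
  const A = bigramCounts(a);
  const B = bigramCounts(b);
  if (A.normalized === B.normalized) return 1;
  let sizeA = 0;
  let sizeB = 0;
  let overlap = 0;
  for (const n of A.counts.values()) sizeA += n;
  for (const n of B.counts.values()) sizeB += n;
  if (sizeA + sizeB === 0) return 0; // too short to compare, and not identical
  for (const [bg, n] of A.counts) overlap += Math.min(n, B.counts.get(bg) || 0);
  return (2 * overlap) / (sizeA + sizeB);
}

// Agent rule: a description passes if similarity > 0.85; otherwise it's MEDIUM.
function classifyDescriptionChange(before, after, threshold = 0.85) {
  const score = diceSimilarity(before, after);
  return { score, severity: score > threshold ? null : "MEDIUM" };
}
```

With this metric, the two example rewrites score 0.875 and about 0.39. The first passes and the second is flagged MEDIUM. Treat the number as input to the agent's reasoning rather than as a gate by itself.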

We've caught four issues with this check so far. The robots noindex one. A case where someone removed every OG image (which would have wrecked social shares). A case where title tags were truncated to 40 characters instead of 60 (it only looked bad and didn't hurt SEO, but the client would have noticed). And one where canonical URLs changed from https:// to http:// (which would have caused duplicate content issues).

Each of these would have cost us at least a few hours of cleanup and some client trust. Probably more.

We store seo-baseline.json in the repo and update it as part of the post-deploy-success hook.

Subagent 2: Core Web Vitals smoke test

```markdown
# Agent: CWV Smoke Test

## Role
Run Lighthouse on key pages and flag CWV regressions.

## Allowed Tools
- Bash

## Instructions
1. Read .claude/skills/run-lighthouse.md
2. Run Lighthouse CI against $PREVIEW_URL for these pages:
   - / (homepage)
   - /blog/ (listing)
   - /blog/[most-recent-post] (detail)
   - /services/ (if exists)
3. Thresholds (fail if any metric is worse than its limit):
   - LCP: 2500ms
   - INP: 200ms
   - CLS: 0.1
   - Performance score: 85
   - Accessibility score: 90
4. If a metric regressed by more than 10% from the previous run,
   flag as WARNING even if it still passes its threshold.
5. Include the specific element causing LCP or CLS where Lighthouse reports it.

## Output Format
{"passed": boolean, "pages": [{"url": string, "scores": {}, "flags": []}], "summary": string}
```

The associated skill file (run-lighthouse.md) contains the exact lhci CLI invocation:

````markdown
# Skill: Run Lighthouse

## Command
```bash
npx @lhci/cli@0.14.0 collect \
  --url="$1" \
  --numberOfRuns=3 \
  --settings.preset=desktop \
  --settings.output=json \
  --settings.outputPath=/tmp/lhci-results/
```

## Parsing

Read the median run from /tmp/lhci-results/. Extract:

- categories.performance.score * 100
- audits['largest-contentful-paint'].numericValue
- audits['cumulative-layout-shift'].numericValue
- audits['interaction-to-next-paint'].numericValue (if present)
````
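The parsing step and the CWV agent's thresholds can be sketched in plain Node. This is a hedged illustration, not the agent's actual logic. It assumes you've already loaded each run's Lighthouse result (LHR) JSON. It picks the "median" run simply by sorting on performance score, and Lighthouse CI has its own notion of a representative run, which may differ. It then applies the fail thresholds and the 10% regression rule. The `previous` object is a hypothetical metrics record from the last run. INP is optional because a navigation-mode Lighthouse run usually has no user interaction to measure.

```javascript
// Sketch: extract CWV metrics from Lighthouse results (LHR objects) and apply
// the CWV agent's thresholds plus its 10% regression rule.
const THRESHOLDS = {
  lcp: { max: 2500 },         // ms, lower is better
  inp: { max: 200 },          // ms, lower is better (often absent in lab runs)
  cls: { max: 0.1 },          // unitless, lower is better
  performance: { min: 85 },   // 0-100, higher is better
  accessibility: { min: 90 }, // 0-100, higher is better
};

function extractMetrics(lhr) {
  const audit = (id) => (lhr.audits[id] ? lhr.audits[id].numericValue : undefined);
  const score = (id) =>
    lhr.categories[id] && lhr.categories[id].score != null
      ? lhr.categories[id].score * 100
      : undefined;
  return {
    lcp: audit("largest-contentful-paint"),
    inp: audit("interaction-to-next-paint"),
    cls: audit("cumulative-layout-shift"),
    performance: score("performance"),
    accessibility: score("accessibility"),
  };
}

// Simplification: take the middle run after sorting by performance score.
function medianRun(lhrs) {
  const sorted = [...lhrs].sort(
    (a, b) => a.categories.performance.score - b.categories.performance.score
  );
  return sorted[Math.floor(sorted.length / 2)];
}

function checkPage(lhrs, previous = {}) {
  const metrics = extractMetrics(medianRun(lhrs));
  const flags = [];
  for (const [name, limit] of Object.entries(THRESHOLDS)) {
    const value = metrics[name];
    if (value === undefined) continue; // metric not reported for this run
    const lowerIsBetter = "max" in limit;
    const fails = lowerIsBetter ? value > limit.max : value < limit.min;
    if (fails) {
      flags.push({ level: "FAIL", metric: name, value });
      continue;
    }
    const prev = previous[name];
    if (prev !== undefined && prev !== 0) {
      const change = (value - prev) / prev; // relative change vs. the last run
      const regressed = lowerIsBetter ? change > 0.1 : change < -0.1;
      if (regressed) flags.push({ level: "WARNING", metric: name, value, previous: prev });
    }
  }
  return { passed: !flags.some((f) => f.level === "FAIL"), metrics, flags };
}
```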

Subagent 3: Accessibility scan

```markdown
# Agent: Accessibility Scan

## Role
Run pa11y against preview URLs and report WCAG 2.1 AA violations.

## Allowed Tools
- Bash

## Instructions
1. Read .claude/skills/run-pa11y.md
2. Run pa11y against the same page set as the CWV agent.
3. Group results by severity: error, warning, notice.
4. For each error, include:
   - The WCAG criterion violated (e.g., 1.1.1 Non-text Content)
   - The HTML element (selector)
   - A one-sentence fix suggestion
5. Fail if any errors exist. Warn if warnings > 10.

## Output Format
{"passed": boolean, "error_count": number, "warning_count": number, "errors": [{"criterion": string, "selector": string, "fix": string}], "summary": string}
```

We use pa11y@8.0.0 with the --runner=axe flag. The default htmlcs runner misses some color contrast issues that axe catches.
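The gating rule in step 5 is simple enough to sketch. This assumes pa11y's JSON reporter output: an array of issues, each with a `type` of `error`, `warning`, or `notice`, plus `code`, `message`, and `selector`. Mapping an issue to a WCAG criterion and writing the fix suggestion stay with the agent. One thing worth checking against your setup: pa11y leaves warnings and notices out of its results unless you enable them (`--include-warnings`, `--include-notices`), so the "warnings > 10" rule only has data if warnings are included. The function name is hypothetical.

```javascript
// Sketch of the accessibility agent's gating rule over pa11y JSON issues.
// Assumed input: [{ type: "error" | "warning" | "notice", code, message, selector }, ...]
function gateAccessibility(issues, maxWarnings = 10) {
  const byType = { error: [], warning: [], notice: [] };
  for (const issue of issues) {
    const bucket = byType[issue.type];
    if (Array.isArray(bucket)) bucket.push(issue); // ignore unknown types
  }
  const errorCount = byType.error.length;
  const warningCount = byType.warning.length;
  return {
    passed: errorCount === 0,         // any error fails the check
    warn: warningCount > maxWarnings, // passes, but gets surfaced in the report
    error_count: errorCount,
    warning_count: warningCount,
    // The agent adds the WCAG criterion and fix text; here we keep the raw fields.
    errors: byType.error.map(({ code, selector, message }) => ({ code, selector, message })),
  };
}
```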

Subagent 4: Broken link scan

```markdown
# Agent: Broken Link Scan

## Role
Crawl the preview site and report broken internal and external links.

## Allowed Tools
- Bash

## Instructions
1. Read .claude/skills/run-linkinator.md
2. Run: npx linkinator@6.1.2 $PREVIEW_URL --recurse --timeout 15000 --format json > /tmp/link-results.json
3. Filter results to status >= 400 or status === 0 (timeout).
4. Separate internal (same domain) from external broken links.
5. Internal broken links are CRITICAL. External broken links are WARNING.
6. Exclude known-flaky external domains: twitter.com, linkedin.com (they block crawlers).

## Output Format
{"passed": boolean, "internal_broken": [{"source": string, "target": string, "status": number}], "external_broken": [...], "summary": string}
```
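Steps 3 through 6 are deterministic, so they're easy to sketch outside the agent. This assumes linkinator's JSON report has a `links` array whose entries carry `url`, `status`, `state`, and `parent` (the page where the link was found). Check that shape against the version you pin. Skipping entries whose `state` is `SKIPPED` and matching flaky domains by hostname suffix are our own choices.

```javascript
// Sketch: split broken links from a linkinator JSON report into internal/external.
const FLAKY_DOMAINS = ["twitter.com", "linkedin.com"];

function isFlaky(hostname) {
  return FLAKY_DOMAINS.some((d) => hostname === d || hostname.endsWith("." + d));
}

function classifyBrokenLinks(report, previewUrl) {
  const siteHost = new URL(previewUrl).hostname;
  const internal = [];
  const external = [];
  for (const link of report.links || []) {
    if (link.state === "SKIPPED") continue; // linkinator didn't actually check it
    const broken = link.status >= 400 || link.status === 0; // 0 = timeout/no response
    if (!broken) continue;
    let host;
    try {
      host = new URL(link.url).hostname;
    } catch {
      continue; // not an absolute URL; nothing to classify
    }
    const entry = { source: link.parent, target: link.url, status: link.status };
    if (host === siteHost) internal.push(entry);
    else if (!isFlaky(host)) external.push(entry);
  }
  return {
    passed: internal.length === 0, // internal breaks are CRITICAL
    internal_broken: internal,
    external_broken: external,     // WARNING only
  };
}
```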

Subagent 5: Schema validation

```markdown
# Agent: Schema Validation

## Role
Validate JSON-LD structured data on all pages.

## Allowed Tools
- Bash
- Read file

## Instructions
1. Read .claude/skills/parse-schema-org.md
2. For each page in the sitemap:
   a. Extract all <script type="application/ld+json"> blocks
   b. Parse as JSON (fail if malformed)
   c. Validate required fields per @type:
      - Article: headline, datePublished, dateModified, author, image
      - LocalBusiness: name, address, telephone
      - WebPage: name, description
      - BreadcrumbList: itemListElement with position, name, item
   d. Check that all @id references resolve within the page's graph
   e. Validate URLs in schema are absolute, not relative
3. Flag missing required fields as HIGH.
4. Flag malformed JSON as CRITICAL.

## Output Format
{"passed": boolean, "pages": [{"url": string, "schemas": [{"type": string, "valid": boolean, "issues": []}]}], "summary": string}
```
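Most of these steps are deterministic, so here's a hedged sketch of steps 2a-2c and 2e in dependency-free Node. It's illustrative, not our agent's code, and it simplifies in several ways:

- It pulls JSON-LD blocks out with a regex; a DOM parser would be more robust.
- It only knows the four types listed above, so a `BlogPosting` or `NewsArticle` node won't be checked.
- It leaves out the @id resolution in step 2d.
- The relative-URL check covers only string `url` and `image` properties, which is our simplification.

Severity prefixes mirror steps 3 and 4.

```javascript
// Sketch of the schema agent's deterministic checks (steps 2a-2c, 2e) on one page.
const REQUIRED = {
  Article: ["headline", "datePublished", "dateModified", "author", "image"],
  LocalBusiness: ["name", "address", "telephone"],
  WebPage: ["name", "description"],
  BreadcrumbList: ["itemListElement"],
};

const LD_JSON_RE =
  /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;

function isAbsoluteUrl(value) {
  try {
    new URL(value); // throws for relative URLs such as "/img/hero.jpg"
    return true;
  } catch {
    return false;
  }
}

function isMissing(value) {
  return value === undefined || value === null || value === "";
}

function checkNode(node) {
  const type = node["@type"];
  const issues = [];
  for (const field of REQUIRED[type] || []) {
    if (isMissing(node[field])) issues.push(`HIGH: missing ${field}`);
  }
  if (type === "BreadcrumbList" && Array.isArray(node.itemListElement)) {
    node.itemListElement.forEach((item, i) => {
      for (const field of ["position", "name", "item"]) {
        if (!item || isMissing(item[field])) issues.push(`HIGH: breadcrumb ${i + 1} missing ${field}`);
      }
    });
  }
  for (const key of ["url", "image"]) {
    const v = node[key];
    if (typeof v === "string" && !isAbsoluteUrl(v)) issues.push(`relative URL in ${key}: ${v}`);
  }
  return { type, valid: issues.length === 0, issues };
}

function validatePage(html) {
  const schemas = [];
  for (const match of html.matchAll(LD_JSON_RE)) {
    let data;
    try {
      data = JSON.parse(match[1]);
    } catch (err) {
      schemas.push({ type: null, valid: false, issues: [`CRITICAL: malformed JSON (${err.message})`] });
      continue;
    }
    // A block can hold one node, an array of nodes, or an @graph wrapper.
    const nodes = Array.isArray(data) ? data : (data && data["@graph"]) || [data];
    for (const node of nodes) {
      if (node && typeof node === "object") schemas.push(checkNode(node));
    }
  }
  return schemas;
}
```

Note that a null `dateModified`, as in Incident 4 below, counts as missing here.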

Subagent 6: Deploy gate orchestrator

This parent agent spawns the other five and makes the go/no-go decision.

```markdown
# Agent: Deploy Gate

## Role
Orchestrate all pre-deploy checks and produce a final deploy decision.

## Allowed Tools
- Bash
- Read file
- Write file
- dispatch_agent

## Instructions
1. Spawn these agents in parallel:
   - .claude/agents/seo-regression.md
   - .claude/agents/cwv-smoke.md
   - .claude/agents/accessibility.md
   - .claude/agents/broken-links.md
   - .claude/agents/schema-validation.md
2. Collect all outputs.
3. Read .claude/skills/format-deploy-report.md
4. Decision logic:
   - If ANY agent has a CRITICAL flag: BLOCK deploy.
   - If 2+ agents have HIGH flags: BLOCK deploy.
   - If 1 agent has HIGH flags: WARN, require manual override.
   - Otherwise: APPROVE.
5. Write the full report to /tmp/deploy-report.md
6. Output the decision.

## Output Format
{"decision": "APPROVE" | "WARN" | "BLOCK", "reports": {agent_name: agent_output}, "summary": string}
```
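The decision logic in step 4 is small enough to write down, which also makes it testable outside the agent. Here's a sketch. It assumes each agent's output has first been normalized into a hypothetical `{ critical: [], high: [] }` shape; the real agents use different keys, such as `internal_broken` for the link scan. It also folds in the timeout policy from the FAQ below: an agent that failed or timed out, represented here as `null`, downgrades the result to WARN rather than blocking.

```javascript
// Sketch of the deploy gate's go/no-go rule.
// `results` maps agent name -> normalized output, or null if the agent failed/timed out.
function decide(results) {
  const incomplete = [];
  let anyCritical = false;
  let agentsWithHigh = 0;
  for (const [agent, out] of Object.entries(results)) {
    if (out === null) {
      incomplete.push(agent);
      continue;
    }
    if ((out.critical || []).length > 0) anyCritical = true;
    if ((out.high || []).length > 0) agentsWithHigh += 1;
  }
  let decision = "APPROVE";
  if (anyCritical || agentsWithHigh >= 2) decision = "BLOCK";
  else if (agentsWithHigh === 1 || incomplete.length > 0) decision = "WARN";
  return { decision, incomplete };
}
```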

Hook configuration: settings.json

Here's our actual settings.json (with client-specific URLs redacted):

```json
{
  "hooks": {
    "pre-commit": [
      {
        "agent": ".claude/agents/schema-validation.md",
        "condition": "files_changed_match('**/*.json', '**/structured-data/**')",
        "env": {
          "PREVIEW_URL": "http://localhost:3000"
        }
      }
    ],
    "pre-push": [
      {
        "agent": ".claude/agents/deploy-gate.md",
        "env": {
          "PREVIEW_URL": "$VERCEL_PREVIEW_URL"
        },
        "timeout": 300,
        "on_failure": "block"
      }
    ],
    "post-deploy-success": [
      {
        "command": "node scripts/extract-seo-meta.js --url=$PRODUCTION_URL --output=.claude/snapshots/seo-baseline.json",
        "description": "Update SEO baseline after successful deploy"
      }
    ]
  },
  "agent_defaults": {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 8192,
    "timeout": 120
  },
  "skills_directory": ".claude/skills/"
}
```

Notes on this config:

  • We use claude-sonnet-4-20250514 for subagents, not Opus. The reasoning tasks here don't justify the cost difference; Sonnet handles "compare two JSON objects and flag differences" just fine.
  • The timeout: 300 on the deploy gate gives all five subagents time to run. Individual agents get the 120s default; the orchestrator gets 5 minutes because it waits on all of them.
  • The condition on the pre-commit hook means schema validation only runs when you touch schema-related files. There's no point running it on a CSS change.
  • post-deploy-success updates the baseline. Without it, your SEO regression check compares against stale data.

Skill definitions that tie it all together

The skill file that does the most work is compare-seo-snapshot.md:

```markdown
# Skill: Compare SEO Snapshots

## Purpose
Compare two SEO metadata snapshots and identify regressions.

## Input
- Current snapshot: /tmp/seo-current.json
- Baseline snapshot: .claude/snapshots/seo-baseline.json

## Comparison Rules

### Title Tags
- If a title changed AND the page's organic traffic (from baseline metadata) > 1000 sessions/month, flag as HIGH.
- If a title is now empty or matches another page's title, flag as CRITICAL.
- If a title changed on a low-traffic page, flag as MEDIUM.

### Canonical URLs
- Any change to canonical URL is HIGH.
- A canonical pointing to a different domain is CRITICAL.
- A missing canonical (was present, now gone) is HIGH.

### Robots Meta
- Any page that gained "noindex" is CRITICAL.
- Any page that gained "nofollow" on internal links is HIGH.

### New Pages
- Pages in current but not in baseline are INFO (expected for new content).
- But verify they have: title, meta description, canonical, at least one h1.

### Removed Pages
- Pages in baseline but not in current are HIGH.
- These might indicate accidental route removal.
```

This skill file encodes months of SEO incident response in a format Claude can follow reliably. Without it, the subagent would make reasonable but inconsistent calls about what counts as a regression.
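To make the rules concrete, here's a deterministic sketch of the same comparison. It's illustrative only, since the real judgment happens in the subagent, and several details are our own assumptions:

- The snapshot shape, keyed by path with `title`, `description`, `canonical`, `robots`, `h1Count`, and a baseline `sessions` count, is hypothetical; extract-seo-meta.js may emit something different.
- It treats any newly added nofollow as HIGH rather than inspecting internal links.
- Using HIGH for a new page that fails the field check is our choice, since the skill doesn't specify a severity.

One way to use a script like this is to run it first so the agent reasons over a short findings list instead of two raw snapshots.

```javascript
// Sketch: apply the compare-seo-snapshot.md rules to two snapshots.
// Hypothetical shape, keyed by path:
//   { "/blog/a": { title, description, canonical, robots, h1Count, sessions } }
// `sessions` (monthly organic sessions) is read from the baseline only.
function hasToken(robots, token) {
  return (robots || "").toLowerCase().split(",").map((s) => s.trim()).includes(token);
}

function hostOf(url) {
  try {
    return new URL(url).hostname;
  } catch {
    return null; // relative or malformed canonical
  }
}

function compareSnapshots(baseline, current) {
  const findings = [];
  const flag = (severity, path, message) => findings.push({ severity, path, message });

  // Count titles across the current build so duplicates can be detected.
  const titleCounts = new Map();
  for (const page of Object.values(current)) {
    if (page.title) titleCounts.set(page.title, (titleCounts.get(page.title) || 0) + 1);
  }

  for (const [path, cur] of Object.entries(current)) {
    const base = baseline[path];
    if (!cur.title || titleCounts.get(cur.title) > 1) flag("CRITICAL", path, "empty or duplicate title");
    if (!base) {
      flag("INFO", path, "new page");
      if (!cur.description || !cur.canonical || !(cur.h1Count >= 1)) {
        flag("HIGH", path, "new page missing description, canonical, or h1");
      }
      continue;
    }
    if (cur.title && cur.title !== base.title) {
      flag((base.sessions || 0) > 1000 ? "HIGH" : "MEDIUM", path, "title changed");
    }
    if (base.canonical && !cur.canonical) {
      flag("HIGH", path, "canonical removed");
    } else if (base.canonical && cur.canonical !== base.canonical) {
      const crossDomain = hostOf(cur.canonical) !== hostOf(base.canonical);
      flag(crossDomain ? "CRITICAL" : "HIGH", path, "canonical changed");
    }
    if (hasToken(cur.robots, "noindex") && !hasToken(base.robots, "noindex")) {
      flag("CRITICAL", path, "gained noindex");
    }
    if (hasToken(cur.robots, "nofollow") && !hasToken(base.robots, "nofollow")) {
      flag("HIGH", path, "gained nofollow");
    }
  }

  for (const path of Object.keys(baseline)) {
    if (!(path in current)) flag("HIGH", path, "page removed");
  }
  return findings;
}
```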

Four incidents the system caught

Incident 1: Accidental noindex on 47 blog posts

Client: B2B SaaS company, 200 pages, 60k organic sessions/month.

A developer updated the <Head> component in the blog template to add a new meta tag. They copy-pasted from the staging config, which had <meta name="robots" content="noindex, nofollow"> hardcoded. The change passed code review because the reviewer focused on the new tag, not the existing one.

The SEO regression subagent flagged 47 pages as CRITICAL: robots meta had changed to noindex. The deploy was blocked.

Time to detect: 2 minutes 14 seconds after push. Without the system, we'd have found out only when Search Console showed a coverage drop 3-7 days later.

Estimated impact avoided: those 47 posts drove roughly $14,000/month in pipeline. Even a one-week deindexing could have cost $3,500+.

Incident 2: CLS regression from a new hero image

Client: E-commerce brand, Next.js 14 storefront on Shopify Hydrogen.

The design team swapped the homepage hero for a new image with a different aspect ratio but didn't update the width/height attributes on the <Image> component. The image loaded fine but caused a CLS of 0.34, far above the 0.1 threshold.

The CWV smoke test subagent reported a CLS regression on the homepage. The summary called it out specifically: "CLS caused by element img.hero-banner shifting 0.34 cumulative. The image dimensions (1920x800) don't match the container aspect ratio (16:9 = 1920x1080). Add explicit width={1920} height={800} or update the container."

Time to detect: 1 minute 47 seconds.

Incident 3: Broken internal links after a URL restructure

Client: Professional services firm, 80 pages.

We restructured their service pages from /services/[name] to /[category]/[name]. Redirects were in place, but three blog posts had hardcoded links to the old URLs, and the CMS-driven navigation had a cached entry pointing to a deleted page.

The broken link scan found 4 internal 404s. The subagent's summary noted that 3 of the 4 were in blog post body content (not navigation), which meant the redirect audit had missed them.

Time to detect: 3 minutes 8 seconds. The linkinator crawl is the slowest part.

Incident 4: Missing dateModified in Article schema

Client: Media company, roughly 2,000 articles.

A CMS migration from WordPress to Sanity lost the dateModified field mapping. The schema generation code fell back to null for dateModified, which produced invalid JSON-LD.

The schema validation subagent flagged every article page as HIGH: missing required dateModified field. The summary explained: "Google requires dateModified for Article structured data to be eligible for Top Stories and rich results. All 2,147 article pages are affected."

Time to detect: 4 minutes 22 seconds (large sitemap).

ROI: minutes saved per ship and dollars per month

Here's our math:

| Metric | Before (CI + manual) | After (subagents) | Delta |
|---|---|---|---|
| Checks per deploy | 4 tools, manual review | 5 agents, automated | +1 check, -100% manual review |
| Time to run all checks | 8-12 min (sequential CI) | 3-5 min (parallel subagents) | -60% |
| Time to understand failures | 20-40 min per failure | 1-2 min (contextual summary) | -90% |
| Deploys per week (all clients) | 18 | 18 | Same |
| False positive rate | ~15% (noisy Lighthouse) | ~4% (reasoning filters noise) | -73% |

Minutes saved per ship: on average 25 minutes whenever a check fails, which happens on 30% of deploys (0.3 × 18 = 5.4 failing deploys/week). That's 25 × 5.4 = 135 minutes/week, or 9 hours over a 4-week month.

Cost of the system:

  • Claude API costs for subagents: ~$0.12 per full deploy gate run (5 agents, Sonnet, 6,000 tokens per agent on average)
  • 18 deploys/week × 4.3 weeks × $0.12 = $9.29/month in API costs
  • Puppeteer/Lighthouse infrastructure: runs on existing Vercel build instances, no extra cost
  • Maintenance time: ~2 hours/month updating skill files and thresholds

Dollar value of engineer time saved: 9 hours/month × $85/hour (our team's blended rate) = $765/month.

Dollar value of prevented incidents: of the four incidents above, the noindex one alone could have cost $3,500. If we prevent one incident like that per quarter, that's about $1,167/month in avoided client impact.

Net ROI: about $1,930/month in value ($765 + $1,167, not counting the ~2 hours/month of maintenance) for $9.29/month in API costs. That's roughly a 208x return. Even if you multiply the API costs by 10 for a larger team, it still pays off comfortably.
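If you want to rerun this with your own numbers, here's the same arithmetic as a small function. The defaults are the figures from this section. That includes the 4-week month behind the 9 hours and the 4.3-week month behind the API bill. Like the text, it leaves the ~2 hours/month of maintenance out. The function and parameter names are ours. Summed exactly, the defaults come to about $1,932/month in value and a multiple of roughly 208x.

```javascript
// ROI arithmetic from this section, parameterized so you can plug in your own numbers.
function roi({
  deploysPerWeek = 18,
  failureRate = 0.3,                // share of deploys where a check fails
  minutesSavedPerFailure = 25,
  weeksPerMonth = 4,                // the time-savings figure uses 4 weeks
  hourlyRate = 85,
  costPerRun = 0.12,                // API cost per full deploy gate run
  apiWeeksPerMonth = 4.3,           // the API-cost figure uses 4.3 weeks
  incidentCost = 3500,
  incidentsAvoidedPerMonth = 1 / 3, // one per quarter
} = {}) {
  const failingDeploysPerWeek = deploysPerWeek * failureRate;
  const hoursSavedPerMonth =
    (failingDeploysPerWeek * minutesSavedPerFailure * weeksPerMonth) / 60;
  const timeValue = hoursSavedPerMonth * hourlyRate;
  const incidentValue = incidentCost * incidentsAvoidedPerMonth;
  const apiCost = deploysPerWeek * apiWeeksPerMonth * costPerRun;
  const totalValue = timeValue + incidentValue;
  return { hoursSavedPerMonth, timeValue, incidentValue, apiCost, totalValue, multiple: totalValue / apiCost };
}
```

For example, `roi({ deploysPerWeek: 40 }).apiCost` works out to about $20.64/month.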

Gaps and what we'd change

This system isn't perfect. Here's what's still rough:

No visual regression testing. Subagents can run Lighthouse and pa11y but can't look at screenshots and say "the hero section is broken." We're watching Claude's vision capabilities for this.

Baseline drift. The SEO baseline updates on every successful deploy, so if you ship a regression the system doesn't catch, it becomes the new baseline. We review baselines manually every month.

External link flakiness. Twitter/X, LinkedIn, and some government sites block crawlers or rate-limit aggressively. We maintain an exclusion list, but it needs manual updates.

Cold start time. The first run after cloning a repo takes longer because npx has to fetch packages. We're considering pre-installing the CLI tools in a Docker layer.

Anthropic rate limits. Spawning 5 subagents at once occasionally hits Claude API rate limits during peak hours. We added a 2-second stagger between spawns, which works but is inelegant.
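Here's a sketch of the stagger workaround in Node, with the timeout policy from the FAQ below folded in. Each "job" is any function that returns a promise. How you actually spawn a subagent is left abstract, because this illustrates the scheduling pattern, not Claude Code's API. The 2-second stagger and 120-second timeout mirror the numbers in this post; `runStaggered` and `withTimeout` are hypothetical names.

```javascript
// Sketch: launch async jobs with a fixed stagger and a per-job timeout.
// A job that rejects or times out resolves to null, i.e. "check incomplete".
// Note: a timed-out job is not cancelled; it just stops being waited on.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(null), ms);
  });
  return Promise.race([promise.catch(() => null), timeout]).finally(() => clearTimeout(timer));
}

async function runStaggered(jobs, { staggerMs = 2000, timeoutMs = 120000 } = {}) {
  const names = Object.keys(jobs);
  const pending = {};
  for (let i = 0; i < names.length; i++) {
    if (i > 0) await sleep(staggerMs); // space out launches to ease rate-limit pressure
    const name = names[i];
    // Promise.resolve().then(...) turns a synchronous throw into a rejection.
    pending[name] = withTimeout(Promise.resolve().then(() => jobs[name]()), timeoutMs);
  }
  const results = {};
  for (const name of names) results[name] = await pending[name];
  return results;
}
```

Jobs still run concurrently once launched; the stagger only spreads out their start times. A `null` result maps directly onto the "incomplete check, WARN not BLOCK" policy.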

Output drift in long definitions. Our longer agent definitions (schema validation runs to 400 words) occasionally produce less structured output than the shorter ones. We're considering splitting the schema validation agent into per-type sub-subagents.

FAQ

Do Claude Code subagents work with any LLM, or only Claude?

Subagents are a Claude Code feature tied to Anthropic's API. You need a Claude API key with access to Claude Code. The agent definition format is specific to Claude Code's .claude/ directory convention; it isn't a general standard.

How much does running five subagents per deploy cost in API fees?

At our scale, about $0.12 per full deploy gate run with Claude Sonnet. That's roughly $9-10/month for 18 deploys per week. Opus would cost about 5x more, and we haven't found it necessary for these tasks.

Can subagents run in CI/CD pipelines like GitHub Actions?

Yes. You can run Claude Code headless in a CI environment. We trigger ours on Vercel preview deploy completion via a webhook that calls claude-code run .claude/agents/deploy-gate.md with the preview URL as an environment variable.

What's the difference between a Claude Code skill and a subagent?

A skill is a markdown instruction file that teaches Claude how to do something, like a recipe. A subagent is an isolated Claude instance that can be spawned with its own context and tools. Subagents use skills. Think of skills as documentation and agents as workers.

Do you need Anthropic's Routines feature, or are raw subagents enough?

For our deploy gate workflow, raw subagents plus hooks in settings.json are enough. Routines add a higher-level orchestration layer that's useful for more complex multi-step workflows. We may adopt Routines if our deploy checks grow beyond six agents.

How do you handle subagent failures or timeouts?

Each subagent has a 120-second timeout. If a subagent fails or times out, the deploy gate orchestrator treats it as WARN, not BLOCK. We'd rather ship with an incomplete check than block deploys because Lighthouse hung. The summary notes which checks didn't complete.

Can this approach replace dedicated tools like Lighthouse CI or pa11y?

No. It wraps them. The subagents call these tools via bash and then reason about the output. You still need the underlying tools installed. The value is in the orchestration, correlation, and natural-language reporting layer, not in replacing the scanners themselves.