Every agency says they hate cold email. We did too — until we realized the problem wasn't cold email itself, it was every tool we tried to use for it. The generic templates. The "Hi {firstName}" energy. The $300/month platforms that still required hours of manual work. So we did what developers do: we built our own.

This isn't a theoretical architecture post. We've been running this system in production for months, sending thousands of personalized emails that actually get replies. I'm going to walk you through exactly why we built it, how the pieces fit together, and what we learned the hard way.

Why We Built Our Own Cold Email System with Claude, Instantly & Supabase

The Problem With Off-the-Shelf Outreach

We tried the usual suspects. Lemlist. Apollo. Woodpecker. They're fine tools for a lot of use cases. But as a headless web development agency, our outreach needs were specific in ways these platforms couldn't handle.

Here's what kept breaking down:

Generic personalization fields aren't personalization. Inserting someone's company name and job title into a template doesn't fool anyone in 2025. We needed emails that referenced a prospect's actual tech stack, their site performance issues, or specific architectural decisions visible on their public website.

The research step was the bottleneck. Our best-performing outreach always involved someone on the team actually looking at a prospect's site, running it through PageSpeed Insights, checking their framework, and writing something specific. That took 10-15 minutes per lead. At scale, that's a full-time job.

Data lived in too many places. Leads in one spreadsheet, email sequences in another platform, results in a third dashboard. We couldn't build feedback loops because nothing talked to anything else.

The AI integrations were surface-level. Some platforms added "AI writing" features, but they were basically GPT wrappers that generated the same bland copy everyone else was sending. No ability to feed in custom context, no control over prompts, no way to build multi-step reasoning chains.

We needed a system where AI did the research, not just the writing.

Our Tech Stack and Why We Chose It

Here's what we landed on after a few iterations:

| Component | Tool | Role | Monthly Cost |
|---|---|---|---|
| Lead finding & email verification | Hunter.io | Find and verify email addresses | $49 (Starter) |
| AI research & copywriting | Claude (Anthropic API) | Analyze prospects, generate personalized emails | ~$30-60 |
| Database & orchestration | Supabase | Store leads, manage state, trigger workflows | $25 (Pro) |
| Email sending & warmup | Instantly.ai | Deliverability, sending infrastructure, warmup | $30 (Growth) |
| Automation glue | Custom Edge Functions + Cron | Connect everything together | $0 (included in Supabase) |

We evaluated a bunch of alternatives. Here's the short version of why we picked what we picked:

Claude over GPT-4: We tested both extensively. Claude 3.5 Sonnet (and now Claude Sonnet 4 in 2025) consistently produced emails that sounded more natural and less "AI-ish." It was also better at following complex system prompts without drifting. The pricing was comparable, but Claude's longer context window meant we could feed in more research data per prospect.

Supabase over Airtable or a custom Postgres setup: We needed a real database with row-level security, but we didn't want to manage infrastructure. Supabase gave us Postgres, Edge Functions, Cron jobs, and a decent dashboard — all in one place. We use Supabase heavily for client projects too, so the team already knew it well.

Instantly over Lemlist or Smartlead: Instantly's warmup network is genuinely good, their API is clean, and the pricing made sense for our volume. We don't need Instantly's built-in sequence builder because we handle sequencing logic ourselves.

Hunter over Apollo or Snov.io: Hunter's email verification is consistently the most accurate we've tested. Their domain search API is fast and the data quality is high. Apollo has more data points, but we found their email accuracy lower, which kills deliverability.

Architecture Overview

The system works in six stages, each running independently:

[Lead Sources] → [Hunter Enrichment] → [Supabase DB] → [Claude Research + Copy] → [Instantly Sending]
     ↑                                       ↑                                           |
     |                                       |                                           |
     +----------- Feedback Loop -------------+-------------------------------------------+
  1. Ingest: We feed in prospect domains from various sources (manual lists, scrapers, referral data)
  2. Enrich: Hunter finds contacts and verifies emails
  3. Store: Everything lands in Supabase with status tracking
  4. Research + Write: Claude analyzes each prospect and generates personalized copy
  5. Send: Approved emails push to Instantly campaigns
  6. Learn: Reply data flows back to Supabase, informing future personalization

Each stage is decoupled. If Hunter's API goes down, the enrichment queue just backs up — it doesn't break sending. If we want to swap Claude for a different model, we change one function.


Finding and Enriching Leads with Hunter

Hunter.io handles two critical jobs: finding the right person at a company and verifying that their email actually works.

Here's a simplified version of our enrichment function:

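import { createClient } from '@supabase/supabase-js';

const HUNTER_API_KEY = Deno.env.get('HUNTER_API_KEY')!;

// Service-role client; both variables are available by default inside Supabase Edge Functions
const supabase = createClient(
  Deno.env.get('SUPABASE_URL')!,
  Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!
);

// The fields we read from each Hunter domain-search result
interface HunterEmail {
  value: string;
  confidence: number;
  first_name: string | null;
  last_name: string | null;
  position: string | null;
}

async function enrichLead(domain: string) {
  // Domain search to find decision makers (URLSearchParams handles encoding)
  const searchParams = new URLSearchParams({
    domain,
    department: 'executive,it',
    api_key: HUNTER_API_KEY
  });
  const searchRes = await fetch(`https://api.hunter.io/v2/domain-search?${searchParams}`);
  if (!searchRes.ok) throw new Error(`Hunter domain search failed: ${searchRes.status}`);
  const searchData = await searchRes.json();

  const contacts = (searchData.data.emails as HunterEmail[])
    .filter((e) => e.confidence > 70)
    .sort((a, b) => b.confidence - a.confidence) // Highest confidence first
    .slice(0, 3); // Top 3 contacts per domain

  // Verify each email before storing it
  for (const contact of contacts) {
    const verifyParams = new URLSearchParams({ email: contact.value, api_key: HUNTER_API_KEY });
    const verifyRes = await fetch(`https://api.hunter.io/v2/email-verifier?${verifyParams}`);
    if (!verifyRes.ok) continue; // Skip this contact; a later run can retry it
    const verifyData = await verifyRes.json();

    if (verifyData.data?.status === 'valid') {
      const { error } = await supabase.from('leads').insert({
        domain,
        email: contact.value,
        first_name: contact.first_name,
        last_name: contact.last_name,
        position: contact.position,
        confidence: contact.confidence,
        status: 'enriched',
        enriched_at: new Date().toISOString()
      });
      if (error) console.error(`Insert failed for ${contact.value}: ${error.message}`);
    }
  }
}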

We filter for the "executive" and "it" departments because those are our buyers: CTOs, VPs of Engineering, technical founders. Hunter's department filtering isn't perfect, but it cuts out a lot of noise.

One thing we learned: never skip email verification. Even with Hunter's confidence scores, we still verify every single address. A bounce rate above 3% will tank your sending domain's reputation. We've seen domains go from 95% inbox placement to 40% spam folder from one bad batch.
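That 3% ceiling is easy to turn into a guard that halts a campaign before a bad batch does real damage. A minimal sketch, assuming per-batch sent and bounced counts are available; the names BatchStats, bounceRate, and shouldPauseSending are hypothetical, not part of our production code:

```typescript
// Hypothetical guard: names and the batch shape are illustrative assumptions.
const MAX_BOUNCE_RATE = 0.03; // The 3% ceiling discussed above

interface BatchStats {
  sent: number;
  bounced: number;
}

// Fraction of sent emails that bounced; an empty batch counts as 0
function bounceRate(stats: BatchStats): number {
  return stats.sent === 0 ? 0 : stats.bounced / stats.sent;
}

// True when the batch is over the ceiling and sending should stop
function shouldPauseSending(stats: BatchStats): boolean {
  return bounceRate(stats) > MAX_BOUNCE_RATE;
}
```

With these numbers, 8 bounces in 1,000 sends (0.8%) passes, while 4 bounces in 100 sends (4%) trips the guard.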

We run about 500 credits of Hunter searches per week, which fits comfortably in their Starter plan.

AI Personalization with Claude

This is where things get interesting. The Claude integration isn't just "write me a cold email." It's a multi-step research and writing pipeline.

Step 1: Website Analysis

Before Claude writes anything, we feed it data about the prospect's website. We scrape basic information using a lightweight function:

async function analyzeProspectSite(domain: string) {
  // Fetch the homepage (the only page this simplified version inspects)
  const homepage = await fetch(`https://${domain}`);
  if (!homepage.ok) throw new Error(`Could not fetch ${domain}: ${homepage.status}`);
  const rawHtml = await homepage.text();
  const html = rawHtml.toLowerCase(); // Match signals case-insensitively

  // Extract tech signals from HTML. These are substring heuristics, so
  // false positives are possible (any page mentioning "react" matches).
  const signals = {
    hasNextJs: html.includes('__next') || html.includes('_next/static'),
    hasReact: html.includes('react') || html.includes('__react'),
    hasWordPress: html.includes('wp-content') || html.includes('wp-includes'),
    hasShopify: html.includes('shopify') || html.includes('cdn.shopify'),
    hasGatsby: html.includes('gatsby'),
    usesJQuery: html.includes('jquery'),
    metaGenerator: extractMeta(rawHtml, 'generator'),
    pageSize: rawHtml.length, // Characters of HTML, not transferred bytes
    // ... more signals
  };

  // Run PageSpeed check via API
  const psiData = await fetchPageSpeedInsights(domain);
  const audits = psiData.lighthouseResult.audits;

  return {
    ...signals,
    performanceScore: psiData.lighthouseResult.categories.performance.score * 100,
    lcp: audits['largest-contentful-paint'].numericValue, // milliseconds
    cls: audits['cumulative-layout-shift'].numericValue,
    // Lab estimate (Max Potential FID), not field-measured FID
    fid: audits['max-potential-fid'].numericValue
  };
}
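The function above leans on two helpers that the excerpt doesn't show: extractMeta and fetchPageSpeedInsights. A minimal sketch of how they might look, assuming a regex-based meta tag lookup and the public PageSpeed Insights v5 endpoint. The buildPageSpeedUrl helper and the optional apiKey parameter are hypothetical additions for illustration, and the real helpers may differ:

```typescript
// Illustrative helpers; the real implementations are not shown in the article.

// Return the content of <meta name="..."> if present, else null.
// Regex-based, so it only handles ordinary quoted attributes.
function extractMeta(html: string, name: string): string | null {
  const tags = html.match(/<meta\b[^>]*>/gi) ?? [];
  for (const tag of tags) {
    const nameMatch = tag.match(/\sname\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
    const tagName = nameMatch ? (nameMatch[1] ?? nameMatch[2] ?? '') : '';
    if (tagName.toLowerCase() !== name.toLowerCase()) continue;
    const contentMatch = tag.match(/\scontent\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
    return contentMatch ? (contentMatch[1] ?? contentMatch[2] ?? null) : null;
  }
  return null;
}

// Build the PageSpeed Insights v5 request URL for a domain's homepage.
function buildPageSpeedUrl(domain: string, apiKey?: string): string {
  const url = new URL('https://www.googleapis.com/pagespeedonline/v5/runPagespeed');
  url.searchParams.set('url', `https://${domain}`);
  url.searchParams.set('strategy', 'MOBILE');
  url.searchParams.set('category', 'PERFORMANCE');
  if (apiKey) url.searchParams.set('key', apiKey); // Optional (assumed setup)
  return url.toString();
}

// Fetch the Lighthouse report; callers read lighthouseResult from the JSON.
async function fetchPageSpeedInsights(domain: string, apiKey?: string) {
  const res = await fetch(buildPageSpeedUrl(domain, apiKey));
  if (!res.ok) throw new Error(`PageSpeed request failed: ${res.status}`);
  return await res.json();
}
```

Keeping URL construction separate from the network call makes the request easy to check without hitting the API.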

This gives Claude real data to work with. Not "Hi, I noticed your company does X" — more like "Your homepage LCP is 4.2 seconds and you're still running jQuery alongside React, which is adding 90KB to your initial bundle."

Step 2: Claude Research Prompt

We use Claude's API with a carefully crafted system prompt. Here's a simplified version:

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: Deno.env.get('ANTHROPIC_API_KEY') });

// Instructions go in the system prompt; per-prospect data goes in the user message
const researchSystemPrompt = `You are a senior web developer analyzing a prospect's website for a headless development agency. Given technical data about their site, identify:

1. Their current tech stack (be specific)
2. 2-3 concrete performance or architecture issues
3. What a migration to a modern headless architecture could improve
4. A specific, non-obvious observation that shows genuine analysis

Do NOT be generic. If you can't find something specific, say so.
Do NOT mention "in today's digital landscape" or similar filler.
Be direct and technical.`;

const researchInput = `Site data:
${JSON.stringify(siteAnalysis, null, 2)}

Prospect: ${lead.first_name} ${lead.last_name}, ${lead.position} at ${lead.domain}`;

const research = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1000,
  system: researchSystemPrompt,
  messages: [{ role: 'user', content: researchInput }]
});

Step 3: Email Generation

The research output feeds into a second Claude call that writes the actual email. Splitting research from writing was a key insight — when we tried to do both in one prompt, the emails were worse. Claude would skip the research to get to writing faster.

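// The response is a list of content blocks; we expect a single text block
const firstBlock = research.content[0];
const researchNotes = firstBlock?.type === 'text' ? firstBlock.text : '';

const emailPrompt = `Write a cold email from a senior developer at a headless web agency.

Research notes:
${researchNotes}

Rules:
- 4-6 sentences max. Every sentence must earn its place.
- Lead with the most specific technical observation.
- No flattery. No "I love what you're doing."
- One clear CTA: ask if they'd want to see a performance audit.
- Sound like a developer, not a salesperson.
- Use their first name. No last name in greeting.
- Subject line: short, specific to their tech issue, lowercase.`;

// Second call: turn the research notes into the email (max_tokens is an illustrative value)
const draft = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 500,
  messages: [{ role: 'user', content: emailPrompt }]
});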

The result? Emails that open with things like "Your Shopify Plus store is server-rendering product pages that could be statically generated — that's adding 2+ seconds to every product view" instead of "I noticed your impressive company and wanted to reach out."
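The email prompt asks for a subject line and a body in one response, so something has to split them before they land in the email_subject and email_body columns. A minimal sketch, assuming the model is instructed to put "Subject: ..." on the first line. That output convention and the parseDraft name are assumptions, not something the prompt above enforces:

```typescript
// Hypothetical parser: assumes the first line of the response is "Subject: ...".
interface EmailDraft {
  subject: string;
  body: string;
}

// Returns null when the response doesn't follow the expected shape,
// so malformed drafts can be flagged instead of queued for review.
function parseDraft(raw: string): EmailDraft | null {
  const lines = raw.trim().split(/\r?\n/);
  const subjectMatch = (lines[0] ?? '').match(/^subject:\s*(.+)$/i);
  if (!subjectMatch) return null;

  const body = lines.slice(1).join('\n').trim();
  if (body.length === 0) return null;

  return { subject: subjectMatch[1].trim(), body };
}
```

Returning null rather than guessing keeps a badly formatted response from reaching the review queue as a half-empty email.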

Supabase as the Orchestration Layer

Supabase is the brain of the operation. Here's our core schema:

create table leads (
  id uuid primary key default gen_random_uuid(),
  domain text not null,
  email text,
  first_name text,
  last_name text,
  position text,
  confidence int,
  status text default 'new', -- new, enriched, analyzed, researched, drafted, approved, sent, replied, bounced
  site_analysis jsonb,
  research_notes text,
  email_subject text,
  email_body text,
  instantly_campaign_id text,
  sent_at timestamptz,
  opened_at timestamptz,
  replied_at timestamptz,
  created_at timestamptz default now(),
  updated_at timestamptz default now()
);

create index idx_leads_status on leads(status);
create index idx_leads_domain on leads(domain);

The status field drives everything. Supabase Cron jobs run every 15 minutes, picking up leads at each stage and pushing them to the next:

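-- Cron: Process enriched leads through Claude research
-- Named arguments matter: the third positional argument of net.http_post is params, not headers
select cron.schedule(
  'process-research',
  '*/15 * * * *',
  $$select net.http_post(
    url := 'https://your-project.supabase.co/functions/v1/process-research',
    headers := '{"Content-Type": "application/json", "Authorization": "Bearer your-service-key"}'::jsonb,
    body := '{}'::jsonb
  )$$
);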

We batch process 20 leads per run to stay within Claude's rate limits and keep costs predictable.

The site_analysis JSONB column is incredibly useful. We can query across all our leads to find patterns — like "show me all leads running WordPress with a performance score below 50" — and build targeted campaigns from those segments.

Sending at Scale with Instantly

Instantly handles the actual email delivery. We push approved emails via their API:

const INSTANTLY_API_KEY = Deno.env.get('INSTANTLY_API_KEY');

// Lead mirrors a row of the leads table
async function pushToInstantly(lead: Lead) {
  const response = await fetch('https://api.instantly.ai/api/v1/lead/add', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      api_key: INSTANTLY_API_KEY,
      campaign_id: lead.instantly_campaign_id,
      skip_if_in_workspace: true,
      leads: [{
        email: lead.email,
        first_name: lead.first_name,
        last_name: lead.last_name,
        company_name: lead.domain,
        personalization_1: lead.email_subject,
        personalization_2: lead.email_body
      }]
    })
  });

  if (!response.ok) {
    // Leave the lead as 'approved' so the next cron run retries it
    console.error(`Instantly push failed for ${lead.email}: ${response.status}`);
    return;
  }

  await supabase
    .from('leads')
    .update({ status: 'sent', sent_at: new Date().toISOString() })
    .eq('id', lead.id);
}

Instantly's campaign templates use {{personalization_1}} and {{personalization_2}} variables, which map to our Claude-generated subject and body. The campaign itself is just a shell — all the intelligence lives in our system.

We run 3 sending accounts through Instantly's warmup for at least 2 weeks before sending any outreach. Domain warmup is not optional. We learned this the hard way with our first domain getting flagged within a week.

Deliverability Setup

Our sending infrastructure:

  • 3 domains (variations of our brand, not our main domain)
  • SPF, DKIM, and DMARC configured on all of them
  • Google Workspace accounts (not Outlook — Google handles cold email better in our testing)
  • Instantly warmup running continuously, even on active sending days
  • Max 35 emails per account per day
  • Random send intervals of 3-7 minutes
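To see what those limits mean for throughput, here is a small sketch. The constants come from the list above; the function names and the injectable random source are hypothetical, and the code only models the spacing rather than showing how the sends are actually scheduled:

```typescript
// Illustrative model of the sending limits; not how Instantly schedules sends.
const ACCOUNTS = 3;
const MAX_PER_ACCOUNT_PER_DAY = 35;
const MIN_GAP_MINUTES = 3;
const MAX_GAP_MINUTES = 7;

// Total emails per day across all sending accounts
function dailyCapacity(
  accounts: number = ACCOUNTS,
  perAccount: number = MAX_PER_ACCOUNT_PER_DAY
): number {
  return accounts * perAccount;
}

// Minute offsets (from the start of the sending window) for one account.
// The random source is injectable so the schedule can be made deterministic.
function sendOffsets(count: number, random: () => number = Math.random): number[] {
  const capped = Math.min(count, MAX_PER_ACCOUNT_PER_DAY);
  const offsets: number[] = [];
  let minutes = 0;
  for (let i = 0; i < capped; i++) {
    offsets.push(minutes);
    minutes += MIN_GAP_MINUTES + random() * (MAX_GAP_MINUTES - MIN_GAP_MINUTES);
  }
  return offsets;
}
```

Three accounts at 35 emails each gives 105 emails per day. At the slowest gap, one account's 35 emails span 34 × 7 = 238 minutes, just under four hours.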

The Automation Glue

Supabase Edge Functions connect everything. Here's the flow in pseudocode:

Every 15 minutes:
  1. Pick up leads with status='new', run Hunter enrichment → status='enriched'
  2. Pick up leads with status='enriched', run site analysis → status='analyzed'
  3. Pick up leads with status='analyzed', run Claude research + email gen → status='drafted'
  4. (Human reviews drafted emails in Supabase dashboard)
  5. Pick up leads with status='approved', push to Instantly → status='sent'
  6. Pull engagement data from Instantly API → update opened_at, replied_at
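One way to keep those status hops honest is an explicit transition table that every stage checks before writing. A sketch using the status names from the flow above; the replied and bounced transitions out of sent are our guess at how engagement data maps onto statuses, rejected drafts are not modeled, and the names canTransition and advance are hypothetical:

```typescript
// Illustrative state machine for the lead pipeline.
type LeadStatus =
  | 'new' | 'enriched' | 'analyzed' | 'drafted'
  | 'approved' | 'sent' | 'replied' | 'bounced';

// Allowed next statuses for each status
const NEXT_STATUSES: Record<LeadStatus, readonly LeadStatus[]> = {
  new: ['enriched'],
  enriched: ['analyzed'],
  analyzed: ['drafted'],
  drafted: ['approved'], // Only a human review moves a draft forward
  approved: ['sent'],
  sent: ['replied', 'bounced'], // Assumed mapping of engagement data
  replied: [],
  bounced: [],
};

function canTransition(from: LeadStatus, to: LeadStatus): boolean {
  return NEXT_STATUSES[from].includes(to);
}

// Returns a copy of the lead with the new status, or throws on an illegal hop
function advance<T extends { status: LeadStatus }>(lead: T, to: LeadStatus): T {
  if (!canTransition(lead.status, to)) {
    throw new Error(`Illegal transition: ${lead.status} -> ${to}`);
  }
  return { ...lead, status: to };
}
```

The point of the table is that there is no path from drafted to sent that skips approved, so a bug in one stage can't bypass the human review.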

Step 4 is important. We don't fully automate sending. Every email gets a human review before it goes out. This catches the occasional hallucination (Claude once claimed a site was built with Remix when it was clearly Next.js) and lets us add personal touches.

The review step takes about 5 seconds per email since Claude does 95% of the work correctly. We approve in batches using a simple Supabase dashboard view.

Results and What We Learned

We've been running this system since Q1 2025. Here are real numbers:

| Metric | Our System | Industry Average (2025) |
|---|---|---|
| Open Rate | 62% | 24% |
| Reply Rate | 8.4% | 1-3% |
| Positive Reply Rate | 4.1% | 0.5-1% |
| Bounce Rate | 0.8% | 3-5% |
| Cost Per Lead Contacted | $0.18 | $0.50-2.00 |
| Time Per Lead (human) | ~5 seconds (review) | 10-15 minutes |

The open rate is high because the subject lines are specific. "your shopify lcp is 4.2s" gets opened. "Quick question" doesn't.

The reply rate is high because the emails demonstrate genuine technical knowledge. When a CTO reads an email that correctly identifies their tech stack and a real performance issue, they're more likely to engage — even if they know it's outreach.

What Didn't Work

Fully automated sending (no human review): We tried this for two weeks. Claude hallucinated tech stack details about 5% of the time. That's a low error rate for an LLM, but sending an email that says "your React app" to someone running Vue is worse than sending a generic email. The trust damage is real.

Long emails: Our first Claude prompts generated 8-10 sentence emails. Reply rates were half of what we see now with 4-6 sentences. Shorter is better. Always.

Sending more than 40 emails per day per account: Deliverability drops off a cliff. 30-35 is the sweet spot in 2025.

Using Claude for follow-ups based on opens: We tried generating follow-up emails triggered by opens. The follow-ups felt pushy and the conversion wasn't worth the cost. We now send one simple, non-AI follow-up three days later.

Cost Breakdown

Here's what this costs us monthly, processing roughly 2,000 leads:

| Service | Monthly Cost | Notes |
|---|---|---|
| Hunter.io (Starter) | $49 | 500 searches + verifications |
| Anthropic API (Claude) | $45 | ~2,000 research + email generations |
| Supabase (Pro) | $25 | Database, Edge Functions, Cron |
| Instantly (Growth) | $30 | Sending, warmup, analytics |
| Google Workspace (3 accounts) | $21 | Sending infrastructure |
| Domains (3) | $10 | Amortized annual cost |
| Total | ~$180 | $0.09 per lead processed |
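The total and the per-lead figure follow directly from the line items. A quick sketch of the arithmetic, using the amounts from the table and the roughly 2,000 leads per month mentioned above (the variable and function names are just for illustration):

```typescript
// Monthly line items in USD, copied from the cost table
const monthlyCostsUsd: Record<string, number> = {
  'Hunter.io (Starter)': 49,
  'Anthropic API (Claude)': 45,
  'Supabase (Pro)': 25,
  'Instantly (Growth)': 30,
  'Google Workspace (3 accounts)': 21,
  'Domains (3)': 10,
};

const LEADS_PER_MONTH = 2000;

function totalMonthlyCost(costs: Record<string, number>): number {
  return Object.values(costs).reduce((sum, cost) => sum + cost, 0);
}

function costPerLead(costs: Record<string, number>, leads: number): number {
  return totalMonthlyCost(costs) / leads;
}

console.log(totalMonthlyCost(monthlyCostsUsd)); // 180
console.log(costPerLead(monthlyCostsUsd, LEADS_PER_MONTH).toFixed(2)); // "0.09"
```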

Compare that to Apollo's $79/month plan (limited enrichment, basic sequences) or Lemlist's $69/month per seat. We're spending less and getting dramatically better results because the personalization is real, not template-based.

For context, this system has directly generated leads that turned into Next.js development and Astro development projects worth 50-100x the monthly cost. The ROI is absurd.

FAQ

How long did it take to build this system? The first working version took about two weeks of part-time effort — maybe 40 hours total. We've iterated on it continuously since then, mostly tweaking Claude prompts and adding edge case handling. If you're comfortable with Supabase Edge Functions and REST APIs, you could get a basic version running in a weekend.

Isn't this just spam with extra steps? Fair question. The difference is that every email contains a genuine technical observation about the recipient's website. We're not blasting "let's hop on a call" to 10,000 people. We're sending specific, useful insights to a targeted list of people who actually have the problems we solve. Our unsubscribe rate is under 0.5%, which suggests recipients don't see it as spam either.

Why Claude instead of GPT-4 or Gemini? We tested all three. Claude followed our system prompts more reliably — especially the constraints like "don't be generic" and "don't use filler phrases." GPT-4 would drift toward salesy language even with explicit instructions not to. Gemini was fast but the output quality was inconsistent. This may change as models evolve, and our system is designed to swap models easily.

How do you handle GDPR and CAN-SPAM compliance? All our outreach targets business emails (not personal), includes our physical address, and has a clear opt-out in every email. For GDPR, we process data under legitimate interest for B2B outreach, maintain records of processing activities, and honor removal requests immediately via an automated webhook. We also purge leads older than 90 days from our database automatically. Talk to a lawyer for your specific situation — this isn't legal advice.

What happens when a lead replies? Replies flow back from Instantly's API into Supabase. We get a Slack notification for every reply, and a human takes over the conversation immediately. We never use AI for reply handling. Once someone engages, they deserve a real person. Interested prospects get pointed to our contact page or directly to a call booking link.

Can this approach work for non-technical services? The site analysis piece is specific to web development, but the architecture pattern — enrich leads, use AI to research and personalize, send through a dedicated tool — works for any B2B outreach. You'd just need different research inputs. A design agency might analyze visual design and UX patterns. A marketing agency might pull SEO metrics. The key is feeding Claude real data, not asking it to make things up.

What's the hardest part of maintaining this system? Prompt maintenance. As Claude models update, prompts that worked perfectly sometimes need adjustment. We also spend time monitoring email deliverability — checking Google Postmaster Tools, watching for spam rate spikes, rotating sending accounts. It's maybe 2-3 hours per week of maintenance total.

Would you sell this as a product? We've thought about it, but honestly the competitive advantage is too valuable. If every agency ran this exact system, the effectiveness would drop because recipients would start seeing AI-researched emails everywhere. For now, we're keeping it as an internal tool. If you want help building something similar for your business, get in touch — we've helped a few clients set up similar systems as part of our headless CMS development work.