Make Content AI-Ready Without Sanity -- Social Animal

There's a narrative floating around the CMS world right now that goes something like this: "If you want AI-ready content, you need Sanity's structured content approach." And look, Sanity's Content Lake and their GROQ-powered AI integrations are genuinely impressive. But here's the thing -- most teams can't just abandon their existing CMS. You've got years of content in WordPress. Your app's data layer lives in Supabase. You just finished migrating to Payload CMS six months ago. The idea of another migration makes your stomach turn.

Good news: you don't need to switch. You need to think differently about how your content is structured, stored, and exposed. I've spent the last year helping teams retrofit their existing stacks for AI consumption, and the patterns are surprisingly consistent regardless of which CMS or database you're running. Let me walk you through it.

What "AI-Ready Content" Actually Means
Why Sanity Gets All the Attention
Structuring Content for AI in WordPress
Payload CMS: You're Closer Than You Think
Supabase as an AI-Ready Content Layer
The Universal Principles of AI-Ready Content
Building an AI Abstraction Layer
Vector Embeddings Without a Full Migration
Real-World Architecture Patterns
FAQ

What "AI-Ready Content" Actually Means

Before we get tactical, let's clarify what we're actually talking about. "AI-ready content" isn't a marketing buzzword (well, it is, but there's substance underneath). It means your content meets three criteria:

Machine-parseable structure -- AI models can reliably extract meaning from your content without guessing at context
Rich metadata -- Every piece of content carries enough semantic information that an AI can understand relationships, intent, and context
API accessibility -- Content is available through programmatic interfaces that AI agents, RAG pipelines, and LLM tool-calling can consume

That's it. Notice what's not on the list: a specific vendor. These are architectural patterns, not product features.

The Content Intelligence Spectrum

Think of content AI-readiness on a spectrum:

Level	Description	Example
0	Blob of HTML	WordPress post with inline styles and mixed media
1	Separated concerns	Clean HTML with structured data markup
2	Field-level structure	Content broken into typed fields (title, summary, body, author)
3	Semantic relationships	Content with explicit references, taxonomies, and entity links
4	AI-native	Content with embeddings, semantic annotations, and machine-readable intent

Sanity's structured content model nudges you toward Level 3-4 by default. But every CMS can reach Level 3, and with some additional infrastructure, Level 4.
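To make the spectrum concrete, here's a rough TypeScript rubric that maps a content record to the levels in the table. The record shapes and readinessLevel are hypothetical, and the check is deliberately coarse: it only asks whether typed fields, explicit references, and embeddings are present, not whether they're any good.

```typescript
// Hypothetical record shapes for the two ends of the spectrum.
type HtmlRecord = { kind: 'html'; html: string; jsonLd?: Record<string, unknown> };
type FieldRecord = {
  kind: 'fields';
  fields: Record<string, unknown>; // typed fields: title, summary, body, author...
  references?: string[];           // explicit IDs of related documents
  embedding?: number[];            // vector representation, if generated
};
type ContentRecord = HtmlRecord | FieldRecord;

// Rough rubric mapping a record to the 0-4 levels in the table above.
function readinessLevel(record: ContentRecord): 0 | 1 | 2 | 3 | 4 {
  if (record.kind === 'html') {
    // Level 0 is an HTML blob; structured data markup bumps it to Level 1.
    return record.jsonLd ? 1 : 0;
  }
  const hasReferences = (record.references ?? []).length > 0;
  const hasEmbedding = (record.embedding ?? []).length > 0;
  if (hasReferences && hasEmbedding) return 4; // relationships plus vectors
  if (hasReferences) return 3;                 // explicit relationships
  return 2;                                    // typed fields only
}
```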

Why Sanity Gets All the Attention

Let's give credit where it's due. Sanity's approach to structured content is genuinely well-designed for AI use cases:

Portable Text stores rich text as a JSON AST rather than HTML, making it trivial to parse programmatically
GROQ queries return exactly the shape of data you ask for, so you can hand an LLM only the fields it needs instead of whole documents
Content Lake treats content as a graph of typed documents with explicit references
Their AI SDK integrations in 2025 allow direct tool-calling from LLMs into content queries
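To see why a JSON AST is easier to work with than HTML, here's a minimal sketch that flattens Portable Text to plain text. It assumes the standard Portable Text shape (an array of items where text blocks have _type 'block' and a children array of spans carrying text). portableTextToPlain is a hypothetical name; for real use, Sanity's @portabletext/toolkit package ships a toPlainText helper.

```typescript
// Spans carry text; inline objects inside a block use other _type values.
interface PortableTextSpan {
  _type: string;
  text?: string;
  marks?: string[];
}

interface PortableTextBlock {
  _type: 'block';
  style?: string; // 'normal', 'h2', 'blockquote', ...
  children: PortableTextSpan[];
  markDefs?: Array<Record<string, unknown>>;
}

// Non-text items (images, custom objects) live in the same array.
type PortableTextItem = PortableTextBlock | { _type: string; [key: string]: unknown };

// One paragraph per text block; images and custom objects are skipped.
function portableTextToPlain(items: PortableTextItem[]): string {
  return items
    .filter((item): item is PortableTextBlock => item._type === 'block')
    .map((block) => block.children.map((child) => child.text ?? '').join(''))
    .join('\n\n');
}
```

No regexes, no HTML parser: the structure already tells you which parts are text.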

But here's what the Sanity evangelists don't mention: these advantages are architectural patterns, not proprietary magic. You can implement every single one of these in your existing stack. It just takes intentional design.

The real question isn't "should I switch to Sanity?" It's "how do I apply structured content principles where I already am?"

Structuring Content for AI in WordPress

WordPress powers something like 43% of the web in 2025. If you're running WordPress, you're in good company, and you've got more options than you might think.

Step 1: Stop Using the Classic Editor for Everything

The Gutenberg block editor already stores content as typed blocks. Each block has a name, attributes, and content, and parse_blocks() returns it in a structure like the one below. That's closer to Sanity's Portable Text than most people realize.

{
  "blockName": "core/paragraph",
  "attrs": {},
  "innerBlocks": [],
  "innerHTML": "<p>This is structured content, not just HTML.</p>",
  "innerContent": ["<p>This is structured content, not just HTML.</p>"]
}

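The block data is stored in post_content as HTML comment delimiters (<!-- wp:paragraph --> and so on), but you can parse it programmatically. Recursing into innerBlocks keeps the text inside groups and columns, and building a fresh array keeps the keys sequential so it JSON-encodes as a list:

// Convert parse_blocks() output into a plain array that json_encode()s cleanly.
function ai_structure_blocks(array $blocks): array {
    $structured = [];
    foreach ($blocks as $block) {
        // Null-named entries are whitespace between blocks (or classic, non-block HTML).
        if ($block['blockName'] === null) continue;
        $structured[] = [
            'type' => $block['blockName'],
            'attributes' => $block['attrs'],
            'content' => trim(wp_strip_all_tags($block['innerHTML'])),
            'children' => ai_structure_blocks($block['innerBlocks']), // nested blocks
        ];
    }
    return $structured;
}
$structured = ai_structure_blocks(parse_blocks($post->post_content));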

Step 2: Invest in Custom Fields and Taxonomies

Advanced Custom Fields (ACF) or Meta Box give you Level 2-3 content structure. But you need to be intentional about it. Don't just add fields -- design a content model.

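// Register the post type and taxonomy on init, with REST enabled for API access
add_action('init', function () {
    register_post_type('knowledge_article', [
        'label' => 'Knowledge Articles',
        'public' => true,
        'supports' => ['title', 'editor', 'custom-fields'], // editor = block content
        'show_in_rest' => true, // Critical for API access (and the block editor)
    ]);
    register_taxonomy('concept', 'knowledge_article', [
        'label' => 'Concepts',
        'show_in_rest' => true,
    ]);
});

// Define semantic fields. ACF needs group_/field_ key prefixes, a 'name' for
// get_field(), and a location rule tying the group to the post type.
add_action('acf/init', function () {
    acf_add_local_field_group([
        'key' => 'group_ai_ready_content',
        'title' => 'AI-Ready Content Fields',
        'fields' => [
            ['key' => 'field_summary', 'name' => 'summary', 'label' => 'Summary', 'type' => 'textarea'],
            ['key' => 'field_key_concepts', 'name' => 'key_concepts', 'label' => 'Key Concepts', 'type' => 'taxonomy',
                'taxonomy' => 'concept', 'field_type' => 'multi_select', 'save_terms' => 1], // attach terms to the post
            ['key' => 'field_content_intent', 'name' => 'content_intent', 'label' => 'Content Intent', 'type' => 'select', 'choices' => [
                'informational' => 'Informational',
                'transactional' => 'Transactional',
                'navigational' => 'Navigational',
            ]],
            ['key' => 'field_related_entities', 'name' => 'related_entities', 'label' => 'Related Entities', 'type' => 'relationship'],
        ],
        'location' => [[['param' => 'post_type', 'operator' => '==', 'value' => 'knowledge_article']]],
    ]);
});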

Step 3: Expose Everything Through the REST API

The WordPress REST API is your bridge to AI. Make sure custom fields are exposed:

add_action('rest_api_init', function() {
    register_rest_field('knowledge_article', 'ai_metadata', [
        'get_callback' => function($post) {
            // Relationship fields can return full WP_Post objects; expose plain IDs instead.
            $related = get_field('related_entities', $post['id']) ?: [];
            return [
                'summary' => get_field('summary', $post['id']),
                'concepts' => wp_get_post_terms($post['id'], 'concept', ['fields' => 'names']),
                'intent' => get_field('content_intent', $post['id']),
                'related' => array_map(fn($p) => is_object($p) ? $p->ID : (int) $p, $related),
                'structured_blocks' => parse_blocks(get_post_field('post_content', $post['id'])),
            ];
        },
    ]);
});

If you're running WordPress as a headless CMS with a Next.js or Astro frontend (which is something we do a lot at Social Animal), this REST API becomes your AI's primary interface.
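On the consuming side, pulling that endpoint into a frontend or sync job takes very little code. A sketch, assuming the site is served from the domain root (so the API lives under /wp-json/), the post type's default REST base, and a runtime with a global fetch (Node 18+, Deno, or the browser). buildListUrl and fetchKnowledgeArticles are hypothetical names.

```typescript
// Fields we request; ai_metadata comes from the register_rest_field call above.
interface WpKnowledgeArticle {
  id: number;
  date_gmt: string;
  modified_gmt: string;
  title: { rendered: string };
  ai_metadata: {
    summary: string | null;
    concepts: string[];
    intent: string | null;
    related?: unknown;
  };
}

// One page of the knowledge_article collection, trimmed with _fields.
function buildListUrl(siteUrl: string, page: number): string {
  const url = new URL('/wp-json/wp/v2/knowledge_article', siteUrl);
  url.searchParams.set('_fields', 'id,date_gmt,modified_gmt,title,ai_metadata');
  url.searchParams.set('per_page', '100'); // WordPress caps per_page at 100
  url.searchParams.set('page', String(page));
  return url.toString();
}

// Walk every page using the X-WP-TotalPages header WordPress sends on collections.
async function fetchKnowledgeArticles(siteUrl: string): Promise<WpKnowledgeArticle[]> {
  const articles: WpKnowledgeArticle[] = [];
  for (let page = 1; ; page++) {
    const res = await fetch(buildListUrl(siteUrl, page));
    if (!res.ok) throw new Error(`WordPress REST request failed: ${res.status}`);
    articles.push(...((await res.json()) as WpKnowledgeArticle[]));
    const totalPages = Number(res.headers.get('X-WP-TotalPages') ?? '1') || 1;
    if (page >= totalPages) break;
  }
  return articles;
}
```

Note that title.rendered can contain HTML entities, so decode it before embedding.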

Step 4: Add JSON-LD Structured Data

This one's often overlooked for AI readiness, but it matters. Google's AI Overviews and other AI crawlers consume JSON-LD. Tools like Yoast SEO or Rank Math generate basic schema, but for real AI readiness, you want to output detailed structured data:

{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Make Your Content AI-Ready",
  "abstract": "How to structure existing CMS content for AI consumption",
  "about": [
    {"@type": "Thing", "name": "Content Management"},
    {"@type": "Thing", "name": "Artificial Intelligence"}
  ],
  "mentions": [
    {"@type": "SoftwareApplication", "name": "WordPress"},
    {"@type": "SoftwareApplication", "name": "Payload CMS"}
  ]
}
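In a headless setup, the frontend usually emits this markup itself. A minimal sketch, assuming you've already mapped your CMS fields into the hypothetical ArticleForSchema shape. datePublished and dateModified are standard schema.org properties added here because freshness matters to AI consumers; the function names are illustrative.

```typescript
// Hypothetical input shape; map it from your CMS fields.
interface ArticleForSchema {
  headline: string;
  abstract: string;
  about: string[];
  mentionsSoftware: string[];
  datePublished: string; // ISO 8601
  dateModified: string;  // ISO 8601
}

// Build a schema.org TechArticle object like the example above.
function buildTechArticleJsonLd(a: ArticleForSchema): Record<string, unknown> {
  return {
    '@context': 'https://schema.org',
    '@type': 'TechArticle',
    headline: a.headline,
    abstract: a.abstract,
    datePublished: a.datePublished,
    dateModified: a.dateModified,
    about: a.about.map((name) => ({ '@type': 'Thing', name })),
    mentions: a.mentionsSoftware.map((name) => ({ '@type': 'SoftwareApplication', name })),
  };
}

// Serialize for a <script type="application/ld+json"> tag. Escaping "<" keeps a
// value containing "</script>" from closing the tag early; JSON parsers read it back as "<".
function toJsonLdScriptBody(data: Record<string, unknown>): string {
  return JSON.stringify(data).replace(/</g, '\\u003c');
}
```

In Next.js or Astro, the returned string goes inside a script element of type application/ld+json in the page head.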

Payload CMS: You're Closer Than You Think

If you're already on Payload CMS, congratulations -- you're probably at Level 2-3 without much extra work. Payload's collection-based architecture with typed fields is inherently structured.

Why Payload Is Already AI-Friendly

Payload stores content as typed JSON documents in MongoDB or Postgres. Every field has a defined type. Relationships are explicit. This is exactly what AI needs.

import type { CollectionConfig } from 'payload';

// Payload collection that's already AI-ready
export const Articles: CollectionConfig = {
  slug: 'articles',
  fields: [
    { name: 'title', type: 'text', required: true },
    { name: 'summary', type: 'textarea' },
    { name: 'body', type: 'richText' }, // Stored as Lexical JSON in v3 (Slate in older versions)
    { name: 'topics', type: 'relationship', relationTo: 'topics', hasMany: true }, // assumes a 'topics' collection
    { name: 'contentType', type: 'select', options: ['guide', 'tutorial', 'reference'] },
  ],
};

Payload's rich text editor (Lexical in v3.x) stores content as a JSON AST -- just like Sanity's Portable Text. You already have structured content.

Adding AI-Specific Fields to Payload

The gap between Payload and full AI-readiness is mostly about metadata. Add these fields to your collections:

import type { Field } from 'payload';

// Spread into a collection's fields, e.g. fields: [...articleFields, ...aiFields]
export const aiFields: Field[] = [
  {
    name: 'aiMetadata',
    type: 'group',
    fields: [
      { name: 'embedding', type: 'json', admin: { hidden: true } },
      { name: 'extractedEntities', type: 'json', admin: { readOnly: true } },
      { name: 'semanticSummary', type: 'textarea', admin: { readOnly: true } },
      { name: 'contentHash', type: 'text', admin: { hidden: true } },
    ],
  },
];

Then use Payload's hooks to auto-generate embeddings on save:

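import type { CollectionAfterChangeHook } from 'payload';
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Register on the collection with hooks: { afterChange: [generateEmbeddingHook] }.
// extractTextFromLexical and hashContent are your own helpers, not Payload APIs.
export const generateEmbeddingHook: CollectionAfterChangeHook = async ({ doc, req, context }) => {
  // The update below re-runs afterChange; this flag stops it from looping forever.
  if (context.skipEmbedding) return doc;

  const textContent = extractTextFromLexical(doc.body);
  const input = `${doc.title}\n${doc.summary ?? ''}\n${textContent}`;
  const contentHash = hashContent(input);
  if (doc.aiMetadata?.contentHash === contentHash) return doc; // nothing changed

  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input,
  });

  await req.payload.update({
    collection: 'articles',
    id: doc.id,
    data: {
      aiMetadata: {
        ...doc.aiMetadata,
        embedding: embedding.data[0].embedding,
        contentHash,
      },
    },
    context: { skipEmbedding: true },
    req, // run inside the same request/transaction
  });
  return doc;
};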

This is roughly the kind of pipeline Sanity's AI features take care of for you; you're just doing it yourself. For teams building on Payload with Next.js, this pattern integrates naturally into your existing deployment pipeline.
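The hook relies on extractTextFromLexical and hashContent, which the snippet doesn't define. Here's one way to write them, assuming Lexical's serialized JSON format: a root node whose descendants are element nodes with a children array and text nodes with a text property. Both are hypothetical helpers, and list handling is deliberately simple.

```typescript
import { createHash } from 'node:crypto';

// Only the parts of a serialized Lexical node this sketch reads.
interface LexicalNode {
  type: string;
  text?: string;
  children?: LexicalNode[];
}

interface SerializedLexical {
  root: LexicalNode;
}

// Text nodes contribute their text and line breaks a newline; element nodes
// concatenate their children, except lists, which put each item on its own line.
function nodeText(node: LexicalNode): string {
  if (node.type === 'text') return node.text ?? '';
  if (node.type === 'linebreak') return '\n';
  const parts = (node.children ?? []).map(nodeText);
  return node.type === 'list' ? parts.join('\n') : parts.join('');
}

// One line per top-level block (paragraph, heading, list...), empty blocks dropped.
function extractTextFromLexical(state: SerializedLexical | null | undefined): string {
  return (state?.root?.children ?? [])
    .map((block) => nodeText(block).trim())
    .filter((line) => line.length > 0)
    .join('\n');
}

// SHA-256 hex digest, used to skip re-embedding text that hasn't changed.
function hashContent(text: string): string {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}
```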

Supabase as an AI-Ready Content Layer

Supabase is interesting because it's not a CMS -- it's a database platform. But increasingly, teams use it as their content backend, especially since Supabase supports the pgvector Postgres extension for storing and querying embeddings.

The pgvector Advantage

Supabase has had pgvector support since 2023, and it's matured significantly. This means you can store content AND vector embeddings in the same database:

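-- Enable the extension
create extension if not exists vector;

-- Create a content table with embedding support
create table content (
  id uuid default gen_random_uuid() primary key,
  title text not null,
  body text not null,
  metadata jsonb default '{}',
  content_type text not null,
  embedding vector(1536), -- OpenAI text-embedding-3-small dimension
  created_at timestamptz default now(),
  updated_at timestamptz default now()
);

-- HNSW index for cosine similarity search (pgvector 0.5.0+). Unlike ivfflat,
-- it has no training step, so it can be built on an empty table.
create index on content using hnsw (embedding vector_cosine_ops);

-- Keep updated_at current on every update (Supabase ships the moddatetime extension)
create extension if not exists moddatetime schema extensions;
create trigger content_updated_at before update on content
  for each row execute procedure moddatetime (updated_at);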

Building a Content API for AI Agents

Supabase's auto-generated REST API plus Edge Functions give you everything you need:

// Supabase Edge Function for AI content retrieval
import { createClient } from 'npm:@supabase/supabase-js@2';

const json = (body: unknown, status = 200) =>
  new Response(JSON.stringify(body), { status, headers: { 'Content-Type': 'application/json' } });

Deno.serve(async (req) => {
  const { query, limit = 5 } = await req.json();

  // Anon key plus the caller's JWT, so Row Level Security still applies
  // (the content table needs a SELECT policy for the calling role).
  const supabase = createClient(
    Deno.env.get('SUPABASE_URL')!,
    Deno.env.get('SUPABASE_ANON_KEY')!,
    { global: { headers: { Authorization: req.headers.get('Authorization')! } } },
  );

  // Generate embedding for the query (same model as the stored content)
  const embeddingResponse = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${Deno.env.get('OPENAI_API_KEY')}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'text-embedding-3-small',
      input: query,
    }),
  });
  if (!embeddingResponse.ok) return json({ error: 'Embedding request failed' }, 502);

  const { data } = await embeddingResponse.json();
  const queryEmbedding = data[0].embedding;

  // Semantic search using pgvector
  const { data: results, error } = await supabase.rpc('match_content', {
    query_embedding: queryEmbedding,
    match_threshold: 0.7,
    match_count: limit,
  });
  if (error) return json({ error: error.message }, 500);

  return json(results);
});

The Postgres function for similarity matching:

create or replace function match_content(
  query_embedding vector(1536),
  match_threshold float,
  match_count int
) returns table (
  id uuid,
  title text,
  body text,
  metadata jsonb,
  similarity float
) language sql stable as $$
  select
    content.id,
    content.title,
    content.body,
    content.metadata,
    1 - (content.embedding <=> query_embedding) as similarity
  from content
  where 1 - (content.embedding <=> query_embedding) > match_threshold
  order by content.embedding <=> query_embedding
  limit match_count;
$$;

This gives you the retrieval half of a RAG (Retrieval-Augmented Generation) backend without any CMS migration. Your content lives in Supabase, your AI can query it semantically, and your Astro or Next.js frontend can consume it through the same API. Generation is then a matter of passing the matches to an LLM as context.

The Universal Principles of AI-Ready Content

Regardless of your CMS, these principles apply:

1. Separate Content from Presentation

This is the single biggest thing you can do. If your content is tangled with HTML, CSS classes, and layout concerns, AI can't reliably parse it. Store content as data, render it as HTML at the presentation layer.

2. Type Everything

Every field should have an explicit type. Don't use generic "text" fields for structured data. A date should be stored as a date. A reference should be a reference, not a slug string pasted into a text field.

3. Make Relationships Explicit

If Article A references Product B, that should be a typed relationship -- not a mention in the body text. AI tools need to traverse your content graph, and they can't do that with implied links.
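Principles 2 and 3 together, in a hypothetical TypeScript model: dates are real Date values, and relationships are typed references rather than slugs in body text. The breadth-first walk shows the kind of traversal a RAG pipeline or agent tool can only do when links are explicit. Type names and relation labels are illustrative, not from any particular CMS.

```typescript
// A typed reference: what kind of relationship, and to which document.
interface Reference {
  relation: 'about' | 'mentions' | 'relatedTo';
  targetId: string;
}

interface ContentNode {
  id: string;
  type: 'article' | 'product' | 'topic';
  title: string;
  publishedAt: Date; // a real date, not a free-text string
  references: Reference[];
}

// Breadth-first walk over explicit references, up to maxDepth hops from startId.
// Returns the IDs reached, nearest first, without revisiting any document.
function relatedWithin(graph: Map<string, ContentNode>, startId: string, maxDepth: number): string[] {
  const seen = new Set<string>([startId]);
  const found: string[] = [];
  let frontier = [startId];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const ref of graph.get(id)?.references ?? []) {
        if (seen.has(ref.targetId)) continue;
        seen.add(ref.targetId);
        found.push(ref.targetId);
        next.push(ref.targetId);
      }
    }
    frontier = next;
  }
  return found;
}
```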

4. Add Semantic Metadata

Go beyond basic SEO metadata. Include:

Content intent (informational, transactional, navigational)
Audience segment
Confidence/freshness indicators
Entity annotations
Topic classifications beyond basic categories

5. Version and Timestamp Everything

AI systems need to know how fresh content is. Include created_at, updated_at, and ideally a valid_until or review_date field. Stale content in a RAG pipeline gets retrieved and presented as current, which produces confident, wrong answers.
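Principles 4 and 5 in code: a hypothetical metadata shape plus a freshness check you could run before content goes into a RAG prompt. The field names mirror the lists above but are illustrative, and treating an overdue review date as stale is a policy choice; down-ranking instead of dropping would be just as reasonable.

```typescript
// Hypothetical semantic + freshness metadata for one piece of content.
interface ContentMeta {
  intent: 'informational' | 'transactional' | 'navigational';
  audience?: string;
  topics: string[];
  createdAt: string;   // ISO 8601
  updatedAt: string;   // ISO 8601
  validUntil?: string; // hard expiry, e.g. pricing or promotions
  reviewDate?: string; // when an editor should re-check the content
}

// False if the content has expired or is overdue for review at time `now`.
function isFresh(meta: ContentMeta, now: Date = new Date()): boolean {
  const t = now.getTime();
  if (meta.validUntil && Date.parse(meta.validUntil) < t) return false;
  if (meta.reviewDate && Date.parse(meta.reviewDate) < t) return false;
  return true;
}
```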

Building an AI Abstraction Layer

Here's the pattern I keep coming back to: instead of migrating your CMS, add an AI abstraction layer on top of it.

[WordPress/Payload/Supabase] → [Content Sync] → [AI Layer (pgvector/Pinecone)] → [AI Consumers]

The AI layer:

Syncs content from your CMS via webhooks or polling
Normalizes it into a consistent structure regardless of source
Generates embeddings and stores them alongside the normalized content
Exposes an AI-optimized API for RAG, tool-calling, and semantic search

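// Simplified content sync pipeline
interface NormalizedContent {
  id: string;
  source: 'wordpress' | 'payload' | 'supabase';
  sourceId: string;
  title: string;
  body: string; // Plain text, stripped of markup
  structuredBody?: unknown; // JSON AST if available
  metadata: {
    type: string;
    intent: string;
    topics: string[];
    entities: string[];
    createdAt: string;
    updatedAt: string;
  };
  embedding?: number[];
}

// One adapter per CMS: list raw items, map each one to the shared shape.
interface ContentSource {
  fetchAll(): Promise<unknown[]>;
  normalize(item: unknown): NormalizedContent;
}

// Whatever stores the vectors (pgvector, Pinecone, ...) only needs an upsert.
interface AiLayer {
  upsert(item: NormalizedContent): Promise<void>;
}

type EmbedFn = (text: string) => Promise<number[]>;

async function syncContent(
  source: ContentSource,
  aiLayer: AiLayer,
  generateEmbedding: EmbedFn,
): Promise<void> {
  const rawContent = await source.fetchAll();

  for (const item of rawContent) {
    const normalized = source.normalize(item);
    const embedding = await generateEmbedding(
      `${normalized.title}\n${normalized.body}`
    );

    await aiLayer.upsert({
      ...normalized,
      embedding,
    });
  }
}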

This approach has a huge advantage: your editors keep using the CMS they know. No retraining, no migration, no downtime. The AI layer lives alongside your existing stack.
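For tests or local prototyping, the AI layer can start as a brute-force in-memory store with cosine similarity, which is enough to exercise the whole flow before you wire up pgvector or Pinecone. InMemoryAiLayer and its method names are hypothetical; it scores every stored vector on each query, so it only suits small collections. Its upsert is synchronous, which is fine wherever the pipeline awaits it.

```typescript
interface StoredItem {
  source: string;
  sourceId: string;
  title: string;
  embedding: number[];
}

// Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal ones.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Embedding dimensions differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class InMemoryAiLayer {
  private items = new Map<string, StoredItem>();

  // Key on source + sourceId so re-syncing a document replaces its old vector.
  upsert(item: StoredItem): void {
    this.items.set(`${item.source}:${item.sourceId}`, item);
  }

  // Score every stored vector against the query and return the best matches.
  search(query: number[], limit = 5): Array<StoredItem & { similarity: number }> {
    return Array.from(this.items.values())
      .map((item) => ({ ...item, similarity: cosineSimilarity(query, item.embedding) }))
      .sort((x, y) => y.similarity - x.similarity)
      .slice(0, limit);
  }
}
```

The match_content SQL function plays the same role in Postgres, where the <=> operator computes cosine distance (1 minus similarity) with index support.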

Vector Embeddings Without a Full Migration

Let's talk costs and tooling for 2025, because this matters for real-world decisions:

Embedding Provider	Model	Cost per 1M tokens	Dimensions	Notes
OpenAI	text-embedding-3-small	$0.02	1536	Best cost/quality ratio
OpenAI	text-embedding-3-large	$0.13	3072	Higher accuracy
Cohere	embed-v4	$0.10	1024	Good multilingual support
Voyage AI	voyage-3	$0.06	1024	Strong for code content
Local (Ollama)	nomic-embed-text	Free	768	Privacy-first option

For a typical content site with 5,000 articles averaging 1,500 words each, you're looking at roughly 7.5M words, or about 10M tokens (English prose averages around 1.3 tokens per word). With OpenAI's small model, that's about $0.20 to embed your entire content library. Even re-embedding weekly is negligible.
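The arithmetic is easy to redo for your own library. A small helper, where the 4/3 tokens-per-word ratio is an assumption (a rule of thumb for English prose; code, markup, and many other languages run higher, so measure with your model's tokenizer for real numbers):

```typescript
// Rough embedding cost estimate for a content library.
function estimateEmbeddingCost(
  articles: number,
  wordsPerArticle: number,
  pricePerMillionTokens: number,
  tokensPerWord = 4 / 3,
): { tokens: number; usd: number } {
  const tokens = articles * wordsPerArticle * tokensPerWord;
  return { tokens, usd: (tokens / 1_000_000) * pricePerMillionTokens };
}

// The example above, priced with the two OpenAI models from the table.
const small = estimateEmbeddingCost(5_000, 1_500, 0.02); // text-embedding-3-small
const large = estimateEmbeddingCost(5_000, 1_500, 0.13); // text-embedding-3-large
console.log(Math.round(small.tokens), small.usd.toFixed(2), large.usd.toFixed(2));
// prints: 10000000 0.20 1.30
```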

Vector Storage Options

Solution	Free Tier	Pricing (2025)	Best For
Supabase pgvector	500MB database	$25/mo for 8GB	Teams already on Supabase
Pinecone	5M vectors	$70/mo starter	Production RAG at scale
Qdrant Cloud	1GB cluster	$25/mo	Advanced filtering needs
Weaviate Cloud	50k objects	$25/mo	Multi-modal content
Turbopuffer	1M vectors	Pay-per-query	Cost-sensitive projects

If you're already running Supabase, pgvector is the obvious choice. No additional service, no additional billing, no additional point of failure.

Real-World Architecture Patterns

Let me share two architectures I've actually built:

Pattern 1: WordPress + Supabase AI Layer

For a media company with 50k+ WordPress posts:

A save_post hook (or a webhook plugin, since WordPress core doesn't send webhooks) fires on post save/update
A Supabase Edge Function receives the webhook
Content is fetched via WP REST API, normalized, and embedded
Stored in Supabase with pgvector
AI chatbot on the Next.js frontend queries Supabase for semantic search
Results are passed to GPT-4o as context for answer generation
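The last step is mostly prompt assembly. A minimal sketch, assuming rows shaped like the match_content output; the character budget is a crude stand-in for proper token counting, and buildContext and buildMessages are hypothetical names. The messages array uses the role/content shape that OpenAI's Chat Completions API accepts.

```typescript
// Row shape returned by the match_content function (metadata omitted here).
interface MatchedContent {
  id: string;
  title: string;
  body: string;
  similarity: number;
}

// Number the best matches as sources, stopping before the character budget is exceeded.
function buildContext(matches: MatchedContent[], maxChars = 8_000): string {
  const ranked = [...matches].sort((a, b) => b.similarity - a.similarity);
  const sections: string[] = [];
  let used = 0;
  for (let i = 0; i < ranked.length; i++) {
    const section = `[${i + 1}] ${ranked[i].title}\n${ranked[i].body.trim()}`;
    if (used + section.length > maxChars) break;
    sections.push(section);
    used += section.length + 2; // account for the blank line between sections
  }
  return sections.join('\n\n');
}

// Hypothetical prompt: answer only from the numbered sources, and cite them.
function buildMessages(question: string, matches: MatchedContent[]) {
  return [
    {
      role: 'system' as const,
      content:
        'Answer using only the numbered sources below. Cite sources like [1]. ' +
        "If the sources don't contain the answer, say so.\n\n" + buildContext(matches),
    },
    { role: 'user' as const, content: question },
  ];
}
```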

Total additional infrastructure cost: ~$25/month for Supabase pro tier.

Pattern 2: Payload CMS with Built-in AI

For a SaaS documentation site on Payload v3:

Payload hooks generate embeddings on every document save
Embeddings stored in a vector column in the same Postgres database Payload uses
Custom Payload endpoint for semantic search
AI docs assistant powered by the same database
No external vector store needed

Total additional infrastructure cost: $0 beyond the OpenAI API calls (pennies per month).

Both patterns took about 2-3 weeks to implement, compared to the 3-6 months a full CMS migration would take. If you're considering this kind of architecture, we've got pricing tiers that cover exactly these types of projects.

FAQ

Do I really need to restructure my content for AI, or is it just hype?

It's not hype, but the urgency depends on your use case. If you're building AI features (chatbots, semantic search, personalization), structured content is essential. If you're optimizing for AI-driven search like Google's AI Overviews or ChatGPT's browsing, structured data and clean content hierarchies measurably improve your visibility. A 2025 study by Authoritas found that pages with schema markup were 40% more likely to appear in AI-generated answers.

What's the minimum I should do to make WordPress content AI-ready?

Three things: (1) Use Gutenberg blocks consistently instead of pasting HTML, (2) add JSON-LD structured data to every page, and (3) expose custom fields through the REST API. This gets you from Level 0-1 to Level 2-3 in a few weeks of focused work. You don't need to restructure your entire site overnight.

Can Payload CMS replace Sanity for AI-powered content?

For most use cases, yes. Payload v3 with Lexical rich text stores content as structured JSON, has typed fields and relationships, and can run on Postgres, where you can add pgvector for embeddings. The main thing Sanity offers that Payload doesn't have natively is the managed Content Lake with built-in AI features. But if you're willing to wire up your own embedding pipeline (which takes maybe a day), Payload gives you equivalent capabilities.

How much does it cost to add vector embeddings to an existing CMS?

Surprisingly little. For a site with 10,000 articles averaging 1,500 words (roughly 20M tokens), initial embedding generation with OpenAI's text-embedding-3-small costs about $0.40. Ongoing costs for re-embedding updated content are typically under $5/month. The vector storage is the bigger cost -- expect $0-70/month depending on your provider and scale. Supabase's free tier can handle many small-to-medium sites.

Should I use a separate vector database or store embeddings in my existing database?

If you're on Postgres (Supabase always is, and Payload v3 supports it), store embeddings in the same database with pgvector. One less service to manage, one less sync to break. Dedicated vector databases like Pinecone make sense when you have millions of documents or need consistently low latency at very large scale. For most content sites, pgvector is more than fast enough -- typical query times are 5-20ms for collections under 1M vectors.

How do I keep AI embeddings in sync with content changes?

Webhooks are your friend. Most modern CMSs support them, and where they don't, a save hook does the same job. When content is created or updated, fire a webhook that triggers re-embedding. Store a content hash alongside the embedding so you can skip unchanged content. For WordPress, use the save_post action. For Payload, use afterChange hooks. For Supabase, use database triggers, Database Webhooks, or Realtime subscriptions.

What about content in multiple languages -- does this approach still work?

Yes, but choose your embedding model carefully. OpenAI's text-embedding-3 models handle multilingual content well. Cohere's embed-v4 is specifically optimized for cross-lingual retrieval. The normalization layer should store the language code as metadata so your AI consumers can filter appropriately. One important note: embed each language version separately rather than concatenating translations.

Is migrating to a headless CMS a prerequisite for AI-ready content?

Not a prerequisite, but it helps enormously. Headless CMS architecture naturally separates content from presentation, which is the foundation of AI readiness. If you're still running a monolithic WordPress theme with content baked into template files, going headless (WordPress as a backend with a Next.js or Astro frontend) simultaneously improves your AI readiness and your frontend performance. It's often worth the investment even before considering AI use cases. If you want to explore this, reach out to us -- it's literally what we do every day.
