What is Content Modeling?

Content modeling is the practice of designing the structure, types, fields, and relationships of content within a content management system before any code or editorial work begins. Think of it as the schema layer between your editorial team's mental model and your database. A content model defines discrete content types (e.g., blogPost, author, category), each with typed fields (string, rich text, image, reference) and validation rules. The concept predates headless CMS adoption but became critical around 2018–2020 as teams decoupled content from presentation. A well-designed content model means your frontend can query exactly the data it needs — no over-fetching, no awkward field-name hacks. We've shipped content models on 50+ projects, primarily in Sanity and Contentful, and the investment in modeling upfront consistently saves 20–40% of downstream frontend refactoring time.

How it works

Content modeling happens in three phases: discovery, schema definition, and validation.

1. Discovery

You audit every content surface — pages, cards, modals, emails — and extract the atomic content units. A product page might break into product, variant, priceTier, and review. The goal is to find the smallest reusable unit that makes editorial sense.
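You can record discovery output as plain TypeScript types before touching any CMS. The sketch below decomposes the product page described above into atomic units. The legacy page shape, field names, and ID scheme are all hypothetical and for illustration only; none of this is a CMS API.

```typescript
// Hypothetical legacy product page stored as one blob: everything nested in the page.
interface LegacyProductPage {
  slug: string
  productName: string
  variants: { sku: string; color: string }[]
  prices: { tier: string; amount: number }[]
  reviews: { author: string; rating: number; text: string }[]
}

// Atomic units found during discovery. Links between units are ID strings,
// so each unit can be edited, validated, and reused on its own.
interface Product {
  _id: string
  _type: 'product'
  name: string
  slug: string
  variantIds: string[]
  priceTierIds: string[]
}
interface Variant { _id: string; _type: 'variant'; sku: string; color: string }
interface PriceTier { _id: string; _type: 'priceTier'; tier: string; amount: number }
interface Review { _id: string; _type: 'review'; productId: string; author: string; rating: number; text: string }

type ContentUnit = Product | Variant | PriceTier | Review

// Split one legacy page into atomic units (the ID scheme is an assumption).
function decomposeProductPage(page: LegacyProductPage): ContentUnit[] {
  const productId = `product-${page.slug}`
  const variants = page.variants.map((v): Variant => ({
    _id: `variant-${v.sku}`,
    _type: 'variant',
    sku: v.sku,
    color: v.color,
  }))
  const priceTiers = page.prices.map((p): PriceTier => ({
    _id: `priceTier-${page.slug}-${p.tier}`,
    _type: 'priceTier',
    tier: p.tier,
    amount: p.amount,
  }))
  // Reviews point back at their product rather than living inside it
  const reviews = page.reviews.map((r, i): Review => ({
    _id: `review-${page.slug}-${i}`,
    _type: 'review',
    productId,
    ...r,
  }))
  const product: Product = {
    _id: productId,
    _type: 'product',
    name: page.productName,
    slug: page.slug,
    variantIds: variants.map((v) => v._id),
    priceTierIds: priceTiers.map((p) => p._id),
  }
  return [product, ...variants, ...priceTiers, ...reviews]
}
```

Here reviews point at products instead of products listing their reviews. That direction is a modeling choice, and discovery is the right time to settle it.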

2. Schema definition

In a code-first headless CMS like Sanity, you define schemas as TypeScript modules:

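```typescript
// sanity/schemas/blogPost.ts
import { defineField, defineType } from 'sanity'

// defineType/defineField are Sanity v3 typing helpers: they give `Rule` a
// real type (an untyped `(Rule) => ...` fails under strict TypeScript).
export default defineType({
  name: 'blogPost',
  title: 'Blog Post',
  type: 'document',
  fields: [
    defineField({ name: 'title', type: 'string', validation: (Rule) => Rule.required().max(120) }),
    // Slug is generated from the title in the Studio UI
    defineField({ name: 'slug', type: 'slug', options: { source: 'title' } }),
    // References point at separate documents instead of embedding them
    defineField({ name: 'author', type: 'reference', to: [{ type: 'author' }] }),
    // 'blockContent' is not built in; it is assumed to be a custom
    // Portable Text array type registered in its own schema file
    defineField({ name: 'body', type: 'blockContent' }),
    defineField({ name: 'publishedAt', type: 'datetime' }),
    defineField({
      name: 'categories',
      type: 'array',
      of: [{ type: 'reference', to: [{ type: 'category' }] }],
    }),
  ],
})
```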

References create relationships between documents. This is where content modeling differs from page building — you're not nesting content inside a page layout, you're creating a relational graph that any frontend can consume.
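The following library-free sketch shows a frontend following references through the graph. The `{ _type: 'reference', _ref }` shape matches how Sanity stores references. The `DocStore` class and the `toPostView` helper are hypothetical. In a real Sanity project you would usually dereference inside the GROQ query with the `->` operator rather than in application code.

```typescript
// Stored reference shape: the referencing document holds only the target's ID.
interface Reference { _type: 'reference'; _ref: string }

interface BaseDoc { _id: string; _type: string }
interface Author extends BaseDoc { _type: 'author'; name: string }
interface Category extends BaseDoc { _type: 'category'; title: string }
interface BlogPost extends BaseDoc {
  _type: 'blogPost'
  title: string
  author: Reference
  categories: Reference[]
}
type Doc = Author | Category | BlogPost

// Hypothetical in-memory document store keyed by _id.
class DocStore {
  private docs = new Map<string, Doc>()

  add(doc: Doc): void {
    this.docs.set(doc._id, doc)
  }

  // Follow a reference; throw if the target is missing or has the wrong type.
  resolve<T extends Doc>(ref: Reference, type: T['_type']): T {
    const target = this.docs.get(ref._ref)
    if (!target || target._type !== type) {
      throw new Error(`Broken reference to ${ref._ref}`)
    }
    return target as T
  }
}

// Build one frontend's view of a post. The post stores no author name or
// category titles, only references to shared documents.
function toPostView(store: DocStore, post: BlogPost) {
  const author = store.resolve<Author>(post.author, 'author')
  return {
    title: post.title,
    authorName: author.name,
    categories: post.categories.map((c) => store.resolve<Category>(c, 'category').title),
  }
}
```

Because the author lives in one document, renaming that author updates every view built from any post that references them.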

3. Validation and iteration

You wire the schema into the CMS, have editors create sample content, and watch where they hit friction. Field names that confuse editors, missing validation rules, or overly rigid structures all surface here. We typically run two rounds of editorial testing before locking a model.
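You can catch some of that friction automatically by checking sample documents against the model's rules. The checker below is a hypothetical, CMS-agnostic sketch, not Sanity's `Rule` API. Its rules mirror the `title` constraint in the schema above. Counting failures per field is an assumed heuristic for spotting fields that editors consistently struggle with.

```typescript
// Hypothetical field rule. Messages are written for editors, not developers.
interface FieldRule {
  field: string
  required?: boolean
  maxLength?: number
  message: string
}

interface Issue { docId: string; field: string; message: string }

type SampleDoc = { _id: string } & Record<string, unknown>

// Report every rule a sample document breaks.
function checkSamples(docs: SampleDoc[], rules: FieldRule[]): Issue[] {
  const issues: Issue[] = []
  for (const doc of docs) {
    for (const rule of rules) {
      const value = doc[rule.field]
      const missing = value === undefined || value === null || value === ''
      const tooLong =
        typeof value === 'string' && rule.maxLength !== undefined && value.length > rule.maxLength
      if ((rule.required && missing) || tooLong) {
        issues.push({ docId: doc._id, field: rule.field, message: rule.message })
      }
    }
  }
  return issues
}

// Fields that fail often are candidates for renaming, better help text,
// or a looser rule in the next round of editorial testing.
function frictionByField(issues: Issue[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const issue of issues) {
    counts.set(issue.field, (counts.get(issue.field) ?? 0) + 1)
  }
  return counts
}
```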

When to use it

Content modeling is non-negotiable for any project with more than a couple of pages of managed content. Specifically:

Do content modeling when:

You're adopting a headless CMS (Sanity, Contentful, Strapi, Payload)
Multiple frontends consume the same content (web + app + email)
You have more than one editor or content role
Content needs to be reused across pages (e.g., a testimonial appearing in 5 places)
You're migrating from WordPress or another monolithic CMS

Skip formal modeling when:

You're building a static marketing one-pager with hardcoded copy
The project is a prototype with a lifespan under 3 months
There's literally one content type with fewer than 5 fields
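The two lists work as a rough checklist. The sketch below is a hypothetical encoding of that checklist. The `ProjectProfile` fields restate the criteria above. The lists don't say how to break ties, so the rule that any "do" signal outweighs any "skip" signal is an assumption.

```typescript
// Hypothetical project profile; fields restate the "do" and "skip" lists.
interface ProjectProfile {
  headlessCms: boolean
  frontends: number // web, app, email, ...
  editorRoles: number
  reusesContent: boolean // e.g. one testimonial shown on several pages
  migratingFromMonolith: boolean
  hardcodedOnePager: boolean
  lifespanMonths: number
  contentTypes: number
  maxFieldsPerType: number
}

function shouldModelFormally(p: ProjectProfile): boolean {
  // Assumption: any "do" signal wins over any "skip" signal.
  const doSignal =
    p.headlessCms ||
    p.frontends > 1 ||
    p.editorRoles > 1 ||
    p.reusesContent ||
    p.migratingFromMonolith
  if (doSignal) return true
  const skipSignal =
    p.hardcodedOnePager ||
    p.lifespanMonths < 3 ||
    (p.contentTypes <= 1 && p.maxFieldsPerType < 5)
  return !skipSignal
}
```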

Our preferred stack is Sanity v3 with TypeScript schemas because the model lives in your repo — version-controlled, reviewable, deployable.

Content Modeling vs alternatives

Approach	Structure	Flexibility	Best for
Content modeling (typed schemas)	High — fields and types defined upfront	Medium — changes require schema updates	Multi-surface, team-edited content
Page builder (WYSIWYG blocks)	Low — layout-coupled content	High — editors drag and drop	Single-site, design-heavy landing pages
Flat files (MDX/Markdown)	Medium — frontmatter + body	Low — no relational queries	Developer blogs, docs sites
Database-first (raw SQL/Prisma)	High — full relational control	Low — no editorial UI out of the box	App data, not editorial content

Content modeling and page builders aren't mutually exclusive. In many of our projects we model structured content types and give editors a block-based page builder for layout — but the blocks reference modeled content rather than containing raw text.
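The distinction is visible in the block types themselves: a block either stores its own text or points at a modeled document. The following sketch uses a TypeScript discriminated union, and the block and field names are hypothetical. The `testimonialBlock` holds only a reference, so editing the testimonial document changes every page that renders it.

```typescript
// Stored reference shape (mirrors Sanity's { _type: 'reference', _ref } objects).
interface Reference { _type: 'reference'; _ref: string }

// Hypothetical page-builder block types.
type PageBlock =
  | { _type: 'heroBlock'; heading: string } // layout copy owned by this page
  | { _type: 'testimonialBlock'; testimonial: Reference } // reused content: reference only

interface Testimonial { _id: string; quote: string; attribution: string }

// Render blocks to strings, resolving references against modeled documents.
function renderBlocks(blocks: PageBlock[], testimonials: Map<string, Testimonial>): string[] {
  return blocks.map((block) => {
    if (block._type === 'heroBlock') return `# ${block.heading}`
    // Narrowed to testimonialBlock: look up the referenced document
    const t = testimonials.get(block.testimonial._ref)
    return t ? `"${t.quote}" (${t.attribution})` : '[missing testimonial]'
  })
}
```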

Real-world example

On a recent SaaS marketing site built with Astro 5 and Sanity v3, we modeled 11 content types: page, blogPost, author, caseStudy, testimonial, feature, pricingTier, faqItem, changelogEntry, legalDoc, and globalSettings. The testimonial type was referenced by caseStudy, page (via a testimonial block), and pricingTier. Without the content model, that testimonial would've been copy-pasted into three places and drifted within a week. The model took about 6 hours to build and test. Over 8 months of active content updates, the editorial team made zero "where does this go?" support requests — the schema guided them.
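Explicit references also let you answer "where is this used?" before editing or deleting a shared document such as that testimonial. Sanity supports this in GROQ through the `references()` function. The sketch below is a hypothetical in-memory equivalent. It walks each document recursively and collects `_ref` values at any depth, including inside page-builder blocks.

```typescript
// Hypothetical stored document: fixed system fields plus arbitrary content.
interface StoredDoc { _id: string; _type: string; [key: string]: unknown }

// Collect every _ref value anywhere inside a value (fields, arrays, nested blocks).
function collectRefs(value: unknown, out: Set<string> = new Set()): Set<string> {
  if (Array.isArray(value)) {
    for (const item of value) collectRefs(item, out)
  } else if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>
    if (typeof obj['_ref'] === 'string') out.add(obj['_ref'])
    for (const key of Object.keys(obj)) collectRefs(obj[key], out)
  }
  return out
}

// Documents that reference targetId: every place an edit to it will show up.
function whereUsed(docs: StoredDoc[], targetId: string): StoredDoc[] {
  return docs.filter((doc) => doc._id !== targetId && collectRefs(doc).has(targetId))
}
```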

Frequently asked questions about Content Modeling

Is content modeling the same as schema design?

They overlap heavily but aren't identical. Schema design is a broader term that applies to databases, APIs, and GraphQL types. Content modeling is schema design applied specifically to editorial content in a CMS context. It carries additional concerns like editor usability, preview workflows, and validation messaging that pure database schema design doesn't worry about. In practice, when someone says 'schema design' in a headless CMS project, they almost always mean content modeling.

When did content modeling become standard practice?

Content modeling as a discipline dates back to the structured content movement of the early 2010s, championed by people like Rachel Lovinger and Carrie Hane. It became a mainstream development concern around 2018–2020 when headless CMS platforms like Contentful (founded 2013, widespread adoption ~2017) and Sanity (launched 2017, Sanity v3 released December 2022) made schema-first workflows the default. By 2023, most agencies and product teams building on headless CMS treated content modeling as a required project phase.

What's the alternative to content modeling?

The main alternative is unstructured or page-coupled content — think WordPress classic editor where a page is just a title and a big rich text blob, or a page builder where content is embedded directly in layout blocks. This works fine for small sites where one person manages everything. The trade-off is that you can't easily reuse content across surfaces, enforce consistency, or query content programmatically. Flat-file approaches like MDX with frontmatter offer a middle ground — some structure via frontmatter fields, but no relational references or validation without custom tooling.
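The following minimal frontmatter linter for MDX or Markdown files illustrates that custom tooling. It makes simplifying assumptions. It only understands flat `key: value` lines between `---` fences, so a real project would use a YAML parser. The required keys and the set of known author slugs are hypothetical.

```typescript
// Extract flat `key: value` frontmatter from a Markdown/MDX source string.
// Deliberately minimal: no nested YAML, lists, or multi-line values.
function parseFrontmatter(source: string): Record<string, string> {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(source)
  const data: Record<string, string> = {}
  if (!match) return data
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(':')
    if (idx === -1) continue
    const key = line.slice(0, idx).trim()
    // Strip one pair of surrounding quotes, if present
    const value = line.slice(idx + 1).trim().replace(/^["']|["']$/g, '')
    if (key) data[key] = value
  }
  return data
}

// Checks a CMS would give you for free: required fields and a
// "reference" to an author that actually exists.
function lintPost(source: string, required: string[], knownAuthors: Set<string>): string[] {
  const fm = parseFrontmatter(source)
  const errors = required.filter((k) => !fm[k]).map((k) => `missing "${k}"`)
  if (fm.author && !knownAuthors.has(fm.author)) {
    errors.push(`unknown author "${fm.author}"`)
  }
  return errors
}
```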

How many content types should a typical content model have?

For a marketing site, 8–15 content types is a healthy range. Under 5 usually means you've stuffed too much into generic types. Over 20 often signals over-engineering — you've modeled variations that could be handled with a single type and a category field. We've seen enterprise projects with 40+ types, but most of those benefit from refactoring. Start with the minimum viable model, ship it, and add types when real editorial needs demand them. The cost of adding a type later is low in code-based CMS schemas like Sanity; the cost of removing one after editors have created 200 documents is high.
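The following hypothetical before/after shows the over-engineering case. Three types that differ only in name collapse into one `article` type with a constrained `kind` field. All type and field names are illustrative. The `toArticle` function stands in for the migration you need once documents already exist, and that migration is where the cost of removing types comes from.

```typescript
// Before (hypothetical): three types that differ only in name.
type LegacyType = 'newsPost' | 'pressRelease' | 'productUpdate'
interface LegacyDoc { _id: string; _type: LegacyType; title: string; body: string }

// After: one type, with the variation expressed as a constrained field.
// The same list could back an editor-facing dropdown.
const ARTICLE_KINDS = ['news', 'press', 'update'] as const
type ArticleKind = (typeof ARTICLE_KINDS)[number]

interface Article { _id: string; _type: 'article'; kind: ArticleKind; title: string; body: string }

// Record<LegacyType, ...> makes the compiler reject a missing mapping.
const KIND_BY_LEGACY_TYPE: Record<LegacyType, ArticleKind> = {
  newsPost: 'news',
  pressRelease: 'press',
  productUpdate: 'update',
}

// Migrate one legacy document, keeping its ID so ID-based references stay valid in this sketch.
function toArticle(doc: LegacyDoc): Article {
  return {
    _id: doc._id,
    _type: 'article',
    kind: KIND_BY_LEGACY_TYPE[doc._type],
    title: doc.title,
    body: doc.body,
  }
}
```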