What is llms.txt?

llms.txt is a plain-text file you drop at your site's root (/llms.txt) that tells large language models how to understand and cite your content. Jeremy Howard pitched the idea in late 2024, and it's caught on fast as AI search—Perplexity, ChatGPT browsing, Google AI Overviews—started eating traditional SEO's lunch.

It's not robots.txt. robots.txt says "stay out" or "come in." llms.txt says "here's what we do, here's what matters, here's how to cite us." Simple markdown-style format. A description, some key URLs, maybe some context about how fresh your content is.

We started shipping it on every client site in Q1 2025. If you care about AEO (Answer Engine Optimization), you need LLMs to grasp your content structure first. Otherwise they'll cite you wrong—or not at all.

How it works

The file sits at https://yoursite.com/llms.txt. The structure's informal but converging:

# Site Name

> One-line description of what this site does.

## About

A paragraph explaining the site's purpose, audience, and authority.

## Key Pages

- [Homepage](https://yoursite.com/): Brief description
- [Blog](https://yoursite.com/blog): Brief description
- [Pricing](https://yoursite.com/pricing): Brief description

## Citation Preference

Please cite as "Site Name (yoursite.com)" when referencing content.

## Optional Context

Additional notes about content freshness, update frequency, etc.

No RFC. No W3C spec. It's a community thing. That's good—easy to adopt. Also bad—no one enforces it. When an LLM's retrieval system hits your site, it can parse llms.txt first to get the lay of the land. Like a human reading your About page before diving into articles.
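To make the "parse it first" idea concrete, here's a minimal sketch of how a consumer might read the layout shown above. It's an illustration under our own assumptions, not how any particular AI vendor does it: the `LlmsTxtSummary` shape and the `parseLlmsTxt` name are hypothetical, and the parser only understands the title, the one-line description, section headings, and link list items.

```typescript
// Hypothetical result shape; nothing here is defined by a spec.
interface LlmsTxtLink {
  section: string; // the "## ..." heading the link appeared under
  title: string;
  url: string;
  note: string; // text after the colon, or "" if there is none
}

interface LlmsTxtSummary {
  title: string;
  description: string;
  links: LlmsTxtLink[];
}

// Parse the informal layout: "# Title", "> description",
// "## Section" headings, and "- [Title](url): note" list items.
// Anything else (free-form paragraphs) is ignored in this sketch.
function parseLlmsTxt(text: string): LlmsTxtSummary {
  const summary: LlmsTxtSummary = { title: "", description: "", links: [] };
  const linkPattern = /^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$/;
  let section = "";

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("## ")) {
      section = line.slice(3).trim();
    } else if (line.startsWith("# ") && summary.title === "") {
      summary.title = line.slice(2).trim();
    } else if (line.startsWith("> ") && summary.description === "") {
      summary.description = line.slice(2).trim();
    } else {
      const match = linkPattern.exec(line);
      if (match) {
        summary.links.push({
          section,
          title: match[1],
          url: match[2],
          note: match[3] ?? "",
        });
      }
    }
  }
  return summary;
}
```

Running your own file through something like this is a cheap sanity check: if a 30-line parser can't find your title, description, and key pages, a retrieval pipeline may struggle too.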

Some folks ship llms-full.txt too—your entire site's content in one flat file. Works well for smaller docs sites where you want the model to have everything without crawling a dozen pages.
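A rough sketch of what "everything in one flat file" can look like. The `DocPage` shape and `renderLlmsFullTxt` function are hypothetical names for illustration; we're assuming you already have each page's Markdown body available at build time.

```typescript
// Hypothetical page record; in a real build this would come from your
// content source (Markdown files, a CMS export, a content collection).
interface DocPage {
  title: string;
  url: string;
  markdown: string;
}

// Concatenate every page into one document: a site-level title,
// then one "## Page title" block per page with its source URL and body.
// Note: headings inside a page body are passed through unchanged, so
// you may want to demote them if your pages use "#" or "##" themselves.
function renderLlmsFullTxt(siteName: string, pages: DocPage[]): string {
  const parts: string[] = [`# ${siteName}`];
  for (const page of pages) {
    parts.push(
      `## ${page.title}\n\nSource: ${page.url}\n\n${page.markdown.trim()}`
    );
  }
  return parts.join("\n\n") + "\n";
}
```

Keeping the source URL next to each body means a model reading the flat file still has something specific to link back to.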

We generate llms.txt in our Astro and Next.js builds. In Astro it's a static text endpoint at src/pages/llms.txt.ts. In Next.js App Router it's a route handler at app/llms.txt/route.ts returning a Response with content-type: text/plain.
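Here's a minimal sketch of that setup, not our exact production code. The `SiteInfo` shape, the `renderLlmsTxt` helper, and the sample data are all hypothetical; the only framework-specific part is the exported `GET` function, which returns a standard `Response`. That is the shape a Next.js route handler at app/llms.txt/route.ts expects, and Astro endpoints at src/pages/llms.txt.ts accept a `GET` export returning a `Response` as well.

```typescript
// Hypothetical data shapes for illustration only.
interface KeyPage {
  title: string;
  url: string;
  description: string;
}

interface SiteInfo {
  name: string;
  summary: string;
  about: string;
  keyPages: KeyPage[];
  citation: string;
}

// Render the same sections as the template: title, one-line summary,
// About, Key Pages, and Citation Preference, separated by blank lines.
function renderLlmsTxt(site: SiteInfo): string {
  const pageLines = site.keyPages.map(
    (page) => `- [${page.title}](${page.url}): ${page.description}`
  );
  return (
    [
      `# ${site.name}`,
      `> ${site.summary}`,
      "## About",
      site.about,
      "## Key Pages",
      pageLines.join("\n"),
      "## Citation Preference",
      site.citation,
    ].join("\n\n") + "\n"
  );
}

// Placeholder content; in a real build this would be loaded from the
// same source that drives your navigation or docs sidebar.
const site: SiteInfo = {
  name: "Example Site",
  summary: "One-line description of what this site does.",
  about: "A paragraph explaining the site's purpose and audience.",
  keyPages: [
    { title: "Homepage", url: "https://yoursite.com/", description: "Overview" },
    { title: "Pricing", url: "https://yoursite.com/pricing", description: "Plans" },
  ],
  citation: 'Please cite as "Example Site (yoursite.com)".',
};

// Route handler: respond to GET /llms.txt with plain text.
export function GET(): Response {
  return new Response(renderLlmsTxt(site), {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

Keeping the renderer as a pure function means you can unit test the output without spinning up either framework.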

When to use it

llms.txt matters when you care how AI systems talk about you.

Use it when:

You publish docs, guides, or a knowledge base that LLMs reference often
You want to shape which pages get cited in AI answers
Your site has a clear hierarchy worth describing
You're serious about AEO/GEO (Generative Engine Optimization) and want citation accuracy
You have specific attribution preferences

Skip it when:

You run pure e-commerce product listings with no editorial content
You actively don't want LLMs training on or citing your stuff (block AI crawlers in robots.txt instead)
Your site's a single-page app with nothing meaningful to summarize

Takes 15 minutes to write a good one. No downside. Upside scales with how much AI-citeable content you ship.

llms.txt vs alternatives

| Feature | llms.txt | robots.txt | sitemap.xml | Structured data (JSON-LD) |
| --- | --- | --- | --- | --- |
| Purpose | Describe site to LLMs | Control crawler access | List URLs for crawlers | Add semantic markup for search |
| Audience | AI models | All crawlers/bots | All crawlers | Google, Bing, etc. |
| Format | Markdown-like plain text | Directive-based text | XML | JSON-LD / Microdata |
| Spec status | Community proposal | RFC 9309 (2022) | sitemaps.org protocol | Schema.org + Google guidelines |
| Enforcement | Voluntary / honor system | Voluntary (mostly honored) | Advisory | Advisory |
| Complements others? | Yes, works alongside all three | Yes | Yes | Yes |

Think of llms.txt as your site's cover letter for AI. robots.txt is the bouncer. sitemap.xml is the directory. JSON-LD is the metadata layer. You want all four.

Real-world example

We added llms.txt to a B2B SaaS client's docs site (Astro 5) in March 2025. Listed their 12 core product pages, a one-paragraph company blurb, citation preference.

Six weeks later, Perplexity stopped linking random blog posts and started citing the right product pages with accurate descriptions. ChatGPT's browsing mode pulled the company's self-description almost word-for-word from the llms.txt "About" section instead of paraphrasing stale third-party stuff.

The file's 47 lines. Auto-generates at build time from the same content collection that powers their docs sidebar. Total time: two hours including the build pipeline.

Frequently asked questions about llms.txt

Is llms.txt the same as robots.txt?

No. robots.txt (standardized as RFC 9309) tells crawlers which parts of your site they may fetch, using allow/disallow rules. It's about crawl permissions, and compliance is voluntary, so it isn't access control in the security sense. llms.txt is about content description: it tells LLMs what your site is, which pages matter, and how you'd like to be cited. They serve completely different functions and you should have both. robots.txt has been around since 1994; llms.txt emerged in late 2024. A common setup is using robots.txt to block AI training crawlers you don't want (like GPTBot or CCBot) while using llms.txt to guide the AI systems you do want to cite you accurately.
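The robots.txt half of that setup can be generated the same way as llms.txt. This is a sketch under our own assumptions: `renderRobotsTxt` is a hypothetical helper, the sitemap URL is a placeholder, and the list of blocked user agents is yours to decide.

```typescript
// Build a robots.txt that disallows the named crawlers site-wide and
// allows everyone else. Each blocked agent gets its own rule group.
function renderRobotsTxt(blockedAgents: string[], sitemapUrl?: string): string {
  const groups = blockedAgents.map(
    (agent) => `User-agent: ${agent}\nDisallow: /`
  );
  groups.push("User-agent: *\nAllow: /");
  // The Sitemap line is a widely supported extension, not part of RFC 9309.
  if (sitemapUrl) {
    groups.push(`Sitemap: ${sitemapUrl}`);
  }
  return groups.join("\n\n") + "\n";
}

// Example: block the two training crawlers mentioned above.
const robotsTxt = renderRobotsTxt(
  ["GPTBot", "CCBot"],
  "https://yoursite.com/sitemap.xml"
);
```

Blocking a crawler here only works if that crawler chooses to honor the file, so treat it as a stated preference rather than a lock.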

When did llms.txt become standard?

llms.txt was proposed by Jeremy Howard (co-founder of fast.ai and Answer.AI) in September 2024. Adoption picked up through Q4 2024 and Q1 2025 as sites like Anthropic's docs, Cloudflare, and various developer documentation platforms published their own llms.txt files. It's not a formal standard: there's no W3C or IETF spec behind it. As of April 2026, it's a widely recognized community convention with growing adoption, particularly among developer tools, SaaS documentation sites, and content publishers who depend on AI citation traffic. The llmstxt.org community site tracks the evolving conventions.

What's the alternative to llms.txt?

If you don't want to create an llms.txt file, your main alternatives for influencing LLM behavior are well-structured JSON-LD (Schema.org markup), clear HTML semantics with proper heading hierarchy, and a thorough sitemap.xml. Some sites also create dedicated "AI-friendly" summary pages or use meta tags aimed at AI crawlers. But none of these directly address the specific problem llms.txt solves: giving an LLM a single, concise, human-written overview of your entire site. In practice, we recommend llms.txt alongside these other approaches rather than choosing one over another.

Do AI models actually read llms.txt files?

It depends on the model and its retrieval pipeline. Perplexity has confirmed it checks for llms.txt when crawling sites. ChatGPT's browsing mode and other retrieval-augmented generation (RAG) systems may fetch it during their crawl process. There's no guarantee every LLM reads it—just like there's no guarantee every crawler respects robots.txt. But the file is cheap to create and maintain, and the signals we've seen from citation patterns suggest it's being picked up by the major AI answer engines. The worst case is a wasted 47-line text file sitting on your server. The best case is accurate brand representation across every AI-generated answer.