What is llms.txt?
llms.txt is a proposed text file that tells large language models how to read and represent a website's content.
llms.txt is a plain-text file you drop at your site's root (/llms.txt) that tells large language models how to understand and cite your content. Jeremy Howard proposed it in September 2024, and it's caught on fast as AI search (Perplexity, ChatGPT browsing, Google AI Overviews) started eating traditional SEO's lunch.
It's not robots.txt. robots.txt says "stay out" or "come in." llms.txt says "here's what we do, here's what matters, here's how to cite us." Simple markdown-style format. A description, some key URLs, maybe some context about how fresh your content is.
We started shipping it on every client site in Q1 2025. If you care about AEO (Answer Engine Optimization), you need LLMs to grasp your content structure first. Otherwise they'll cite you wrong—or not at all.
How it works
The file sits at https://yoursite.com/llms.txt. The structure's informal but converging:
# Site Name
> One-line description of what this site does.
## About
A paragraph explaining the site's purpose, audience, and authority.
## Key Pages
- [Homepage](https://yoursite.com/): Brief description
- [Blog](https://yoursite.com/blog): Brief description
- [Pricing](https://yoursite.com/pricing): Brief description
## Citation Preference
Please cite as "Site Name (yoursite.com)" when referencing content.
## Optional Context
Additional notes about content freshness, update frequency, etc.
No RFC. No W3C spec. It's a community proposal, published at llmstxt.org. That's good: easy to adopt. Also bad: no one enforces it, and no crawler is obliged to read it. When an LLM's retrieval system hits your site, it can parse llms.txt first to get the lay of the land, like a human reading your About page before diving into articles.
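To make that "lay of the land" pass concrete, here is a minimal sketch of how a retrieval system might read the file. It is an illustration only, not how any particular LLM vendor does it. The `ParsedLlmsTxt` shape, the `parseLlmsTxt` name, and the link regex are all assumptions made for this example.

```typescript
// Hypothetical shapes for a parsed llms.txt; not part of any official spec.
interface LlmsLink {
  title: string;
  url: string;
  note: string;
}

interface LlmsSection {
  heading: string;
  text: string[];
  links: LlmsLink[];
}

interface ParsedLlmsTxt {
  title: string;
  summary: string;
  sections: LlmsSection[];
}

// Matches items like "- [Blog](https://yoursite.com/blog): Brief description".
const LINK_ITEM = /^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$/;

export function parseLlmsTxt(source: string): ParsedLlmsTxt {
  const parsed: ParsedLlmsTxt = { title: "", summary: "", sections: [] };
  let current: LlmsSection | undefined;

  for (const rawLine of source.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;

    if (line.startsWith("## ")) {
      // A new "## Heading" opens a section; later lines attach to it.
      current = { heading: line.slice(3).trim(), text: [], links: [] };
      parsed.sections.push(current);
    } else if (line.startsWith("# ")) {
      parsed.title = line.slice(2).trim();
    } else if (line.startsWith(">") && current === undefined) {
      // The blockquote before the first section is the one-line summary.
      parsed.summary = line.slice(1).trim();
    } else if (current !== undefined) {
      const match = LINK_ITEM.exec(line);
      if (match) {
        const [, title = "", url = "", note = ""] = match;
        current.links.push({ title, url, note });
      } else {
        current.text.push(line);
      }
    }
  }
  return parsed;
}
```

Fed the template above, this would yield the site name, the summary line, and one section per `##` heading, with the Key Pages links split into title, URL, and description.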
Some folks ship llms-full.txt too—your entire site's content in one flat file. Works well for smaller docs sites where you want the model to have everything without crawling a dozen pages.
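There's no agreed layout for llms-full.txt, so treat the following as one possible sketch rather than a standard. It concatenates pages into a single file, with each page under its own heading and a source URL. The `DocPage` shape and `buildLlmsFull` name are made up for the example.

```typescript
// Hypothetical page shape; in a real build this would come from your CMS
// or content collection.
interface DocPage {
  title: string;
  url: string;
  markdown: string;
}

// Joins every page into one flat document. Note that headings inside a
// page's own markdown are passed through unchanged.
export function buildLlmsFull(siteName: string, pages: DocPage[]): string {
  const parts = pages.map((page) =>
    [`## ${page.title}`, `Source: ${page.url}`, "", page.markdown.trim()].join("\n")
  );
  return [`# ${siteName}`, ...parts].join("\n\n") + "\n";
}
```

Keeping the source URL next to each chunk gives a model something to cite even though it never crawled the page itself.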
We generate llms.txt in our Astro and Next.js builds. In Astro it's a static text endpoint at src/pages/llms.txt.ts. In Next.js App Router it's a route handler at app/llms.txt/route.ts returning a Response with content-type: text/plain.
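A stripped-down version of that route handler might look like the sketch below. The file body is a placeholder and `LLMS_TXT` is just a name picked for this example. Because it only uses the standard `Response` object, the same `GET` export should also work as an Astro endpoint, though that is worth checking against your framework version.

```typescript
// app/llms.txt/route.ts (Next.js App Router).
// Placeholder content; a real build would generate this from site data.
const LLMS_TXT = [
  "# Site Name",
  "",
  "> One-line description of what this site does.",
  "",
  "## Key Pages",
  "",
  "- [Homepage](https://yoursite.com/): Brief description",
  "- [Pricing](https://yoursite.com/pricing): Brief description",
  "",
].join("\n");

// Serve the file as plain text with an explicit charset.
export function GET(): Response {
  return new Response(LLMS_TXT, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

Setting the charset explicitly avoids mangled non-ASCII characters when a client guesses the encoding.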
When to use it
llms.txt matters when you care how AI systems talk about you.
Use it when:
- You publish docs, guides, or a knowledge base that LLMs reference often
- You want to shape which pages get cited in AI answers
- Your site has a clear hierarchy worth describing
- You're serious about AEO/GEO and want citation accuracy
- You have specific attribution preferences
Skip it when:
- You run pure e-commerce product listings with no editorial content
- You actively don't want LLMs training on or citing your stuff (block AI crawlers in robots.txt instead)
- Your site's a single-page app with nothing meaningful to summarize
Takes 15 minutes to write a good one. No downside. Upside scales with how much AI-citeable content you ship.
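For the opt-out case in the "Skip it when" list, the blocking happens in robots.txt, not llms.txt. Here is a small sketch that generates those rules. The user-agent tokens listed are examples of published AI crawler names and may be out of date, so check each vendor's documentation. `buildRobotsTxt` is a hypothetical helper.

```typescript
// Example AI crawler user agents; verify the current tokens with each vendor.
const AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Google-Extended"];

// Emits one "Disallow: /" group per blocked agent, then allows everyone else.
export function buildRobotsTxt(blockedAgents: string[]): string {
  const blocks = blockedAgents.map((agent) => `User-agent: ${agent}\nDisallow: /`);
  blocks.push("User-agent: *\nAllow: /");
  return blocks.join("\n\n") + "\n";
}
```

Calling `buildRobotsTxt(AI_CRAWLERS)` produces a file that turns away the listed bots while leaving ordinary search crawlers alone. Like everything in robots.txt, it only works on crawlers that choose to honor it.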
llms.txt vs alternatives
| Feature | llms.txt | robots.txt | sitemap.xml | Structured data (JSON-LD) |
|---|---|---|---|---|
| Purpose | Describe site to LLMs | Control crawler access | List URLs for crawlers | Add semantic markup for search |
| Audience | AI models | All crawlers/bots | All crawlers | Google, Bing, etc. |
| Format | Markdown-like plain text | Directive-based text | XML | JSON-LD / Microdata |
| Spec status | Community proposal | RFC 9309 (2022) | sitemaps.org protocol | Schema.org + Google guidelines |
| Enforcement | Voluntary / honor system | Voluntary (mostly honored) | Advisory | Advisory |
| Complements others? | Yes—works alongside all three | Yes | Yes | Yes |
Think of llms.txt as your site's cover letter for AI. robots.txt is the bouncer. sitemap.xml is the directory. JSON-LD is the metadata layer. You want all four.
Real-world example
We added llms.txt to a B2B SaaS client's docs site (Astro 5) in March 2025. Listed their 12 core product pages, a one-paragraph company blurb, citation preference.
Six weeks later, Perplexity stopped linking random blog posts and started citing the right product pages with accurate descriptions. ChatGPT's browsing mode pulled the company's self-description almost word-for-word from the llms.txt "About" section instead of paraphrasing stale third-party stuff.
The file's 47 lines. Auto-generates at build time from the same content collection that powers their docs sidebar. Total time: two hours including the build pipeline.
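The build step boils down to mapping collection entries onto the llms.txt template. The sketch below shows the general idea, not the client's actual pipeline. `DocEntry`, `SiteInfo`, `sidebarOrder`, and `generateLlmsTxt` are names invented for the example.

```typescript
// Hypothetical entry shape, loosely modeled on a docs content collection.
interface DocEntry {
  title: string;
  slug: string;
  description: string;
  sidebarOrder: number;
}

// Hypothetical site-level settings.
interface SiteInfo {
  name: string;
  baseUrl: string; // assumed to have no trailing slash
  summary: string;
  about: string;
  citation: string;
}

// Renders llms.txt from the same entries that drive the sidebar, so the
// two cannot drift apart.
export function generateLlmsTxt(site: SiteInfo, entries: DocEntry[]): string {
  const keyPages = [...entries]
    .sort((a, b) => a.sidebarOrder - b.sidebarOrder)
    .map((e) => `- [${e.title}](${site.baseUrl}/${e.slug}): ${e.description}`);

  return [
    `# ${site.name}`,
    `> ${site.summary}`,
    "## About",
    site.about,
    "## Key Pages",
    keyPages.join("\n"),
    "## Citation Preference",
    site.citation,
  ].join("\n\n") + "\n";
}
```

Sorting a copy of the entries keeps the function free of side effects on the collection data, which matters if the sidebar renders from the same array later in the build.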