Concepts

The AX standards landscape: llms.txt, A2A, MCP, RSL, Content Signals, Web Bot Auth, and how they relate. Rendered from the canonical source in the repository: docs/concepts.md

"AI Agent Experience" (AX) is the sum of the conventions a site uses to be discovered, read, governed, and transacted with by autonomous AI agents and crawlers — the way "web accessibility" is the sum of conventions for assistive technology. This page maps the standards ax-audit checks against, why each exists, and how they relate. It's the conceptual companion to the mechanical detail in checks.md.

Why AX is its own discipline

Agents are not browsers. Three differences drive every check:

They mostly don't run JavaScript. GPTBot, ClaudeBot, CCBot, and most other crawlers fetch raw HTML. A client-rendered SPA that returns an empty <div id="root"> is, to them, a blank page. (Checks: html-rendering, content-negotiation)
They look for declared structure, not visual layout. An agent would rather read a /llms.txt summary or a JSON-LD graph than infer meaning from your CSS grid. (Checks: llms-txt, structured-data, meta-tags, agent-json, mcp, openapi)
Their access is a policy and economic question, not just a technical one. Who may crawl, for what use, at what price, and under what license: these questions now have machine-readable answers. (Checks: robots-txt, including its Content Signals findings; rsl; agent-access)
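
A minimal sketch of the first difference, assuming a crawler that does not execute JavaScript sees only the text already present in the HTML response. The class name, the set of skipped tags, and the word-count heuristic are illustrative assumptions, not a description of how the html-rendering check is implemented.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that is present in raw HTML without running any scripts."""

    # Tags whose contents are not page content (hypothetical choice for this sketch).
    SKIPPED = {"script", "style", "template", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(html: str) -> int:
    """Count the words a non-JavaScript client could read from the response."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return sum(len(chunk.split()) for chunk in parser.chunks)


SPA_SHELL = (
    '<html><head><title>App</title><script src="/app.js"></script></head>'
    '<body><div id="root"></div></body></html>'
)
SERVER_RENDERED = (
    "<html><body><h1>Pricing</h1><p>Plans start at ten dollars.</p></body></html>"
)

print(visible_word_count(SPA_SHELL))        # the empty shell has no readable words
print(visible_word_count(SERVER_RENDERED))  # the rendered page does
```

A count of zero for the shell is the "blank page" case described above; any real threshold for "enough content" would be a judgment call.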

By some industry projections, bot traffic will exceed human traffic by 2029. AX is the interface layer for that shift.

The four families of standards

1. Content discovery & readability

Standard	What it is	Check
llms.txt	A Markdown file at your root summarizing your site for LLMs, with curated links. The "sitemap for AI."	`llms-txt`
Server-side rendering	Delivering real content in the HTML response, not assembling it client-side.	`html-rendering`
Markdown for Agents	Content negotiation: serve clean Markdown when a client sends `Accept: text/markdown`. ~80% fewer tokens than HTML.	`content-negotiation`
schema.org / JSON-LD	Structured data describing entities (Person, Organization, Product) in a graph agents can parse.	`structured-data`
Sitemaps	The classic XML index, still how crawlers enumerate your URLs.	`sitemap`

These answer: can an agent find your content and actually read it?
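
The content negotiation row can be sketched as follows, assuming a server that holds both an HTML and a Markdown representation of a page. The function names and the tie-breaking rule (Markdown wins when its q-value is at least that of HTML) are assumptions made for illustration.

```python
def parse_accept(header: str) -> dict[str, float]:
    """Parse an Accept header into {media type: q-value}."""
    prefs = {}
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_representation(accept_header: str) -> str:
    """Return the media type to serve: Markdown only when explicitly requested."""
    prefs = parse_accept(accept_header)
    markdown_q = prefs.get("text/markdown", 0.0)
    html_q = prefs.get("text/html", 0.0)
    if markdown_q > 0 and markdown_q >= html_q:
        return "text/markdown"
    return "text/html"


print(choose_representation("text/markdown"))
print(choose_representation("text/html,application/xhtml+xml,*/*;q=0.8"))
```

A server that varies its response this way should also send `Vary: Accept`, so caches keep the two representations apart.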

2. Agent interaction surface

Standard	What it is	Check
A2A — Agent2Agent	An "Agent Card" at `/.well-known/agent.json` advertising your agent's identity and skills, so other agents can interoperate.	`agent-json`
MCP — Model Context Protocol	A manifest at `/.well-known/mcp.json` describing tools and resources an agent can call. The emerging standard for exposing capabilities to LLMs.	`mcp`
OpenAPI	The long-standing machine-readable API description; agents use it to call your endpoints.	`openapi`
Emerging discovery files	`ai.txt`, `genai.txt`, `ai-plugin.json`, `agents.json`, `nlweb.json`: competing, early-stage conventions, scored as a coverage bonus.	`well-known-ai`
AI meta tags & discovery links	`ai:*` meta tags and `rel="alternate"` links pointing agents to your llms.txt / agent.json.	`meta-tags`

These answer: once an agent arrives, can it understand what you offer and act on it?
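
As a small illustration, the discovery locations named on this page can be resolved against any URL on a site. This is a sketch only: the mapping covers just the paths stated above, and the function name is hypothetical.

```python
from urllib.parse import urljoin

# Paths taken from the tables on this page, keyed by check name.
DISCOVERY_PATHS = {
    "llms-txt": "/llms.txt",
    "agent-json": "/.well-known/agent.json",
    "mcp": "/.well-known/mcp.json",
}


def discovery_urls(any_page_url: str) -> dict[str, str]:
    """Resolve each discovery path against the site root of the given URL."""
    return {
        check: urljoin(any_page_url, path)
        for check, path in DISCOVERY_PATHS.items()
    }


for check, url in discovery_urls("https://example.com/blog/post").items():
    print(f"{check}: {url}")
```

Because every path is root-relative, the files resolve to the same location no matter which page the agent started from.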

3. Access governance & licensing

This is the newest and fastest-moving family — the response to "AI scraped my content and now competes with me."

Standard	What it is	Check
Robots Exclusion Protocol	The original robots.txt — who may crawl what. ax-audit grades coverage of 48 known AI crawlers.	`robots-txt`
Content Signals	A robots.txt extension (Cloudflare, CC0) declaring how content may be used after access: `search`, `ai-input`, `ai-train`. Served by default on 3.8M+ Cloudflare domains.	`robots-txt` (findings)
RSL — Really Simple Licensing	A full machine-readable licensing layer (license.xml): permits/prohibits vocabularies, payment models (free, attribution, pay-per-crawl, pay-per-inference). Endorsed by 1,500+ publishers.	`rsl`
Cloaking integrity	Not a standard but a failure mode: your stated policy (robots.txt allows GPTBot) contradicts what your infrastructure enforces (the WAF returns 403).	`agent-access`

These answer: have you expressed your access and usage policy in a form agents can honor — and does your infrastructure actually match it?

The progression is one of increasing expressiveness: robots.txt says who/where, Content Signals adds how it may be used, RSL adds under what license and price.
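
The first two steps of that progression can be sketched with the standard library. The robots.txt content is invented for illustration, and `parse_content_signals` is a hypothetical helper that assumes `Content-Signal: key=value, ...` lines. It ignores which user-agent group a signal belongs to, which a real implementation would need to track. RSL is not covered here.

```python
from urllib.robotparser import RobotFileParser

# Invented example policy: GPTBot is kept out of /private/, and the
# wildcard group declares how content may be used after access.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /
"""


def parse_content_signals(robots_txt: str) -> dict[str, str]:
    """Collect Content-Signal key=value pairs (group membership is ignored)."""
    signals = {}
    for line in robots_txt.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-signal":
            for pair in value.split(","):
                key, eq, setting = pair.partition("=")
                if eq:
                    signals[key.strip().lower()] = setting.strip().lower()
    return signals


# Who/where: the classic Robots Exclusion Protocol question.
rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())
print(rules.can_fetch("GPTBot", "https://example.com/private/report"))
print(rules.can_fetch("GPTBot", "https://example.com/docs"))

# How it may be used: the Content Signals extension.
print(parse_content_signals(ROBOTS_TXT))
```

The standard parser skips the `Content-Signal` line because it does not recognize it. That is why the extension can coexist with existing robots.txt consumers.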

4. Transport, efficiency & hygiene

Standard	What it is	Check
TLS / HSTS	HTTPS everywhere; many agents refuse plaintext origins.	`tls-https`
HTTP security & discovery headers	Security headers plus `Link` headers advertising your AI files.	`http-headers`
Compression & conditional GET	Brotli/gzip and `ETag`/`304` — crawl cost matters when bots dominate traffic.	`crawl-efficiency`
RFC 9116 security.txt	A machine-readable security contact.	`security-txt`
SEO basics	Title, description, canonical, lang, hreflang — agents use the same head-tag fundamentals search engines do.	`seo-basics`

These answer: is the connection trustworthy, cheap, and well-formed?
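
The conditional GET row is the easiest of these to show in code. The sketch below assumes a server that derives an `ETag` from the response body. The helper names and the hash-based tag are illustrative choices, and the comma split does not handle entity tags that themselves contain commas.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from the body (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _strip_weak(tag: str) -> str:
    """If-None-Match uses weak comparison, so drop any W/ prefix."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(if_none_match: Optional[str], body: bytes) -> Tuple[int, bytes, str]:
    """Return (status, payload, etag): 304 with no payload if the client copy is current."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [_strip_weak(tag) for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, b"", etag
    return 200, body, etag


page = b"# Pricing\nPlans start at ten dollars.\n"
status, payload, etag = conditional_get(None, page)        # first crawl: full body
print(status, len(payload))
status, payload, _ = conditional_get(etag, page)           # re-crawl, unchanged page
print(status, len(payload))
status, payload, _ = conditional_get(etag, page + b"New tier.\n")  # page changed
print(status, len(payload))
```

A repeat crawl of an unchanged page costs only headers, which is the saving the `crawl-efficiency` row refers to.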

On the horizon (not yet scored)

Two standards are maturing and worth watching:

Web Bot Auth: cryptographic crawler verification via HTTP Message Signatures (RFC 9421). A bot signs its requests with a private key and publishes the matching public key at /.well-known/http-message-signatures-directory, so sites can verify identity instead of guessing from user-agent strings. Already implemented by Cloudflare and Google (agent.bot.goog). It directly affects the agent-access check: a WAF using Web Bot Auth may pass a real, signed crawler while rejecting ax-audit's unsigned probe, which is why that check's findings carry an explicit verified-bots caveat.
Pay-per-crawl / HTTP 402: Cloudflare and the RSL payment vocabulary point toward metered, paid agent access. RSL already encodes the terms; enforcement protocols (Open License Protocol, x402) are emerging.
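
The verified-bots caveat can be made concrete with a small sketch of how a policy-versus-enforcement probe might be interpreted. Everything here is hypothetical: the type, the function, the set of status codes treated as "blocked", and the caveat wording are assumptions, not ax-audit's actual logic.

```python
from dataclasses import dataclass

# Hypothetical caveat text attached to every mismatch finding.
VERIFIED_BOTS_CAVEAT = (
    "Probe was unsigned; a WAF using Web Bot Auth may still admit "
    "the real, signed crawler."
)


@dataclass(frozen=True)
class ProbeFinding:
    mismatch: bool  # True when stated policy and observed enforcement disagree
    caveat: str     # why the mismatch may be a false positive


def interpret_probe(robots_allows: bool, status: int) -> ProbeFinding:
    """Compare what robots.txt states with what an unsigned probe observed."""
    blocked = status in (401, 403)  # assumed "access denied" statuses
    if robots_allows and blocked:
        # Policy says yes, infrastructure says no. Report it, but hedge.
        return ProbeFinding(mismatch=True, caveat=VERIFIED_BOTS_CAVEAT)
    return ProbeFinding(mismatch=False, caveat="")


print(interpret_probe(robots_allows=True, status=403))
print(interpret_probe(robots_allows=True, status=200))
```

The point of the design is that a mismatch is reported as a finding to investigate rather than a definitive failure, since an unsigned probe cannot observe what a signed crawler would receive.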

How the families compose

A fully AX-ready site tells a coherent story across all four:

"Here's my content in a form you can read (family 1), here's the interface to interact with me (family 2), here's exactly who may use it and how, for what license (family 3), over a fast and trustworthy connection (family 4)."

ax-audit's weighting reflects today's leverage: the most widely adopted, highest-impact checks (llms-txt, robots-txt, html-rendering, structured-data, http-headers) carry the most weight. The newer governance and efficiency standards are informational in v3.x: real and worth adopting, but still stabilizing. They gain weight in v4.0.
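
To show what "informational" means for a score, here is a sketch of a weighted average in which informational checks carry zero weight. The weights and per-check scores are invented for illustration; they are not ax-audit's real values, and the function is hypothetical.

```python
# Invented weights: heavily weighted core checks, zero for informational ones.
HYPOTHETICAL_WEIGHTS = {
    "llms-txt": 3,
    "robots-txt": 3,
    "html-rendering": 3,
    "rsl": 0,               # informational: reported, not scored
    "crawl-efficiency": 0,  # informational: reported, not scored
}


def overall_score(check_scores: dict[str, float], weights: dict[str, int]) -> float:
    """Weighted mean of per-check scores (0-100); unknown checks weigh nothing."""
    total_weight = sum(weights.get(name, 0) for name in check_scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(
        score * weights.get(name, 0) for name, score in check_scores.items()
    )
    return weighted_sum / total_weight


scores = {"llms-txt": 100, "robots-txt": 50, "html-rendering": 0, "rsl": 0}
print(overall_score(scores, HYPOTHETICAL_WEIGHTS))

# Improving an informational check leaves the total unchanged.
scores["rsl"] = 100
print(overall_score(scores, HYPOTHETICAL_WEIGHTS))
```

Under this model, giving a check weight in a later version only requires changing its entry in the weight table.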