
Robots.txt

Weight: 15% of your AX score. This check verifies that your robots.txt explicitly configures access rules for AI crawlers — not just traditional search engines.

/robots.txt not found

Your site does not have a /robots.txt file at the root. Without it, AI crawlers have no explicit instructions on what they can or cannot access. While most crawlers will still index your site, having an explicit robots.txt signals intent and gives you fine-grained control.

Create a robots.txt file in your site root (usually /public/robots.txt in Next.js or the web root directory) with User-agent entries for the AI crawlers you want to allow:

# Allow all AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Bytespider
Allow: /

User-agent: CCBot
Allow: /

# Traditional crawlers
User-agent: *
Allow: /

Sitemap: https://your-site.com/sitemap.xml
Quick fix with ax-init
Run npx ax-init --from https://your-site.com to auto-generate a robots.txt with all 29+ AI crawler entries pre-configured.

Missing core AI crawlers

ax-audit detected that your robots.txt exists but is missing explicit User-agent entries for some of the 6 core AI crawlers. The core crawlers are:

  • GPTBot — OpenAI's crawler for ChatGPT and plugins
  • ClaudeBot — Anthropic's crawler for Claude
  • Google-Extended — Google's AI-specific crawler
  • PerplexityBot — Perplexity AI's search crawler
  • Bytespider — ByteDance's crawler
  • CCBot — Common Crawl's crawler (used by many AI training datasets)

Add explicit entries for each missing crawler. Each entry needs its own User-agent line followed by Allow: /:

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

No core AI crawlers configured

Your robots.txt exists but has zero explicit entries for any of the 6 core AI crawlers. This means AI agents rely entirely on the wildcard User-agent: * rule, which may or may not permit access.

Add User-agent blocks for all 6 core crawlers listed in the section above. Being explicit about AI crawler access is a best practice — it removes ambiguity and ensures your site is discoverable.
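
As a rough sketch of what this check does (assuming simple line-based parsing; ax-audit's actual implementation may differ), you can scan a robots.txt for the six core User-agent values:

```python
# The six core AI crawlers that ax-audit checks for.
CORE_CRAWLERS = [
    "GPTBot", "ClaudeBot", "Google-Extended",
    "PerplexityBot", "Bytespider", "CCBot",
]

def missing_core_crawlers(robots_txt: str) -> list[str]:
    """Return the core crawlers that have no explicit User-agent entry."""
    declared = {
        line.split(":", 1)[1].strip().lower()
        for line in robots_txt.splitlines()
        if line.lower().startswith("user-agent:")
    }
    return [bot for bot in CORE_CRAWLERS if bot.lower() not in declared]

print(missing_core_crawlers("User-agent: GPTBot\nAllow: /\n"))
# → ['ClaudeBot', 'Google-Extended', 'PerplexityBot', 'Bytespider', 'CCBot']
```

An empty result means every core crawler has at least one explicit entry.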


Blocked by wildcard rule

Your robots.txt contains a wildcard block (User-agent: * followed by Disallow: /) that shuts out all crawlers, including any AI crawler that does not have its own explicit entry.

If you want to keep the wildcard block for traditional crawlers but allow AI agents, add an explicit Allow entry for each AI crawler (by convention, placed before the wildcard rule):

# AI crawlers — explicitly allowed
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

# ... other AI crawlers ...

# Block everything else
User-agent: *
Disallow: /
How robots.txt matching works
When a crawler identifies itself as "GPTBot", it looks for a matching User-agent: GPTBot section first and falls back to User-agent: * only if no specific match exists. Explicit entries therefore always take precedence over the wildcard.
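
You can observe this precedence with Python's standard urllib.robotparser module (the rules in the string below are a hypothetical example):

```python
import urllib.robotparser

# Hypothetical robots.txt: GPTBot explicitly allowed, everyone else blocked.
ROBOTS = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# GPTBot matches its own section, so the wildcard block never applies to it.
print(rp.can_fetch("GPTBot", "https://example.com/page"))        # True
print(rp.can_fetch("SomeOtherBot", "https://example.com/page"))  # False
```

The wildcard group acts as a default that is consulted only when no named group matches.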

Explicitly blocked crawlers

One or more AI crawlers have explicit Disallow: / rules in your robots.txt. If this is intentional (e.g., you don't want certain AI companies indexing your content), you can ignore this warning.

If you want to allow these crawlers, change their Disallow: / to Allow: /:

# Before (blocked)
User-agent: GPTBot
Disallow: /

# After (allowed)
User-agent: GPTBot
Allow: /

Partial path restrictions

Some AI crawlers have Disallow rules on specific paths rather than a full block. For example:

User-agent: GPTBot
Disallow: /private/
Disallow: /api/internal/

This is often intentional — you may want AI crawlers to access most of your site but not private sections. ax-audit flags this as a warning so you can verify the restrictions match your intent. For maximum AX score, use only Allow: / in your AI crawler entries; keep in mind that a crawler with its own explicit entry ignores the wildcard rules, so any path restrictions you still need must be repeated under that crawler's entry or enforced server-side.
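
Given rules like the example above, you can verify exactly which paths a crawler may fetch using Python's standard urllib.robotparser:

```python
import urllib.robotparser

# The partial restrictions from the example above.
ROBOTS = """\
User-agent: GPTBot
Disallow: /private/
Disallow: /api/internal/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Paths outside the Disallow prefixes remain accessible.
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))    # True
print(rp.can_fetch("GPTBot", "https://example.com/private/doc"))  # False
```

Running each restricted path through can_fetch is a quick way to confirm the rules match your intent before deploying.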


Missing Sitemap directive

Your robots.txt does not include a Sitemap: directive. While AI crawlers can discover your sitemap through other means, declaring it in robots.txt is a widely supported convention that lets every crawler find your sitemap immediately.

Add a Sitemap directive at the end of your robots.txt:

Sitemap: https://your-site.com/sitemap.xml

Use an absolute URL (including https://). If you have multiple sitemaps, add one line per sitemap.
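
A minimal sketch of this absolute-URL check, assuming simple line-based parsing (relative_sitemaps is a hypothetical helper, not part of ax-audit):

```python
from urllib.parse import urlparse

def relative_sitemaps(robots_txt: str) -> list[str]:
    """Return Sitemap: values that are not absolute http(s) URLs."""
    urls = [
        line.split(":", 1)[1].strip()
        for line in robots_txt.splitlines()
        if line.lower().startswith("sitemap:")
    ]
    return [u for u in urls if urlparse(u).scheme not in ("http", "https")]

# An absolute URL passes; a relative path is flagged.
print(relative_sitemaps("Sitemap: https://your-site.com/sitemap.xml"))  # []
print(relative_sitemaps("Sitemap: /sitemap.xml"))  # ['/sitemap.xml']
```

Each flagged value should be rewritten as a full URL, one Sitemap line per sitemap.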


Low AI crawler coverage

ax-audit tracks 29+ known AI crawlers. Your robots.txt has explicit rules for fewer than 10 of them. While the 6 core crawlers are most important, configuring additional crawlers improves discoverability across more AI platforms.

Run npx ax-init to generate a robots.txt that includes all known AI crawler entries. You can also view the full list in the ax-audit source code.