Multi-Layer Structured Data Strategy for Semantic Discoverability
Context
The portfolio's Agent Experience (AX) layer established the principle of treating AI agents as first-class consumers (ADR-006). But the implementation of that principle required a deeper architectural decision: at what granularity should structured data be emitted, and through which protocols? A single ai-profile endpoint provides a full data dump, but it requires the consuming agent to know the endpoint exists and to make a deliberate HTTP request. JSON-LD embedded in HTML is passively discoverable by any crawler that renders the page. llms.txt is discoverable by agents that follow the emerging convention. OpenAPI specs are discoverable by agents that look for .well-known manifests. Each channel reaches a different class of consumer with different discovery mechanisms, and no single channel covers the entire spectrum. The question was whether to invest in one high-quality channel or to implement a redundant multi-layer strategy where the same semantic information is expressed through multiple complementary protocols.
Decision
Implement a four-layer structured data architecture, each layer targeting a distinct consumption pattern.

- Layer 1: Page-level JSON-LD (Schema.org) — embedded in every HTML page, passively crawled by search engines and AI bots that render HTML. Uses 12 schema types (Person, Organization, WebSite, ProfilePage, CollectionPage, ItemList, BreadcrumbList, TechArticle, EducationalOccupationalCredential, Occupation, SiteNavigationElement, WebPage). Each page emits a @graph array with contextually relevant entities.
- Layer 2: llms.txt / llms-full.txt — static text files at the root, following the llms.txt convention for LLM-native discovery. The short version provides a structured summary; the full version dumps the complete professional profile in markdown with semantic headers.
- Layer 3: ai-plugin.json + OpenAPI 3.0 — conforming to the OpenAI plugin specification, enabling tool-use-capable agents to discover and query the API programmatically. The OpenAPI spec documents all 12 endpoints with request/response schemas.
- Layer 4: /api/ai-profile — a dynamic endpoint that aggregates data from all API collections into a single JSON payload optimized for LLM context windows, with field-level descriptions and semantic keys. CORS is set to wildcard so any agent can consume it without preflight negotiation.
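Layer 1 can be sketched as a small builder that assembles the per-page @graph array. This is an illustrative sketch, not the portfolio's actual code: the entity values, helper name, and the choice of three entities are placeholders.

```typescript
// Sketch: assemble a page-level JSON-LD @graph (Layer 1).
// All names and values are illustrative, not the portfolio's real data.
type JsonLdEntity = { "@type": string; [key: string]: unknown };

function buildGraph(entities: JsonLdEntity[]): {
  "@context": string;
  "@graph": JsonLdEntity[];
} {
  return { "@context": "https://schema.org", "@graph": entities };
}

// Each page emits only the entities relevant to it; a profile page might use:
const profileGraph = buildGraph([
  { "@type": "Person", name: "Jane Doe", jobTitle: "Software Engineer" },
  { "@type": "ProfilePage", mainEntity: { "@id": "#person" } },
  {
    "@type": "BreadcrumbList",
    itemListElement: [
      { "@type": "ListItem", position: 1, name: "Home", item: "https://example.com/" },
    ],
  },
]);

// Embedded in the rendered HTML as:
// <script type="application/ld+json">{JSON.stringify(profileGraph)}</script>
```

Co-locating this builder with the page component is what lets the JSON-LD evolve atomically with the UI.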
Consequences
Positive: The portfolio is discoverable and consumable by every class of AI agent currently in production — from Googlebot (JSON-LD) to GPTBot (llms.txt) to ChatGPT plugins (ai-plugin.json) to custom AI pipelines (/api/ai-profile). Redundancy means no single point of failure in discoverability: if an agent doesn't know about llms.txt, it can still extract structured data from JSON-LD; if it can't render HTML, it can query the API directly. The four layers form a discovery funnel: broad passive (JSON-LD), convention-based (llms.txt), standard-based (OpenAPI), and direct consumption (ai-profile). Each layer caches independently, so a failure in the dynamic endpoint doesn't affect the static discovery files.

Negative: Four layers expressing the same underlying data create a maintenance multiplier. Adding a new content type (e.g., ADRs) requires updating the JSON-LD schemas in the new pages, the llms.txt files, the OpenAPI spec, and the ai-profile endpoint. This synchronization burden is the primary cost. The OpenAPI spec is ~8KB and the ai-plugin.json is ~1KB — trivial in size but non-trivial in maintenance. The llms.txt convention has no formal specification yet, so the format may change. The mitigation: the static files (llms.txt, OpenAPI) are updated infrequently and version-controlled; the dynamic endpoint (ai-profile) fetches live data and stays in sync automatically; and the JSON-LD is co-located with the rendering code, so it evolves atomically with the UI.
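The fallback path above — an agent that cannot render HTML querying the API directly — relies on Layer 4's wildcard CORS. A minimal sketch of that handler's shape, assuming a Node-style API route; `fetchCollections`, the field names, and the description text are hypothetical placeholders:

```typescript
// Sketch of the Layer 4 /api/ai-profile aggregation payload.
// fetchCollections and all field names are illustrative assumptions.
interface AiProfile {
  _description: string;                 // field-level description for LLM consumers
  sections: Record<string, unknown[]>;  // one key per API collection
}

async function fetchCollections(): Promise<Record<string, unknown[]>> {
  // Placeholder: the real endpoint aggregates live data from all API collections.
  return { projects: [], skills: [], experience: [] };
}

async function buildAiProfile(): Promise<AiProfile> {
  return {
    _description: "Aggregated professional profile, optimized for LLM context windows.",
    sections: await fetchCollections(),
  };
}

// Response headers that let any agent consume the payload without preflight:
const corsHeaders = {
  "Access-Control-Allow-Origin": "*",
  "Content-Type": "application/json",
};
```

Because the payload is rebuilt from live collections on each request, this layer is the one that stays in sync automatically.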
Predictions at Decision Time
Predicted the four-layer approach would provide near-complete coverage of the AI agent ecosystem. Expected the maintenance multiplier to be manageable because content changes are infrequent and three of the four layers (JSON-LD, llms.txt, OpenAPI) only need updates when the data schema changes, not when content values change. Predicted the ai-profile endpoint would become the most-consumed layer due to its single-request completeness.
Measured Outcomes
It is too early to measure layer consumption patterns definitively. The maintenance multiplier has behaved as predicted — adding ADRs required JSON-LD updates in the new pages but no llms.txt, OpenAPI, or ai-profile updates (ADRs are stored locally, not in the API). This suggests the four-layer architecture segments naturally: API-backed content propagates automatically through Layer 4, while frontend-only content (such as ADRs) requires only Layer 1 updates. The unexpected learning: the layers follow different temporal patterns — JSON-LD is always current (co-located with rendering) and ai-profile is always current (dynamic API calls), but llms.txt and OpenAPI are point-in-time snapshots that can drift.
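The snapshot drift described above can be surfaced mechanically, for instance by a CI check that diffs the static file's sections against the live collection list. A sketch under stated assumptions: the markdown-header convention, section names, and sample content below are illustrative, not the portfolio's actual files.

```typescript
// Sketch: detect drift between a static llms-full.txt snapshot and live data.
// Assumes (hypothetically) that llms-full.txt uses one "## Section" header
// per API collection; section names here are placeholders.
function sectionsFromLlmsTxt(text: string): Set<string> {
  const sections = new Set<string>();
  for (const line of text.split("\n")) {
    const m = line.match(/^##\s+(.+)$/);
    if (m) sections.add(m[1].trim().toLowerCase());
  }
  return sections;
}

function findDrift(staticSections: Set<string>, liveSections: string[]): string[] {
  // Any live collection missing from the snapshot has drifted.
  return liveSections.filter((s) => !staticSections.has(s.toLowerCase()));
}

const snapshot = "# Profile\n\n## Projects\n...\n\n## Skills\n...";
const drift = findDrift(sectionsFromLlmsTxt(snapshot), ["Projects", "Skills", "Experience"]);
// drift → ["Experience"]: the snapshot predates the newest collection
```

Run against the real files, a non-empty `drift` array would fail the build and force the snapshot layers back into sync.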
Unknowns at Decision Time
The primary unknown remains: which layer do AI agents actually prefer? Server logs show /api/ai-profile requests and llms.txt fetches, but there's no way to measure JSON-LD consumption (it's embedded in HTML, indistinguishable from regular page crawls). Also unknown: whether the four-layer architecture is overkill for a personal portfolio, or whether it will become table stakes as AI agent consumption grows. The broader industry trajectory will answer this within 12 months.
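The measurable slice of that question — which channels agents actually fetch — can be approximated from access logs. A sketch assuming a common-log-style format; the paths, log lines, and layer labels are assumptions, and JSON-LD stays invisible because it rides along with ordinary page crawls:

```typescript
// Sketch: tally AI-discovery requests per layer from access-log lines.
// Log format and path-to-layer mapping are illustrative assumptions.
const LAYER_PATHS: Record<string, string> = {
  "/llms.txt": "layer2",
  "/llms-full.txt": "layer2",
  "/.well-known/ai-plugin.json": "layer3",
  "/api/ai-profile": "layer4",
};

function tallyLayers(logLines: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of logLines) {
    // Assumed format: '<ip> - - [<ts>] "GET <path> HTTP/1.1" <status> <bytes>'
    const m = line.match(/"GET\s+(\S+)\s/);
    if (!m) continue;
    const layer = LAYER_PATHS[m[1]];
    if (layer) counts[layer] = (counts[layer] ?? 0) + 1;
  }
  return counts;
}

const sample = [
  '1.2.3.4 - - [01/Jan/2025:00:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512',
  '5.6.7.8 - - [01/Jan/2025:00:00:01 +0000] "GET /api/ai-profile HTTP/1.1" 200 4096',
];
```

Even this only bounds the answer from below: Layer 1 consumption is indistinguishable from regular crawling, so the tally covers Layers 2 through 4 at best.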
Reversibility Classification
Each layer is independently removable. Removing Layer 1 (JSON-LD) requires editing page components. Removing Layer 2 (llms.txt) requires deleting two files. Removing Layer 3 (OpenAPI + ai-plugin.json) requires deleting two files. Removing Layer 4 (ai-profile) requires deleting one API route. None of the layers depend on each other — the redundancy is intentional, not structural. Estimated effort to remove any single layer: 30-60 minutes.
Strongest Counter-Argument
The four-layer approach violates DRY (Don't Repeat Yourself) at the architectural level. The same semantic data is expressed four different ways, creating four synchronization surfaces. A simpler approach: invest deeply in one layer (JSON-LD, because it's passively discoverable and embedded in the pages) and let AI agents that need more data scrape the rendered HTML. Most successful professional portfolios have zero AX-specific infrastructure and still appear in AI-generated summaries. The counter-counter: the portfolio is explicitly an AX showcase — demonstrating multi-layer structured data strategy is itself a portfolio piece, separate from its functional value.
Technical Context
- Four layers must stay semantically consistent
- llms.txt standard not yet formalized
- OpenAPI spec manually maintained
- ai-profile endpoint has runtime dependency on API