ADR-020 (accepted)

Calibrated Uncertainty as Architectural Metadata

Context

Architecture Decision Records traditionally document three things: the context (why a decision was needed), the decision (what was chosen), and the consequences (what trade-offs resulted). This format captures the reasoning at a single point in time but omits two critical dimensions of engineering judgment.

First, the temporal dimension: what did the decision-maker predict would happen versus what actually happened? Engineering maturity is demonstrated not by making correct decisions but by accurately calibrating confidence levels and being honest about prediction accuracy.

Second, the epistemic dimension: what was explicitly unknown at decision time, and how did the decision-maker reason under that uncertainty? The difference between a junior and a senior architect is not the quality of their decisions but the quality of their uncertainty models. A decision record that only shows the 'right answer' hides the reasoning process that produced it.

The portfolio's ADR system (ADR-008) had 12 records documenting technically sound decisions. But they read like finished products rather than reasoning artifacts: they showed what was decided, not how the decision-maker thinks.

Decision

Extend every ADR with a calibration object containing five fields that expose the decision-maker's reasoning process:

  • predictions: explicit statements of what was expected at decision time, including quantitative targets where applicable.
  • outcomes: measured results compared against predictions (what matched, what diverged, what was unexpected).
  • unknowns: explicitly stated blind spots at decision time, both the things the decision-maker knew they didn't know and the things they didn't know they didn't know (discovered retrospectively).
  • reversibility: classification as one-way-door (irreversible or expensive to reverse) or two-way-door (easily reversible), with estimated reversal effort. This follows Amazon's decision-making framework, which treats reversibility as a first-class property of architectural decisions.
  • counterArguments: the strongest case for the path not taken, a steel-man of the rejected alternative rather than a straw-man.

This transforms the ADR system from a documentation tool into a calibration instrument that demonstrates epistemic humility and prediction accuracy.
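A minimal sketch of the resulting record shape, written as TypeScript types. The five field names come from the decision above; the concrete types, the reversal-effort field name, and the AdrRecord wrapper are assumptions for illustration, not the actual schema:

```typescript
// Hedged sketch of the calibration object added to each ADR entry.
// Field names follow the decision above; types are assumptions.
type Reversibility = {
  classification: 'one-way-door' | 'two-way-door';
  reversalEffort: string; // e.g. "15 minutes for data cleanup" (assumed field name)
};

interface Calibration {
  predictions: string[];      // what was expected at decision time
  outcomes: string[];         // measured results compared against predictions
  unknowns: string[];         // stated blind spots, known and retrospective
  reversibility: Reversibility;
  counterArguments: string[]; // steel-man of the rejected alternative
}

interface AdrRecord {
  id: string;                 // e.g. "ADR-020"
  title: string;
  context: string;
  decision: string;
  consequences: string;
  calibration?: Calibration;  // optional: UI sections render only when present
}
```

Keeping calibration optional on the record is what makes the feature a two-way door: deleting the key from the data is a complete reversal (see Reversibility Classification below).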

Consequences

Positive: Each ADR now functions as a micro case study in decision-making under uncertainty; the format reveals whether the architect tends to over-predict, under-predict, or accurately calibrate outcomes. The unknowns field demonstrates intellectual honesty: explicitly stating 'I didn't know X' is a more powerful signal of maturity than only showing what was known. The reversibility classification provides immediate actionability: one-way-door decisions deserve more deliberation than two-way-door decisions. The counter-arguments section prevents echo-chamber reasoning, since every decision is stress-tested against the best alternative. For a hiring audience (particularly Anthropic, where calibrated uncertainty is a core research concept), this format signals deep alignment with empirical reasoning practices.

Negative: The calibration fields significantly increase per-record data size, from ~1KB to ~3KB per ADR, tripling the JSON file size. Writing honest calibration requires retrospective analysis that can take 30-60 minutes per ADR. The outcomes field creates an ongoing maintenance burden: as outcomes evolve, the calibration should be updated. The format may feel self-indulgent; a portfolio that extensively analyzes its own architectural decisions risks appearing navel-gazing rather than substantive. And the target audience (engineering leaders) must value the meta-cognitive demonstration over the content itself.

Calibrated Uncertainty

Predictions at Decision Time

Predicted the calibration fields would differentiate the portfolio from any other developer portfolio. Expected the writing effort to be the primary cost — estimated 30 minutes per ADR for initial calibration. Assumed the 3x data size increase would have no performance impact. Predicted hiring managers familiar with decision theory (particularly at Anthropic, Google DeepMind, or similar organizations) would immediately recognize the framework as a signal of intellectual rigor.

Measured Outcomes

This ADR was written on the same day the calibration system was implemented, so long-term outcomes are by definition unmeasured. Two immediate measurements are available. The writing effort was approximately 20 minutes per ADR, under the 30-minute estimate, because the technical context was still fresh in memory. The data size increase from 12KB to 35KB had no measurable performance impact: build time and page load were unchanged. Whether the calibration format differentiates the portfolio in hiring contexts remains the fundamental unknown, answerable only through actual engagement with the target audience.

Unknowns at Decision Time

The primary unknown: whether the target audience values calibrated uncertainty as a signal, or whether it reads as overthinking. The framework is deeply aligned with Anthropic's research culture (calibration, epistemic humility, honest uncertainty), but it may not resonate with engineering leaders at more execution-focused organizations. Also unknown: whether the self-referential nature of this ADR (an ADR about ADR calibration) enhances or undermines the portfolio's credibility. Self-reference can signal meta-cognitive sophistication or academic self-indulgence — the distinction is in the reader's frame. The broadest unknown: whether architecture decision records will become a standard portfolio feature in the industry, which would reduce this portfolio's differentiation, or remain niche, which would maintain its uniqueness.

Reversibility Classification

Two-Way Door

The calibration UI sections are conditionally rendered — they only appear if the calibration object exists in the ADR data. Removing calibration from all ADRs requires deleting the calibration key from each entry in the JSON file. The UI sections automatically disappear. No code changes needed in the rendering components. Estimated removal effort: 15 minutes for data cleanup. The reverse direction (adding calibration to a non-calibrated ADR) requires writing the content — the code already supports it.
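A sketch of the conditional rendering described above, assuming a React component receives the ADR record. The component names, props, and inlined types are hypothetical; the real implementation uses styled-components, which this sketch does not reproduce:

```tsx
import React from 'react';

// Minimal local types (see the fuller sketch in the Decision section).
interface Calibration {
  predictions: string[];
  outcomes: string[];
  unknowns: string[];
}
interface AdrRecord {
  title: string;
  calibration?: Calibration;
}

// Hypothetical list renderer standing in for the styled components.
const CalibrationList = ({ heading, items }: { heading: string; items: string[] }) => (
  <section>
    <h3>{heading}</h3>
    <ul>
      {items.map((item) => (
        <li key={item}>{item}</li>
      ))}
    </ul>
  </section>
);

// Calibration sections render only when the calibration key exists on
// the record, so deleting the key from the JSON data removes them with
// no component changes: the two-way door described above.
export function AdrDetail({ adr }: { adr: AdrRecord }) {
  return (
    <article>
      <h2>{adr.title}</h2>
      {adr.calibration && (
        <>
          <CalibrationList heading="Predictions at Decision Time" items={adr.calibration.predictions} />
          <CalibrationList heading="Measured Outcomes" items={adr.calibration.outcomes} />
          <CalibrationList heading="Unknowns at Decision Time" items={adr.calibration.unknowns} />
        </>
      )}
    </article>
  );
}
```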

Strongest Counter-Argument

The calibration system is a meta-optimization: time spent calibrating architectural decisions is time not spent making new ones. A portfolio should demonstrate building capability, not analyzing capability. The calibration fields may signal analysis paralysis rather than execution speed: a VP of Engineering hiring for a startup CTO role values shipping velocity over epistemic precision. Additionally, the calibration format assumes the reader has the patience and interest to read 3KB of reasoning per decision. Most visitors will read the title and first paragraph; the calibration section is effectively invisible to cursory readers.

The counter-counter: the calibration section is not for cursory readers. It is for the specific audience that evaluates architectural judgment at depth, the same audience that reads research papers, reviews system design documents, and values reasoning over conclusions. If that audience exists for this portfolio, the calibration is the most valuable section. If it doesn't, the portfolio has a targeting problem that no amount of content can solve.

Technical Context

Stack: JSON Schema Extension · styled-components · Lucide React Icons
Calibration fields per ADR: 5
Data file size increase: ~3x (12KB to 35KB)
UI sections added: 5
New styled-components: 4
One-way-door decisions: 2
Two-way-door decisions: 18
Constraints
  • Calibration must be honest, not retroactively rationalized
  • Outcomes must reflect actual measurements, not aspirational claims
  • Counter-arguments must be steel-man, not straw-man
