ADR-014 (accepted)

Defensive Data Fetching with Graceful Degradation

Context

The portfolio's SSG build process (ADR-001) fetches data from the API (api.lucioduran.com) at build time. If the API is unavailable, slow, or returns malformed data during a build, the entire site fails to deploy: every page, not just the affected ones. This is the fundamental fragility of build-time data fetching: a single API failure cascades into a complete build failure. The API serves 12 endpoints, and each page's getStaticProps may fetch from one or more of them. A timeout on /technologies does not just break the technology pages; it stalls the build pipeline and prevents every other page from deploying. The question is not whether the API will fail but how the build should behave when it does.

Decision

Implement a defensive data-fetching layer with three mechanisms:

  • Per-endpoint try/catch with fallback to empty arrays. Each getStaticProps wraps its API fetch in error handling that returns sensible defaults (empty collections, placeholder objects) rather than throwing. A page with no data renders a 'content unavailable' state rather than crashing the build.
  • Request timeouts at 10 seconds per endpoint. If the API doesn't respond within 10 seconds, the fetch resolves with the fallback rather than hanging the build indefinitely.
  • Independent page builds. Each page's getStaticProps is self-contained, fetching only the data it needs. A failure in /technologies does not affect /education because they fetch from different endpoints in isolated execution contexts.

The homepage, which aggregates data from multiple endpoints, uses Promise.allSettled rather than Promise.all, processing whatever data is available.
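The first two mechanisms can be combined in one small helper. The following is a minimal sketch, not the portfolio's actual code: the name fetchWithFallback, the FetchLike type, the injectable fetchImpl parameter, and the https scheme on the API host are all assumptions made for illustration. It uses AbortController to enforce the timeout and a try/catch to convert any failure (network error, non-2xx status, unparseable JSON, or timeout) into the supplied fallback.

```typescript
// Minimal structural type so the sketch does not depend on a specific fetch typing.
// (Hypothetical name; not taken from the ADR.)
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const API_BASE = "https://api.lucioduran.com"; // host from the ADR; scheme is assumed
const TIMEOUT_MS = 10_000; // the 10-second limit from the decision

// Hypothetical helper: never throws, always resolves with API data or the fallback.
async function fetchWithFallback<T>(
  path: string,
  fallback: T,
  fetchImpl: FetchLike = fetch,
  timeoutMs: number = TIMEOUT_MS
): Promise<T> {
  const controller = new AbortController();
  // Abort the request if the API has not answered within the limit.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(`${API_BASE}${path}`, { signal: controller.signal });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    // Unparseable JSON rejects here and is caught below.
    // The shape of valid JSON is not validated in this sketch.
    return (await res.json()) as T;
  } catch (err) {
    // Name the endpoint so the build log shows which fetch degraded.
    console.warn(`[build] ${path} failed, using fallback:`, err);
    return fallback;
  } finally {
    clearTimeout(timer);
  }
}

// Usage inside a page (Next.js types omitted to keep the sketch self-contained).
async function getStaticProps() {
  const technologies = await fetchWithFallback<unknown[]>("/technologies", []);
  return { props: { technologies } };
}
```

Because the helper always resolves, the page component has to treat an empty collection as the signal to render its 'content unavailable' state.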

Consequences

Positive: Build resilience improved from 'any API failure kills the entire build' to 'API failures degrade individual pages without blocking deployment.' A build during an API outage produces a functional site with some pages showing reduced content rather than a failed deployment serving stale content. The 10-second timeout prevents indefinite build hangs that would block the Vercel build queue. Promise.allSettled on the homepage ensures that a slow /posts endpoint doesn't block /about data from rendering. Build logs clearly indicate which endpoints failed, enabling targeted debugging.

Negative: Silent degradation can mask persistent API issues. A page rendering with empty data might not be noticed immediately if monitoring only checks build success/failure. The fallback states (empty arrays, placeholder content) are technically correct but informationally useless: a visitor sees a working page with no content, which may be worse than an error page that communicates the issue honestly. The 10-second timeout is arbitrary and may be too aggressive for cold-start scenarios or too lenient for a fast API.
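The homepage aggregation and the "build logs indicate which endpoints failed" property can be sketched together. This is an illustrative sketch under assumptions: loadHomepageData, EndpointLoader, and the returned failed list are hypothetical names, and an empty array stands in for whatever fallback each section really uses. Promise.allSettled waits for every loader and reports each outcome separately, so one rejection does not discard the results that did arrive.

```typescript
// Hypothetical type: one async loader per homepage section.
type EndpointLoader = () => Promise<unknown>;

// Runs every loader in parallel, keeps what succeeded, substitutes an empty
// array for what failed, and returns the keys of the failed sections.
async function loadHomepageData(
  loaders: Record<string, EndpointLoader>
): Promise<{ data: Record<string, unknown>; failed: string[] }> {
  const entries = Object.entries(loaders);
  // The async wrapper turns a synchronous throw inside a loader into a rejection.
  const results = await Promise.allSettled(entries.map(async ([, load]) => load()));

  const data: Record<string, unknown> = {};
  const failed: string[] = [];
  entries.forEach(([key], i) => {
    const result = results[i];
    if (result.status === "fulfilled") {
      data[key] = result.value;
    } else {
      data[key] = []; // assumed fallback: empty collection
      failed.push(key);
      console.warn(`[build] homepage section "${key}" failed:`, result.reason);
    }
  });
  return { data, failed };
}
```

Returning the failed keys alongside the data is one way to address the silent-degradation concern: a caller could forward that list to monitoring instead of relying on build success alone.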

Calibrated Uncertainty

Predictions at Decision Time

Expected the defensive fetching to prevent build failures during API outages. Predicted 1-2 API outages per year based on historical uptime. Assumed the 10-second timeout would be appropriate for all endpoints. Predicted the fallback states would never be visible in production because the API would always be available during builds.

Measured Outcomes

Zero build failures from API issues in 20+ months, exactly as designed. The defensive layer has triggered twice: once during a planned server maintenance window (builds continued to deploy, using data cached from a previous successful build in Vercel's cache), and once during a brief MongoDB connection timeout. Both times, the affected pages showed empty content sections rather than crashing. The 10-second timeout has never triggered under normal conditions: API response times are consistently under 500ms. The prediction that fallback states would never be visible was wrong: the two degradation events did produce pages with missing content in production for approximately 15 minutes each time.

Unknowns at Decision Time

Did not anticipate that Vercel's build cache would serve as an implicit fallback layer: when a build fails or is cancelled, Vercel continues serving the previous successful build. This means the defensive fetching layer is actually the second line of defense, not the first. Also unknown: whether the 10-second timeout would interact poorly with Vercel's own build timeout (45 minutes per deployment on the free tier). In theory, 12 sequential 10-second timeouts would add 120 seconds of waiting to a page that fetches every endpoint, and that cost could multiply across pages. In practice, a page's endpoints are fetched in parallel, so the worst-case wait per page is about 10 seconds.
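The sequential-versus-parallel arithmetic above can be checked with a small worked example. The function names are hypothetical, and the model is deliberately simple: each request is assumed to take its own latency or the timeout, whichever is smaller, and parallel requests are assumed to start at the same moment.

```typescript
// Each request waits at most timeoutMs, however long the API would really take.
function cappedLatencies(latenciesMs: number[], timeoutMs: number): number[] {
  return latenciesMs.map((ms) => Math.min(ms, timeoutMs));
}

// Sequential fetching: the waits add up.
function sequentialWaitMs(latenciesMs: number[], timeoutMs: number): number {
  return cappedLatencies(latenciesMs, timeoutMs).reduce((sum, ms) => sum + ms, 0);
}

// Parallel fetching: the total wait is the slowest single request.
function parallelWaitMs(latenciesMs: number[], timeoutMs: number): number {
  return Math.max(0, ...cappedLatencies(latenciesMs, timeoutMs));
}

// Worst case from the ADR: all 12 endpoints hang until the 10-second timeout.
const allHung = Array.from({ length: 12 }, () => Infinity);
console.log(sequentialWaitMs(allHung, 10_000)); // 120000
console.log(parallelWaitMs(allHung, 10_000)); // 10000
```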

Reversibility Classification

Two-Way Door

Removing the defensive layer means removing try/catch blocks and timeout logic from getStaticProps functions — reverting to raw fetch calls. This is a mechanical change per page. The fallback states are just conditional renders that can be removed. Estimated effort: 2-3 hours for all pages. The reverse direction (adding defenses) is equally simple. There is no scenario where removing defenses is desirable.

Strongest Counter-Argument

The defensive layer adds complexity to every data-fetching function. A simpler approach: let builds fail on API errors, and rely on Vercel's build cache to serve the previous successful deploy. This achieves the same user-facing outcome (site stays up) without the code complexity of fallback states, timeouts, and allSettled patterns. The build failure would surface in Vercel's dashboard as a clear alert, rather than silently degrading content. The counter-counter: relying on Vercel's cache means the site can never be deployed during an API outage, even if only API-independent pages (like the ADR section) have changed. The defensive layer preserves deploy capability during partial outages.
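The two policies being compared differ only in what happens to an error. As a rough sketch with hypothetical names (loadStrict, loadDefensive), the counter-argument's approach lets the rejection propagate so the build fails, while the adopted approach swallows it and substitutes a fallback:

```typescript
type Loader<T> = () => Promise<T>;

// Counter-argument policy: any error propagates out of getStaticProps,
// the build fails, and the previous successful deploy keeps being served.
async function loadStrict<T>(load: Loader<T>): Promise<T> {
  return load();
}

// Adopted policy: the error is replaced by a fallback, so the build completes
// and the page renders its degraded state.
async function loadDefensive<T>(load: Loader<T>, fallback: T): Promise<T> {
  try {
    return await load();
  } catch {
    return fallback;
  }
}
```

The strict version is shorter, which is the substance of the complexity argument; the defensive version is what keeps a deploy possible when the failing endpoint is unrelated to the pages that changed.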

Technical Context

Stack
Next.js getStaticProps, Promise.allSettled, AbortController, try/catch
Fetch Timeout
10 seconds
Endpoints Fetched
12
Build Failures From API
0 (in 20+ months)
Graceful Degradations
2 observed
Constraints
  • Each page fetches independently
  • Fallback state must be visually acceptable
  • Build logs must indicate which endpoints failed

Related Decisions