Service Mesh Evolution: Istio Ambient Mesh, Linkerd, and Sidecar-Less Architectures
Why Sidecars Were Always a Compromise
The sidecar pattern — injecting an Envoy proxy container into every pod — solved the "how do we intercept traffic" problem elegantly. But it created a new set of problems that compounded at scale:
Resource overhead. Every pod gets an Envoy container consuming 50-100MB of memory (more with large route tables) and a meaningful slice of CPU. At 500 pods, that's 25-50GB of memory just for proxies.
Lifecycle coupling. The sidecar and the application container share a pod. If Envoy crashes, the pod restarts. If the application needs to drain connections, Envoy needs to know. Race conditions during startup and shutdown are a constant source of bugs.
Injection complexity. A mutating webhook rewrites pod specs at admission time — every Deployment, every StatefulSet, every Job. Init containers set up iptables rules. Every Kubernetes upgrade brings fresh compatibility risk.
I spent a memorable weekend debugging why a batch Job was hanging after completion. The application container finished and exited 0, but the Envoy sidecar kept the pod alive because it didn't receive a SIGTERM through the sidecar lifecycle hook. The Job controller saw the pod as still running.
Istio Ambient Mesh Architecture
Ambient mesh splits the data plane into two layers:
Layer 4: ztunnel (Per-Node DaemonSet)
ztunnel is a purpose-built L4 proxy written in Rust. One instance per node, running as a DaemonSet with host networking. It handles:
- mTLS termination and origination between all pods on the node
- TCP-level traffic routing based on service VIPs
- L4 authorization policies (allow/deny based on source identity)
- Telemetry collection (TCP metrics, connection tracking)
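Those L4 authorization policies are ordinary AuthorizationPolicy resources; ztunnel enforces them against the authenticated SPIFFE identity without parsing a byte of HTTP. A sketch, assuming an illustrative `api` workload and `frontend-sa` service account in a `my-app` namespace:

```yaml
# Allow TCP connections to the "api" workload only from pods running as
# "frontend-sa" — enforceable entirely at L4 by ztunnel (names illustrative)
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: api-l4-allow
  namespace: my-app
spec:
  selector:
    matchLabels:
      app: api            # destination workload this policy protects
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/my-app/sa/frontend-sa"
```

Because the rule matches only on identity, ztunnel can evaluate it at connection time with no L7 processing.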
The traffic interception mechanism:
Pod A (10.0.1.5) wants to reach Service B (10.96.0.42:8080)
1. Pod A's outbound traffic hits iptables rules configured by
Istio CNI plugin (or eBPF on supported kernels)
2. Traffic is redirected to ztunnel on the same node
(via a TPROXY rule or eBPF redirect)
3. ztunnel resolves Service B to an endpoint pod (10.0.2.8)
4. ztunnel establishes an HBONE tunnel (HTTP/2 CONNECT)
to the ztunnel on the destination node
- TLS 1.3 with mTLS using SPIFFE identities
- Both sides verify workload certificates
5. Destination ztunnel decapsulates and forwards to Pod B
┌─── Node 1 ───────────────────┐ ┌─── Node 2 ───────────────────┐
│ │ │ │
│ ┌─────────┐ │ │ ┌─────────┐ │
│ │ Pod A │──────┐ │ │ ┌──────│ Pod B │ │
│ └─────────┘ │ │ │ │ └─────────┘ │
│ ▼ │ │ ▲ │
│ ┌─────────────────────────┐ │ │ ┌─────────────────────────┐ │
│ │ ztunnel (DaemonSet) │◄─────────► ztunnel (DaemonSet) │ │
│ │ L4: mTLS + TCP routing │ │ │ │ L4: mTLS + TCP routing │ │
│ └─────────────────────────┘ │ │ └─────────────────────────┘ │
│ │ │ │
└───────────────────────────────┘ └───────────────────────────────┘
HBONE tunnel (HTTP/2 + mTLS)
Layer 7: Waypoint Proxies (Per-Namespace/Workload)
For HTTP-level traffic management — header-based routing, retries, fault injection, L7 authorization policies — you deploy waypoint proxies:
# Deploy a waypoint proxy for a namespace
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: waypoint
namespace: my-app
labels:
istio.io/waypoint-for: service
spec:
gatewayClassName: istio-waypoint
listeners:
- name: mesh
port: 15008
protocol: HBONE
Waypoint proxies are full Envoy instances, but they're deployed only where needed — per namespace, per service, or per workload. The traffic flow with a waypoint:
Pod A → ztunnel (Node 1)
→ HBONE tunnel → waypoint proxy (L7 processing)
→ HBONE tunnel → ztunnel (Node 2) → Pod B
This is the elegant part: services that only need mTLS and L4 policies pay zero L7 overhead. You opt into L7 processing granularly.
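With a waypoint in place, L7 behavior is expressed through Gateway API routes attached to in-mesh Services. A hedged sketch — `echo-server` and `echo-server-v2` are assumed Services, not taken from this article's cluster — that sends canary-flagged requests to v2 and everything else to the original:

```yaml
# Header-based routing evaluated by the namespace's waypoint proxy
# (Service names are illustrative)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: echo-canary
  namespace: my-app
spec:
  parentRefs:
  - group: ""             # core group: attach the route to a Service
    kind: Service
    name: echo-server
  rules:
  - matches:
    - headers:
      - name: x-canary
        value: "true"
    backendRefs:
    - name: echo-server-v2
      port: 8080
  - backendRefs:          # default rule: everything else
    - name: echo-server
      port: 8080
```

Services without such routes never touch the waypoint's HTTP pipeline, which is exactly the opt-in granularity described above.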
Measured Latency Overhead
I benchmarked five configurations on the same EKS cluster (m6i.xlarge nodes, us-east-1):
Test Setup
# Fortio load generator, 100 concurrent connections, 60 second runs
# Direct pod-to-pod across nodes (cross-AZ for realistic latency)
fortio load -c 100 -t 60s -qps 0 http://echo-server:8080/
Results: Cross-Node HTTP/1.1 Latency (microseconds)
| Configuration | p50 | p75 | p90 | p99 | p99.9 |
|---------------|-----|-----|-----|-----|-------|
| No mesh (baseline) | 245 | 312 | 425 | 890 | 2,100 |
| Istio sidecar (Envoy) | 612 | 785 | 1,050 | 2,340 | 5,800 |
| Istio ambient (ztunnel only, L4) | 298 | 378 | 510 | 1,050 | 2,600 |
| Istio ambient (ztunnel + waypoint) | 580 | 745 | 990 | 2,180 | 5,200 |
| Linkerd (linkerd2-proxy) | 395 | 498 | 665 | 1,420 | 3,400 |
Key observations:
ztunnel L4-only adds ~53 microseconds at p50. This is the mTLS overhead plus iptables redirect plus HBONE encapsulation. Remarkably low for what it's doing — full mutual TLS with SPIFFE identity verification on every connection.
Ambient with waypoint is comparable to sidecar mode. When you need L7 processing, you're going through an Envoy anyway. The ambient architecture doesn't magically make L7 cheaper; it just lets you avoid it when you don't need it.
Linkerd sits in between. Its Rust-based proxy is faster than Envoy for simple proxying, but it's still a sidecar — every pod pays the cost. The total cluster-wide resource cost is lower than Istio sidecars but higher than ambient L4-only.
Memory Overhead (Per Pod / Per Node)
| Configuration | Memory Per Pod | Total for 500 Pods |
|---------------|----------------|--------------------|
| Istio sidecar | ~70MB | ~35GB |
| Istio ambient (ztunnel) | 0 (per-node: ~120MB) | ~600MB (5 nodes) |
| Istio ambient (+ waypoint) | 0 (+ ~200MB per waypoint) | ~1.6GB |
| Linkerd | ~25MB | ~12.5GB |
The resource story is where ambient mesh really wins. Going from 35GB of proxy memory across the cluster to 600MB is a massive reduction. That's real money on your cloud bill.
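The totals in the table are straight multiplication, worth sanity-checking:

```shell
# Cluster-wide proxy memory: per-unit cost times unit count
echo "sidecar: $(( 500 * 70 )) MB"    # 500 pods x ~70MB  = 35000 MB ≈ 35GB
echo "ambient: $(( 5 * 120 )) MB"     # 5 nodes x ~120MB  = 600 MB
echo "linkerd: $(( 500 * 25 )) MB"    # 500 pods x ~25MB  = 12500 MB ≈ 12.5GB
```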
mTLS Without Sidecars: Protocol-Level Details
The HBONE (HTTP-Based Overlay Network Encapsulation) tunnel is the transport mechanism between ztunnels. Here's what happens at the protocol level:
1. ztunnel on Node 1 opens a TCP connection to ztunnel on Node 2 (port 15008)
2. TLS 1.3 handshake with mutual authentication:
- Client cert: spiffe://cluster.local/ns/my-app/sa/pod-a-sa
- Server cert: spiffe://cluster.local/ns/my-app/sa/pod-b-sa
- Both certs issued by Istio's CA (istiod)
3. Over the mTLS connection, an HTTP/2 CONNECT request (HTTP/2 has no
   request line; shown here as its pseudo-headers):
   :method: CONNECT
   :authority: 10.0.2.8:8080
4. Server ztunnel validates authorization policies against the
authenticated source identity
5. If allowed, opens a local connection to the destination pod
and proxies bytes bidirectionally
6. Connection multiplexed over HTTP/2 — multiple pod-to-pod
connections share a single ztunnel-to-ztunnel TCP connection
The SPIFFE identity extraction is automatic — ztunnel gets the pod's service account from the Kubernetes API and maps it to a SPIFFE URI. No application changes needed. Your application still sees plain HTTP; it has no idea mTLS is happening at the infrastructure layer.
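The mapping can be reproduced by hand — trust domain, namespace, service account, nothing else. A sketch assuming the default cluster.local trust domain and illustrative names:

```shell
# Reconstruct the SPIFFE URI ztunnel derives for a pod
TRUST_DOMAIN=cluster.local   # istiod default; configurable per mesh
NS=my-app
SA=frontend-sa
echo "spiffe://${TRUST_DOMAIN}/ns/${NS}/sa/${SA}"
# prints spiffe://cluster.local/ns/my-app/sa/frontend-sa
```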
Linkerd: The Underappreciated Alternative
Linkerd merits consideration as a simpler alternative. It does less than Istio, and that is its strength. The data plane proxy (linkerd2-proxy) is written in Rust, purpose-built for the service mesh use case, and is remarkably efficient:
# Linkerd install — compare this to Istio's installation
curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
# Mesh a namespace
kubectl annotate namespace my-app linkerd.io/inject=enabled
# That's it. Restart your pods and they're meshed.
Linkerd's mTLS is on by default. No configuration. No Gateway resources. No waypoint proxies. Every meshed pod gets mTLS automatically, with certificates rotated every 24 hours via the identity controller.
# Linkerd authorization policy — simple and readable
apiVersion: policy.linkerd.io/v1beta3
kind: Server
metadata:
name: api-server
namespace: my-app
spec:
podSelector:
matchLabels:
app: api
port: 8080
proxyProtocol: HTTP/2
---
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
name: allow-frontend-only
namespace: my-app
spec:
targetRef:
group: policy.linkerd.io
kind: Server
name: api-server
requiredAuthenticationRefs:
- name: frontend-sa
kind: ServiceAccount
The trade-off: Linkerd doesn't have Istio's VirtualService/DestinationRule complexity. No traffic mirroring, no fault injection beyond what you build into your app, limited multi-cluster story. For 80% of teams, that's fine.
Migration Strategy: Sidecar to Ambient
The following is the migration playbook used for a 200-service production cluster transitioning from Istio sidecar to ambient:
# Phase 1: Install ambient components alongside existing sidecars
istioctl install --set profile=ambient
# Phase 2: Migrate one low-risk namespace
kubectl label namespace canary-app istio.io/dataplane-mode=ambient
# Remove sidecar injection for that namespace
kubectl label namespace canary-app istio-injection-
# Restart pods to remove sidecars
kubectl rollout restart deployment -n canary-app
# Phase 3: Verify traffic flows
# Check ztunnel logs for connection errors
kubectl logs -n istio-system -l app=ztunnel --tail=100
# Verify mTLS is working
istioctl x describe pod <pod-name> -n canary-app
The critical gotcha: ztunnel enforces authorization at L4 only. It identifies sources by SPIFFE identity (service account), so rules built on source.principals or source.namespaces work unchanged. But any rule that references L7 attributes — HTTP methods, paths, headers — cannot be evaluated by ztunnel, and rather than silently allowing the traffic, ztunnel fails closed and denies it. Before migrating a namespace, either deploy a waypoint to handle its L7 rules or rewrite them in identity terms.
# BEFORE (L7 rule — without a waypoint, ztunnel denies this traffic)
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
spec:
  rules:
  - to:
    - operation:
        methods: ["GET"]   # ← L7 match; breaks in ambient without a waypoint
# AFTER (ambient compatible — uses service account identity)
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
spec:
rules:
- from:
- source:
principals:
- "cluster.local/ns/my-app/sa/frontend-sa"
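Because ztunnel evaluates only L4 attributes, it pays to audit existing policies mechanically before flipping each namespace. A crude sketch — grep for L7 operation fields in a sample manifest; against a live cluster you would feed `kubectl get authorizationpolicies -A -o yaml` into the same grep:

```shell
# Flag AuthorizationPolicy manifests containing L7 operation fields,
# which ztunnel cannot enforce on its own (sample manifest for illustration)
cat > /tmp/policies.yaml <<'EOF'
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: l7-policy
spec:
  rules:
  - to:
    - operation:
        methods: ["GET"]
EOF
grep -nE 'methods:|paths:|hosts:' /tmp/policies.yaml
# any hit means the policy needs a waypoint (or an identity-based rewrite)
```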
When to Use What
After running all three in production at different companies:
Istio ambient mesh: Large clusters (200+ services), teams that need granular L7 policies, multi-cluster requirements, organizations already invested in Istio's ecosystem. The sidecar-less L4 layer finally makes the resource overhead acceptable.
Istio sidecar mode: Legacy workloads that need per-pod Envoy customization (custom Wasm filters, specific Envoy features). This is maintenance mode — new deployments should use ambient.
Linkerd: Small to medium clusters (under 200 services), teams that value operational simplicity, environments where every megabyte of memory matters (edge, IoT), organizations without dedicated platform teams.
No mesh at all: If your only requirement is mTLS between services, consider mTLS at the application level with a cert manager. A service mesh adds operational complexity that's only justified when you need traffic management, observability, or policy enforcement beyond what your application framework provides.
The mesh wars are mostly over. Istio won on features, Linkerd won on simplicity, and ambient mesh is the evolutionary step that makes sidecars obsolete for most workloads. Pick the one that matches your team's operational capacity, not the one with the most impressive feature matrix.