eBPF and XDP: High-Performance Packet Processing and DDoS Mitigation at Line Rate

Lucio Durán
Engineering Manager & AI Solutions Architect

Why XDP Is Different From Everything Else

The Linux networking stack processes packets through a long chain: NIC hardware → NIC driver → sk_buff allocation → netfilter/iptables → socket receive buffer → application. Each step copies data, allocates memory, and burns CPU.

XDP (eXpress Data Path) hooks into the NIC driver, before sk_buff allocation. When a packet arrives, the NIC driver calls your XDP program with a raw pointer to the packet data. Your program returns one of five actions:

XDP_PASS // Continue normal processing (send to kernel stack)
XDP_DROP // Drop the packet immediately (no allocation, no copy)
XDP_TX // Bounce the packet back out the same NIC
XDP_REDIRECT // Send to another NIC, CPU, or AF_XDP socket
XDP_ABORTED // Error occurred, drop and log

The critical path for DDoS mitigation is XDP_DROP. The packet never gets an sk_buff, never touches iptables, never reaches the network stack. The NIC driver just moves on to the next packet. On a modern NIC, this happens at line rate.

Writing Your First XDP Program

Here's a minimal XDP program that drops all UDP traffic with source port 53 — the signature of DNS amplification attacks:

// xdp_filter.c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_dns_filter(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    // Parse Ethernet header
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    if (eth->h_proto != __constant_htons(ETH_P_IP))
        return XDP_PASS;

    // Parse IP header
    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    if (ip->protocol != IPPROTO_UDP)
        return XDP_PASS;

    // Reject malformed IHL values before using them as an offset
    if (ip->ihl < 5)
        return XDP_PASS;

    // Parse UDP header (ihl * 4 skips any IP options)
    struct udphdr *udp = (void *)ip + (ip->ihl * 4);
    if ((void *)(udp + 1) > data_end)
        return XDP_PASS;

    // Drop DNS responses (source port 53) — typical amplification attack
    if (udp->source == __constant_htons(53))
        return XDP_DROP;

    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

Every bounds check is mandatory. The BPF verifier will reject your program if you access ip->protocol without first proving that (void *)(ip + 1) <= data_end. This is annoying the first time and reassuring every time after — the verifier is the reason BPF programs can't crash the kernel.

Compile and load:

# Compile to BPF bytecode
clang -O2 -target bpf -c xdp_filter.c -o xdp_filter.o

# Attach to interface (driver mode for performance)
ip link set dev eth0 xdpdrv obj xdp_filter.o sec xdp

# Verify it's loaded
ip link show eth0
# ... xdp/id:42 ...

# Detach when done
ip link set dev eth0 xdp off

Real DDoS Mitigation: Rate Limiting With BPF Maps

The DNS filter above is too crude. Real DDoS mitigation needs rate limiting — allow some traffic from each source, but drop when a source exceeds a threshold. BPF maps provide shared state between the kernel-side XDP program and userspace:

// Rate limiter with per-source-IP counters
struct rate_limit_entry {
    __u64 packet_count;
    __u64 last_reset_ns;
};

struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 1000000);           // Track up to 1M source IPs
    __type(key, __u32);                     // Source IP (IPv4)
    __type(value, struct rate_limit_entry);
} rate_limits SEC(".maps");

// Configuration map — userspace can update thresholds without reloading
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);                   // Max packets per second per IP
} config SEC(".maps");

SEC("xdp")
int xdp_rate_limiter(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    if (eth->h_proto != __constant_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    __u32 src_ip = ip->saddr;

    // Get rate limit threshold from config
    __u32 config_key = 0;
    __u64 *max_pps = bpf_map_lookup_elem(&config, &config_key);
    __u64 threshold = max_pps ? *max_pps : 10000; // Default: 10k pps

    // Get or create rate limit entry for this source IP
    struct rate_limit_entry *entry = bpf_map_lookup_elem(&rate_limits, &src_ip);

    __u64 now = bpf_ktime_get_ns();

    if (entry) {
        // Reset counter every second
        if (now - entry->last_reset_ns > 1000000000ULL) {
            entry->packet_count = 1;
            entry->last_reset_ns = now;
            return XDP_PASS;
        }

        entry->packet_count++;

        if (entry->packet_count > threshold)
            return XDP_DROP;
    } else {
        // First packet from this IP — create entry
        struct rate_limit_entry new_entry = {
            .packet_count = 1,
            .last_reset_ns = now,
        };
        bpf_map_update_elem(&rate_limits, &src_ip, &new_entry, BPF_ANY);
    }

    return XDP_PASS;
}

The BPF_MAP_TYPE_LRU_HASH is key here. During a DDoS, you'll see millions of unique source IPs (spoofed). A regular hash map would fill up and reject new entries, letting new attack IPs through untracked. The LRU map evicts the oldest entries, which are typically attack IPs that have already been rate-limited.

AF_XDP: When You Need Userspace Processing

Sometimes you need to do more than just drop or pass. AF_XDP (Address Family XDP) lets you redirect packets to a userspace application through a zero-copy ring buffer, bypassing the entire kernel network stack:

// XDP program that redirects specific traffic to userspace via AF_XDP
struct {
    __uint(type, BPF_MAP_TYPE_XSKMAP);
    __uint(max_entries, 64); // One per RX queue
    __type(key, __u32);
    __type(value, __u32);
} xsks_map SEC(".maps");

SEC("xdp")
int xdp_af_xdp_redirect(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    if (eth->h_proto != __constant_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    // Redirect TCP traffic on port 8080 to userspace
    if (ip->protocol == IPPROTO_TCP) {
        struct tcphdr *tcp = (void *)ip + (ip->ihl * 4);
        if ((void *)(tcp + 1) > data_end)
            return XDP_PASS;

        if (tcp->dest == __constant_htons(8080))
            return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
    }

    return XDP_PASS;
}

The userspace side uses libxdp or raw AF_XDP sockets:

// Simplified AF_XDP userspace receiver (libxdp)
#include <stdint.h>
#include <xdp/xsk.h>

#define BATCH_SIZE 64

void handle_packet(uint8_t *packet, uint32_t len);

// libxdp's struct xsk_socket is opaque, so the application keeps the
// rings it got back from xsk_socket__create() in its own wrapper
struct xsk_socket_info {
    struct xsk_ring_cons rx;
    struct xsk_ring_prod fq; // UMEM fill queue
    void *umem_area;
    struct xsk_socket *xsk;
};

void process_packets(struct xsk_socket_info *xsk) {
    uint32_t idx_rx = 0, idx_fq = 0;

    unsigned int received = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
    if (!received)
        return;

    // Reserve fill-queue slots so the frames can go back to the NIC
    while (xsk_ring_prod__reserve(&xsk->fq, received, &idx_fq) != received)
        ; // busy-wait; a real receiver would poll() here instead

    for (unsigned int i = 0; i < received; i++) {
        const struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx + i);
        uint8_t *packet = xsk_umem__get_data(xsk->umem_area, desc->addr);

        // Process the packet in userspace — full packet data, zero copy
        handle_packet(packet, desc->len);

        // Return this frame's address to the fill queue for reuse
        *xsk_ring_prod__fill_addr(&xsk->fq, idx_fq + i) = desc->addr;
    }

    xsk_ring_cons__release(&xsk->rx, received);
    xsk_ring_prod__submit(&xsk->fq, received);
}

AF_XDP gives you 8-12 million packets per second per core in userspace. That's about 10x faster than recvmsg() on a raw socket, because there's no syscall overhead, no sk_buff allocation, and no data copying — the packet data stays in the same memory region shared between the NIC and your application.

Inside Cilium's Dataplane

Cilium is arguably the most sophisticated production use of BPF for networking. It can replace kube-proxy entirely and implements Kubernetes networking (services, network policies, load balancing) in BPF.

The architecture (simplified):

Pod A → veth → tc/BPF (egress) → tc/BPF (ingress) → veth → Pod B
                     ↓                   ↓
             [Service LB map]   [Network Policy map]
             [CT table map]     [Identity map]
             [NAT map]          [Endpoint map]

Cilium's key BPF maps that power the whole thing:

// Connection tracking — every established connection
// Key: 5-tuple (src_ip, dst_ip, src_port, dst_port, proto)
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1000000);
    __type(key, struct ct_key);
    __type(value, struct ct_entry);
} CT_MAP SEC(".maps");

// Service map — Kubernetes service ClusterIP → backend pods
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, struct service_key);     // service IP + port
    __type(value, struct service_value); // backend pod IPs
} SERVICE_MAP SEC(".maps");

// Network policy — allow/deny rules per endpoint identity
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, struct policy_key);      // identity + port + proto
    __type(value, struct policy_entry);
} POLICY_MAP SEC(".maps");

When a packet leaves Pod A destined for a Kubernetes service, Cilium's BPF program:

  1. Looks up the destination in SERVICE_MAP to find a backend pod
  2. Performs DNAT (rewrite destination IP to the backend pod's IP)
  3. Creates a connection tracking entry in CT_MAP
  4. Checks POLICY_MAP to verify the network policy allows this connection
  5. Forwards the packet directly to the backend pod's veth interface

All of this happens in the kernel, with no context switches to userspace. Cilium benchmarks show 2-3x better latency and throughput compared to iptables-based kube-proxy, especially under high connection rates.

Debugging BPF Programs

Debugging is the weak link. printf doesn't exist in BPF. You have bpf_trace_printk(), which writes to the kernel trace pipe:

// In your BPF program (bpf_printk() wraps bpf_trace_printk() and
// supplies the format-string length the raw helper requires)
bpf_printk("src_ip=%x pkt_count=%llu", src_ip, entry->packet_count);

# In a terminal
cat /sys/kernel/debug/tracing/trace_pipe

For production, use BPF ring buffers to send structured events to userspace:

struct event {
    __u32 src_ip;
    __u32 dst_ip;
    __u16 src_port;
    __u16 dst_port;
    __u32 action; // XDP_PASS, XDP_DROP, etc.
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} events SEC(".maps");

// In the XDP program, after making a decision:
struct event *evt = bpf_ringbuf_reserve(&events, sizeof(struct event), 0);
if (evt) {
    evt->src_ip = ip->saddr;
    evt->dst_ip = ip->daddr;
    evt->action = XDP_DROP;
    bpf_ringbuf_submit(evt, 0);
}

Production Numbers

After deploying the XDP rate limiter on our edge servers (DigitalOcean Premium CPU droplets with Mellanox NICs):

| Metric | Before (iptables) | After (XDP) |
|--------|-------------------|-------------|
| Max DDoS absorption | ~5 Gbps | 40+ Gbps |
| CPU during 10 Gbps flood | 95% | 8% |
| Legitimate traffic latency during attack | 200-500 ms | <5 ms |
| Rule update time | ~50 ms (iptables flush) | <1 ms (map update) |
| Memory per tracked IP | ~512 bytes (conntrack) | 16 bytes (BPF map) |

The memory difference matters. During a volumetric DDoS with millions of spoofed source IPs, Linux conntrack would consume gigabytes of RAM and eventually OOM-kill processes. The BPF LRU map uses a fixed 16 MB regardless of attack size.

When Not to Use XDP

XDP is a surgical tool, not a general solution:

  • Application-layer filtering (WAF rules, HTTP inspection): Use a reverse proxy. XDP operates on raw packets before reassembly.
  • Stateful deep packet inspection: The BPF instruction limit and lack of dynamic memory allocation make complex stateful analysis impractical.
  • Simple firewalling: If iptables handles your load, use iptables. XDP adds complexity.
  • Non-Linux environments: XDP is Linux-only, kernel 4.8+.

XDP shines for: DDoS mitigation, load balancing, packet steering, telemetry/sampling, and anything where you need to make per-packet decisions at millions of packets per second.

The learning curve is steep — you need to understand C, the Linux networking stack, BPF verifier constraints, and NIC driver architecture. But once you've written your first XDP program that drops 40 Gbps of attack traffic using 3% of one CPU core, you'll never look at iptables the same way again.
