eBPF limitations in production
It is a Tuesday at 11:14 IST and Aditi, a platform engineer at a fintech in Bengaluru, is on a call with a vendor. The vendor's eBPF-based runtime-security agent has been working flawlessly on staging for six weeks. This morning she rolled it to the first production node group — sixteen Bottlerocket nodes carrying a slice of the UPI ledger service — and three of the sixteen nodes are now running with the agent in degraded mode. The agent's logs say verifier rejected program: instruction count exceeds 1000000 on one node, kernel version 5.15.140 lacks BPF_FUNC_for_each_map_elem on another, and failed to attach kprobe: vfs_read symbol not exported in this kernel build on the third. The vendor's response is a polite "we'll look into it; what kernel version are your staging nodes running?" Aditi already knows the answer: staging is on kernel 6.1; production is on 5.15. The same agent. The same Helm chart. Different kernels. Different outcomes.
eBPF in production is not the eBPF in the demo. The verifier rejects programs that worked yesterday, kernel-version skew across your fleet means the same program loads on some nodes and not others, hooks you used last year disappear when the kernel maintainers refactor a function, and the operational load of running eBPF tooling at fleet scale rivals the load of running the kernel itself. None of these break the technology; they just mean eBPF is a kernel feature, and treating it like a userspace library is the most common way teams burn six months learning the lesson.
The verifier is a strict editor, and your demo passed only because it was small
The eBPF verifier is a static-analysis pass that runs every time you load a program. It walks every possible execution path, tracks the type of every register at every instruction, refuses any unbounded loop, refuses any pointer arithmetic it cannot prove safe, and refuses programs whose verification walk exceeds roughly one million instructions (unprivileged loaders face a far lower cap of 4,096 instructions). The point is to guarantee that your program cannot crash the kernel — and the cost is that the verifier rejects a large class of programs that humans would call "obviously correct".
The simplest example: a for loop that walks a fixed-size array. In a userspace C program this is trivial. In an eBPF program it depends on whether the loop bound is a compile-time constant, whether the verifier can fold the constant, whether the loop body modifies a pointer, and whether the kernel version is recent enough to have bpf_loop() (introduced in 5.17) — without bpf_loop(), the verifier unrolls every iteration and the instruction count explodes. A 64-iteration loop with 30 instructions per body becomes 1,920 verified instructions; a 1,024-iteration loop becomes 30,720. Programs that worked at 64 iterations do not load at 1,024.
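To make the unroll cost concrete, here is a minimal sketch of the same walk written both ways — the pre-5.17 unrolled form, whose verified instruction count scales with the iteration count, and the bpf_loop() form, whose body the verifier checks once. Names are illustrative; clang plus libbpf with global-data support is assumed.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#define SLOTS 64
static __u64 table[SLOTS];   /* global data becomes a .bss map under libbpf */
/* Pre-5.17 pattern: the verifier unrolls all 64 iterations, so the
 * verified instruction count is roughly 64 x the body size. */
SEC("kprobe/do_sys_openat2")
int walk_unrolled(void *ctx)
{
    __u64 sum = 0;
#pragma clang loop unroll(full)
    for (int i = 0; i < SLOTS; i++)
        sum += table[i];
    return 0;
}
/* 5.17+ pattern: bpf_loop() drives the callback inside the kernel, so the
 * verifier checks the callback body once regardless of iteration count. */
static long add_slot(__u64 index, void *ctx)
{
    __u64 *sum = ctx;
    if (index < SLOTS)
        *sum += table[index];
    return 0;   /* 0 = continue, 1 = break */
}
SEC("kprobe/do_sys_openat2")
int walk_bpf_loop(void *ctx)
{
    __u64 sum = 0;
    bpf_loop(SLOTS, add_slot, &sum, 0);
    return 0;
}
char _license[] SEC("license") = "GPL";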
Why the verifier rejects more than you expect: it tracks register types and value ranges per branch, and a program with three if statements has up to eight branches the verifier walks independently. A pointer that is bounds-checked on one branch but not the other will be rejected even though the unchecked branch is "obviously" unreachable — the verifier does not run your program, it runs every possible program your bytecode can express. The fix is to add explicit if (ptr + 14 > skb_end) return TC_ACT_OK; bounds-checks on every branch, even the ones a human would prove dead. Your code looks paranoid; the verifier is satisfied; the program loads.
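The canonical shape of that paranoia, as a sketch of a TC classifier (names illustrative):
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
SEC("tc")
int parse(struct __sk_buff *skb)
{
    void *data     = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;
    struct ethhdr *eth = data;
    /* Check 1: the verifier refuses to read eth->h_proto without this. */
    if ((void *)(eth + 1) > data_end)
        return TC_ACT_OK;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return TC_ACT_OK;
    struct iphdr *ip = (void *)(eth + 1);
    /* Check 2: required again, even though a human can see that any packet
     * reaching this branch "must" be long enough to carry an IP header. */
    if ((void *)(ip + 1) > data_end)
        return TC_ACT_OK;
    if (ip->protocol != IPPROTO_TCP)
        return TC_ACT_OK;
    /* ...TCP-specific handling would go here... */
    return TC_ACT_OK;
}
char _license[] SEC("license") = "GPL";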
The practical consequence: every eBPF program is a coupling between your code, the clang version that compiled it, the LLVM BPF backend's optimisation passes, and the verifier in the kernel that will receive it. A program that loaded fine when compiled with clang-16 may fail to load when compiled with clang-17 because clang-17 emits slightly different bytecode that the verifier's range tracker cannot follow. This has happened in real upstream projects — Cilium has shipped clang-version-specific workarounds since 2022 — and the diagnostic, when it bites your team, is "the agent worked on the dev machine; it does not work in CI; both are running the same Helm chart". The fix is to pin the clang version used to build the BPF object, and to test the resulting object against every kernel version your fleet runs.
Kernel-version skew across the fleet — the same code, different verdicts
A typical Indian fintech in 2026 runs Kubernetes on a mix of node images: Bottlerocket on EKS for stateless workloads (kernel 5.15.x or 6.1.x depending on AMI age), Amazon Linux 2023 for workloads that need a slightly older toolchain (kernel 6.1.x), and a few self-managed RHEL 8 nodes with custom kernels (4.18.x with backported BPF features). All three are "supported" by the cloud provider. None of them run the same kernel.
For an eBPF tool, this matters in three concrete ways.
Helper availability. Every BPF helper function (bpf_get_current_pid_tgid, bpf_perf_event_output, bpf_for_each_map_elem, bpf_ringbuf_output, etc.) was added to the kernel in a specific version. bpf_ringbuf_output arrived in 5.8; bpf_for_each_map_elem in 5.13; bpf_loop in 5.17. A program that calls bpf_loop will fail to load on a 5.15 kernel with the error unknown func bpf_loop#181. The CO-RE (Compile Once, Run Everywhere) pattern handles type relocations across kernel versions but does not invent helpers that the running kernel does not have — you have to either gate the call behind a feature probe or restrict the program to kernels new enough.
Hook stability. A kprobe attaches to a kernel function by name (e.g. do_sys_openat2). Kernel maintainers refactor functions, rename them, mark them static, or fold them into other functions. Between 5.10 and 6.1, roughly 4% of the 50,000 exported kernel symbols changed in a way that breaks naive kprobe attachments. Tracepoints are more stable (they are part of the kernel ABI promise) but cover fewer events. fentry/fexit attachments use BTF (BPF Type Format) and survive most refactors but only work on 5.5+. A production eBPF agent needs a fallback ladder — try fentry first, fall back to kprobe, fall back to a tracepoint, and on every kernel where none of those work, log a warning and run in degraded mode; a sketch follows the next point.
LSM and security-module gating. On RHEL/Rocky kernels, the bpf_lsm security hooks are sometimes compiled out, so BPF_LSM_MAC programs fail to load with program of this type cannot be loaded into the kernel. On hardened distributions (Bottlerocket with FIPS mode, some banking-grade Linux variants), the entire bpf() syscall can be restricted to capability-bearing processes, and your DaemonSet needs CAP_BPF and CAP_PERFMON (5.8+) or CAP_SYS_ADMIN (older kernels). A misconfigured PodSecurityPolicy or a Pod Security Standards restricted profile will reject your DaemonSet at admission time.
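The fallback ladder from the hook-stability point, sketched in libbpf terms. The skeleton and program names are illustrative, and a real agent would also gate bpf_program__set_autoload() per variant before load so that an unloadable fentry variant does not block the whole object.
#include <stdio.h>
#include <bpf/libbpf.h>
#include "agent.skel.h"   /* hypothetical skeleton carrying three variants */
/* Try the most robust attachment first, degrade step by step.
 * Returns NULL when the node supports none of them. */
static struct bpf_link *attach_open_hook(struct agent_bpf *skel)
{
    struct bpf_link *link;
    /* fentry: BTF-based, survives refactors, needs 5.5+ with BTF */
    link = bpf_program__attach(skel->progs.fentry_do_sys_openat2);
    if (link)
        return link;
    /* kprobe: works on older kernels but is coupled to the symbol name */
    link = bpf_program__attach(skel->progs.kprobe_do_sys_openat2);
    if (link)
        return link;
    /* tracepoint: part of the kernel ABI promise, coarsest signal */
    link = bpf_program__attach(skel->progs.tp_sys_enter_openat);
    if (link)
        return link;
    fprintf(stderr, "no usable hook on this kernel; running degraded\n");
    return NULL;
}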
As the fleet audit below shows, bpf_loop and bpf_kfunc will not load on the RHEL 8 or the 5.15 Bottlerocket nodes; only the AL2023 nodes support them. The fix is either feature-probe-and-fall-back at runtime, or fleet-image consolidation.
# ebpf_fleet_capability_audit.py — survey a Kubernetes fleet and report
# which kernel features each node supports. Required to know which eBPF
# programs can be deployed where.
# pip install kubernetes pandas
import subprocess
from kubernetes import client, config
import pandas as pd
# Match against the helpers/features your eBPF agent uses
REQUIRED_FEATURES = {
    "bpf_ringbuf_output": (5, 8),
    "bpf_for_each_map_elem": (5, 13),
    "bpf_loop": (5, 17),
    "fentry/fexit": (5, 5),          # needs BTF in vmlinux
    "BPF_LSM_MAC": (5, 7),           # also needs CONFIG_BPF_LSM=y
    "tcp_ca (cong-control)": (5, 6),
    "bpf_kfunc": (5, 18),
}
def kernel_to_tuple(s: str) -> tuple:
    # NB: pure version comparison — distro backports (e.g. RHEL) can support
    # features this check denies, and vice versa. Treat as a first-pass survey.
    parts = s.split(".")[:2]
    return (int(parts[0]), int(parts[1]))
def has_btf(node_name: str) -> bool:
    # rough probe: is /sys/kernel/btf/vmlinux present on the node?
    # in practice the agent DaemonSet would test from inside a privileged pod;
    # here we use a hostPath debugger sidecar pattern via kubectl debug.
    cmd = ["kubectl", "debug", f"node/{node_name}", "--image=busybox", "--",
           "test", "-f", "/host/sys/kernel/btf/vmlinux"]
    return subprocess.run(cmd, capture_output=True).returncode == 0
config.load_kube_config()
v1 = client.CoreV1Api()
nodes = v1.list_node().items
rows = []
for n in nodes:
    name = n.metadata.name
    kernel = n.status.node_info.kernel_version  # e.g. "5.15.0-1075-aws"
    os_image = n.status.node_info.os_image
    kver = kernel_to_tuple(kernel)
    feats = {feat: kver >= req for feat, req in REQUIRED_FEATURES.items()}
    feats["btf"] = has_btf(name)  # requires kubectl debug capability
    rows.append({"node": name, "kernel": kernel, "os": os_image, **feats})
df = pd.DataFrame(rows)
print(df.to_string(index=False))
# Summarise: which feature is the bottleneck?
print("\nfeature support across fleet:")
for feat in REQUIRED_FEATURES:
    supported = df[feat].sum()
    print(f"  {feat}: {supported}/{len(df)} nodes")
unsupported = df[df[list(REQUIRED_FEATURES.keys())].sum(axis=1) < len(REQUIRED_FEATURES)]
print(f"\nnodes that cannot run the full agent: {len(unsupported)}")
print(unsupported[["node", "kernel", "os"]].to_string(index=False))
Sample run on a 24-node fintech cluster (a mix of Bottlerocket, AL2023, and RHEL 8):
node kernel os bpf_ringbuf_output bpf_for_each_map_elem bpf_loop fentry/fexit BPF_LSM_MAC tcp_ca (cong-control) bpf_kfunc btf
ip-10-42-1-12.ap-south-1.compute 5.15.0-1075-aws Bottlerocket OS 1.13.4 (aws-k8s-1.27) True True False True True True False True
ip-10-42-1-43.ap-south-1.compute 5.15.0-1075-aws Bottlerocket OS 1.13.4 (aws-k8s-1.27) True True False True True True False True
ip-10-42-2-77.ap-south-1.compute 6.1.66-91 Amazon Linux 2023.3.20240131 True True True True True True True True
ip-10-42-2-91.ap-south-1.compute 6.1.66-91 Amazon Linux 2023.3.20240131 True True True True True True True True
ip-10-42-3-15.ap-south-1.compute 4.18.0-553.el8 Red Hat Enterprise Linux 8.10 False False False False False False False False
... (19 more rows)
feature support across fleet:
bpf_ringbuf_output: 22/24 nodes
bpf_for_each_map_elem: 22/24 nodes
bpf_loop: 8/24 nodes
fentry/fexit: 22/24 nodes
BPF_LSM_MAC: 22/24 nodes
tcp_ca (cong-control): 22/24 nodes
bpf_kfunc: 8/24 nodes
btf: 22/24 nodes
nodes that cannot run the full agent: 16
node kernel os
ip-10-42-1-12.ap-south-1.compute 5.15.0-1075-aws Bottlerocket OS 1.13.4 (aws-k8s-1.27)
ip-10-42-1-43.ap-south-1.compute 5.15.0-1075-aws Bottlerocket OS 1.13.4 (aws-k8s-1.27)
ip-10-42-3-15.ap-south-1.compute 4.18.0-553.el8 Red Hat Enterprise Linux 8.10
... (13 more rows)
Walk through. The bpf_loop row is the headline: only 8 of 24 nodes have a kernel new enough (5.17+) to support it, so any feature in the agent that uses bpf_loop works on only a third of the fleet. The two RHEL 8 nodes lack BTF entirely, which cascades — without BTF, fentry/fexit cannot attach, CO-RE relocations have nothing to anchor against, and bpf_lsm is unavailable. The bpf_kfunc column is nearly empty because that capability requires 5.18+ and most clusters live on 5.15 LTS. The script's last line is the operationally important one: "16 of 24 nodes cannot run the full agent" — meaning the platform team needs a feature-flag table per node and a deployment that gates programs on capability probes, not a single eBPF object that "just works" everywhere.
Why feature probing has to happen at runtime, not build time: the same Helm chart is deployed across staging (kernel 6.1), production-aws (kernel 5.15), and production-onprem (kernel 4.18). Build-time selection assumes you know the target kernel; runtime selection assumes you do not. A robust agent ships every program variant it might need, probes the kernel on startup with bpftool feature probe, and loads the variant the kernel can run. The cost is roughly 2x the binary size and a slightly slower startup; the benefit is that "deploys to a new node image" is no longer a code change.
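libbpf exposes the same probes programmatically, so the agent can ask the questions itself at startup rather than shelling out to bpftool. A minimal sketch of what the startup probe might look like:
#include <stdio.h>
#include <linux/bpf.h>
#include <bpf/libbpf.h>
int main(void)
{
    /* each probe returns 1 = supported, 0 = not supported, <0 = error */
    int loop    = libbpf_probe_bpf_helper(BPF_PROG_TYPE_KPROBE, BPF_FUNC_loop, NULL);
    int ringbuf = libbpf_probe_bpf_map_type(BPF_MAP_TYPE_RINGBUF, NULL);
    int tracing = libbpf_probe_bpf_prog_type(BPF_PROG_TYPE_TRACING, NULL);
    printf("bpf_loop helper:   %s\n", loop    == 1 ? "yes" : "no");
    printf("ringbuf map:       %s\n", ringbuf == 1 ? "yes" : "no");
    printf("tracing programs:  %s\n", tracing == 1 ? "yes" : "no");
    /* the agent would use these answers to pick which object variant to load */
    return 0;
}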
When eBPF crashes the system — degraded modes that still hurt
eBPF cannot crash the kernel — that is the verifier's primary contract. But it can degrade the system in ways that look like crashes from the user's perspective.
Tracepoint backpressure. A tracepoint:syscalls:sys_enter_read program runs on every read() syscall. On a node serving 200,000 syscalls/second, the program is invoked 200,000 times/second. If each invocation takes 200ns of CPU, the program is consuming 4% of one core per syscall hook — for a 16-core node, that is 0.25% of node CPU dedicated to running the eBPF program. Multiply across hooks (sys_enter_read, sys_enter_write, sys_enter_openat, etc.) and a poorly-budgeted observability agent can consume 5–15% of node CPU. The application slows down; the dashboard you are observing it through shows higher latency; the conclusion you reach is "the application got slower this week" when the actual cause is "we deployed an eBPF agent that taxes every syscall".
Perf buffer / ringbuf drops. When the eBPF program emits more events than userspace can consume, the perf buffer or ringbuf fills and subsequent emits fail — bpf_ringbuf_reserve() returns NULL, bpf_perf_event_output() returns a negative error. The kernel does not block; it drops, and a well-built agent increments an events_dropped_total counter. Your dashboard now has a gap, and the gap looks like "no traffic" rather than "observation lost". On a burst — say a fork()-bomb during a misbehaving CI job — the drop rate can hit 40-60% of events for several seconds, and any metric that depends on counting (process counts, exec counts, file-open counts) reads low.
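The defensive pattern is to make the loss observable: count every failed reserve into a global that userspace scrapes as events_dropped_total. A minimal ringbuf sketch (names illustrative):
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
struct event { __u32 pid; __u64 ts; };
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 20);   /* 1 MiB ring */
} events SEC(".maps");
__u64 dropped = 0;   /* lives in .bss; exported by the agent as a metric */
SEC("kprobe/do_sys_openat2")
int emit(void *ctx)
{
    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e) {
        /* ring is full: count the loss instead of losing it silently */
        __sync_fetch_and_add(&dropped, 1);
        return 0;
    }
    e->pid = bpf_get_current_pid_tgid() >> 32;
    e->ts  = bpf_ktime_get_ns();
    bpf_ringbuf_submit(e, 0);
    return 0;
}
char _license[] SEC("license") = "GPL";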
Map churn and memory pressure. BPF maps live in kernel memory and consume both kmalloc-ed slab and per-CPU pages. A BPF_MAP_TYPE_LRU_HASH with 1M entries and 64-byte values is 64MB of kernel slab — multiplied by the per-CPU allocation strategy, the actual footprint can be 2-4× higher. On a node that runs Cilium (~250MB of BPF maps), Pixie (~400MB), and a runtime-security agent (~150MB), 800MB of node RAM is permanent kernel BPF state. On a 4GB worker node, that is 20% of RAM gone before any pod schedules. The OOM killer never fires (this is kernel, not user, memory) but pods get evicted because the kubelet's scheduler does not see the BPF maps as "used" but the available pool is smaller than it computes.
eBPF program memory leaks. A program that inserts map entries but never deletes them (because the eviction logic is bugged, or the LRU was sized far beyond the hot set) grows until the map saturates. The map type does not fully protect you — a full fixed-size hash map starts rejecting inserts, an LRU silently evicts entries you may still need, and a chain of programs writing into a per-event ring of maps can leak. Diagnosing this means bpftool map show and reading per-map memory consumption, which no standard metrics pipeline exports by default.
# ebpf_resource_audit.py — read kernel BPF state for every loaded
# program and map, identify the heaviest consumers of node resources.
# Run as DaemonSet with hostPID and CAP_BPF.
# pip install pandas
import json
import subprocess
import pandas as pd
def bpftool_json(args: list) -> list:
    # options such as -j go before the object on the bpftool command line
    r = subprocess.run(["bpftool", "-j"] + args, capture_output=True, text=True)
    return json.loads(r.stdout) if r.returncode == 0 else []
# All loaded programs
progs = bpftool_json(["prog", "show"])
prog_rows = []
for p in progs:
    prog_rows.append({
        "id": p["id"],
        "name": p.get("name", ""),
        "type": p.get("type", ""),
        "tag": p.get("tag", ""),
        "loaded_at": p.get("loaded_at", 0),
        "uid": p.get("uid", 0),
        # bpftool reports translated bytecode size in bytes; one BPF insn = 8 bytes
        "instructions": p.get("bytes_xlated", 0) // 8,
        "memlock_kb": p.get("bytes_memlock", 0) // 1024,
    })
prog_df = pd.DataFrame(prog_rows).sort_values("memlock_kb", ascending=False)
# All loaded maps
maps = bpftool_json(["map", "show"])
map_rows = []
for m in maps:
    bytes_per_entry = m.get("bytes_value", 0) + m.get("bytes_key", 0)
    max_entries = m.get("max_entries", 0)
    map_rows.append({
        "id": m["id"],
        "name": m.get("name", ""),
        "type": m.get("type", ""),
        "max_entries": max_entries,
        "key_b": m.get("bytes_key", 0),
        "value_b": m.get("bytes_value", 0),
        "memlock_kb": m.get("bytes_memlock", 0) // 1024,
        "worst_case_mb": (bytes_per_entry * max_entries) // (1024 * 1024),
    })
map_df = pd.DataFrame(map_rows).sort_values("memlock_kb", ascending=False)
print("== top 10 eBPF programs by locked memory ==")
print(prog_df.head(10).to_string(index=False))
print(f"\ntotal program memlock: {prog_df['memlock_kb'].sum() // 1024} MB")
print("\n== top 10 eBPF maps by locked memory ==")
print(map_df.head(10).to_string(index=False))
print(f"\ntotal map memlock: {map_df['memlock_kb'].sum() // 1024} MB")
print(f"worst-case if all maps fill: {map_df['worst_case_mb'].sum()} MB")
Sample run on a 16-core Bottlerocket node running Cilium, Pixie, and a Falco-replacement runtime-security agent:
== top 10 eBPF programs by locked memory ==
id name type tag instructions memlock_kb
1842 cil_to_container sched_cls ... 14821 132
1843 cil_from_container sched_cls ... 12480 128
1844 cil_lxc_policy_in sched_cls ... 9842 100
2104 pixie_proc_exec_kp kprobe ... 4218 80
2107 pixie_tcp_data_uprobe uprobe ... 7820 80
2210 falcoreplace_execve_kp kprobe ... 5431 76
...
total program memlock: 14 MB
== top 10 eBPF maps by locked memory ==
id name type max_entries key_b value_b memlock_kb worst_case_mb
443 cilium_ct_global LRU_HASH 524288 16 112 131072 64
211 pixie_socket_data PERCPU_ARRAY 4096 4 4096 262144 512
214 pixie_proc_events RINGBUF 0 0 0 16384 16
445 cilium_lb_services HASH 65536 16 96 14336 7
447 cilium_policy HASH 16384 12 24 2304 0
...
total map memlock: 482 MB
worst-case if all maps fill: 612 MB
Walk through. The pixie_socket_data map is the heaviest single consumer at 256MB of locked kernel memory — a per-CPU array of 4KB buffers, 4,096 entries per CPU, on a 16-CPU node that is 16 × 4,096 × 4KB = 256MB. That memory is not visible to the kubelet's eviction logic; it is "kernel slab", not "pod working set". The total map memlock of 482MB is permanent overhead per node — every node carries it whether or not pods are running. The worst-case if all maps fill at 612MB is the answer to "if Cilium's connection tracker hits saturation simultaneously with Pixie under heavy load, how much node RAM does eBPF take?". For a 4GB node, that is 15% of RAM you cannot reclaim. The capacity-planning consequence: when sizing nodes for an eBPF-heavy fleet, subtract 600-800MB from the schedulable allocation before computing pod density.
Why per-CPU maps surprise capacity planning: a BPF_MAP_TYPE_PERCPU_HASH with max_entries=4096 and value_size=4096 does not reserve 16MB — it reserves 16MB times the number of CPUs. On a 16-vCPU node that is 256MB; on a 96-vCPU node it is 1.5GB. The semantics are correct (per-CPU isolation eliminates cache-line bouncing on hot writes) but the resource cost scales with vCPU count in a way that vendor sizing guides almost never document. The fix when you discover this in production is either to switch to a non-per-CPU map (slower writes, much smaller footprint) or to right-size max_entries based on actual hot-set rather than worst-case capacity. Both require knowing your workload; neither is the default.
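For concreteness, here is a map definition matching the pixie_socket_data shape above — a sketch, not the vendor's actual source — with the real footprint worked out in the comment:
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
/* Declared size looks like 4 KB x 4096 entries = 16 MB. Actual locked
 * memory is value_size x max_entries x nr_cpus:
 *   16 vCPUs -> 256 MB      96 vCPUs -> 1.5 GB */
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 4096);
    __type(key, __u32);
    __type(value, char[4096]);
} scratch SEC(".maps");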
Vendor product reality — what "agentless" really means
The eBPF wave brought a generation of "agentless" or "no-instrumentation" products: runtime-security platforms (Falco, Tetragon, Tracee), continuous-profiling SaaS (Pixie, Parca cloud), L7 observability layers (Hubble, Pixie, Coroot). The marketing pitch is "deploy a DaemonSet, get observability with zero application changes". The pitch is partly true and partly a category error — see the previous chapter, /wiki/agentless-observability-claims, for the long version. This chapter cares about the operational tax these tools introduce.
Each tool is a privileged DaemonSet. Running an eBPF agent requires hostPID, hostNetwork (in some cases), CAP_BPF and CAP_PERFMON (or worse, CAP_SYS_ADMIN), and access to /sys/kernel/debug, /sys/fs/bpf, and often /proc. From a security review perspective, an eBPF DaemonSet is functionally equivalent to giving the vendor root on every node. This is the right trade-off for many organisations, but it deserves an explicit decision, not a Helm-install-and-forget. Banking compliance teams in 2026 increasingly require eBPF agents to be reviewed at the same rigour as kernel modules — because functionally, that is what they are.
Vendor support cycles do not match kernel support cycles. A vendor's eBPF agent is built against, say, kernel 6.1 with BTF support. When AWS rolls a new Bottlerocket image with kernel 6.6, the vendor needs to test, recompile if needed, and ship a new agent version. The lag between kernel release and vendor support is typically 4-12 weeks. Production fleets that pin agent versions to avoid surprises end up either pinning the kernel too (slowing security patches) or running on a kernel newer than the agent supports (with degraded features). Cilium handles this well because the project is the dominant consumer; smaller vendors handle it less well.
Conflicts between agents. Two eBPF agents that both attach kprobe:vfs_write cannot share state — they each load their own program, the kernel runs both serially, and the per-syscall overhead doubles. Agents that attach to the same tracepoint typically coexist (the kernel's perf-event multiplexing handles this) but agents that attach to the same bpf_lsm hook conflict at the LSM layer. Production fleets running three eBPF observability tools usually discover a 2-3% syscall-rate degradation that none of the individual vendor benchmarks predicted, because vendor benchmarks measure the agent in isolation.
Common confusions
- "eBPF is safe because the verifier won't let bad programs run." True for kernel safety; misleading for system safety. The verifier prevents memory corruption and infinite loops in your eBPF program. It does not prevent your program from consuming 8% of node CPU on a hot syscall, leaking entries into a 1M-entry hash map, or interfering with another eBPF program loaded at the same hook. System-level effects are entirely your problem.
- "CO-RE means write once, run on any kernel." Partly. CO-RE handles type-relocation (the offset of a field inside a kernel struct that may differ between kernel versions). It does not invent helpers your kernel does not have, attach to symbols your kernel does not export, or work around tracepoints your kernel does not define. CO-RE makes the type-shape problem disappear; the capability problem remains.
- "eBPF replaces kernel modules." For observability and tracing, often yes. For drivers, networking control planes that need full hardware offload, or anything requiring blocking syscalls, no. The two technologies serve overlapping but non-identical purposes; "eBPF will eat the kernel-module ecosystem" is a 2019 slogan that 2026 reality has nuanced.
- "Bytecode loading is the same as program loading." It is not. The bytecode load (
bpf(BPF_PROG_LOAD)) is the verifier pass plus JIT. Attachment (bpf(BPF_PROG_ATTACH), orperf_event_open, or kprobe-attach via the tracefs interface) is a separate step that can fail for entirely different reasons — symbol not exported, hook already attached and not shareable, security-module rejection. Programs that pass the verifier can still fail to attach. - "eBPF is portable across architectures." The bytecode is architecture-neutral; the hooks are not. A
kprobe:do_sys_openat2works on x86_64 and arm64 with the same source. A program that readspt_regs->dito get a syscall argument does not —diis the x86_64 first-argument register; on arm64 the equivalent isregs[0]. CO-RE includes architecture-aware macros (PT_REGS_PARM1_CORE) for exactly this reason; ad-hoc programs that hard-code register names are x86-only. - "eBPF replaces sysdig / strace / ltrace entirely." For ad-hoc debugging on a single host, eBPF (
bpftrace,bcc) is strictly better thanstrace— lower overhead, more flexible filtering, no PTRACE coupling. For a sealed binary distributed to customers,stracestill works without root or kernel headers; eBPF requires both. For container-internal tracing where the kernel is shared, eBPF is the only option that does not require modifying the container. The right answer is "eBPF for the new generation of work; strace for the cases where you cannot install a kernel module".
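To make the CO-RE bullet concrete, a sketch (illustrative hook and field choice, vmlinux.h generated with bpftool btf dump) of what relocation does and does not buy you:
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
SEC("kprobe/do_sys_openat2")
int trace(void *ctx)
{
    struct task_struct *task = (void *)bpf_get_current_task();
    /* CO-RE: the offsets of real_parent and tgid are relocated at load
     * time against the running kernel's BTF, so struct layout changes
     * between 5.10 and 6.6 do not matter... */
    pid_t ppid = BPF_CORE_READ(task, real_parent, tgid);
    bpf_printk("parent tgid: %d", ppid);
    /* ...but a call to bpf_loop() here would still fail to LOAD on
     * kernels < 5.17: no relocation can invent a missing helper. */
    return 0;
}
char _license[] SEC("license") = "GPL";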
Going deeper
What the verifier log looks like, and how to read it
When bpf(BPF_PROG_LOAD) fails (typically with -EINVAL or -EACCES), the kernel writes a verifier log into the userspace buffer your loader provided. The log is a textual trace of every instruction the verifier walked, every register state at every point, and the specific reason for rejection. A typical log starts with hundreds of lines like 0: (b7) r2 = 0 and 1: (61) r1 = *(u32 *)(r6 +0), followed by the failing instruction and a message like R1 type=inv expected=fp or math between map_value pointer and register with unbounded min value is not allowed.
The skill is reading the last 50 lines of the log: that is where the failure is. The first ten thousand lines are the verifier's accepted-path trace; you only care about the rejected path. bpftool prog load and the libbpf bpf_object__load API both surface the log as a string; pipe it to a file and tail -100. After three or four failures, you start recognising the patterns — R1 unbounded memory access means a missing bounds check; R0 invalid mem access 'inv' means dereferencing a pointer that was not first checked for null; back-edge from insn N to M means an unbounded loop the verifier could not unroll.
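The null-check pattern behind that second message, as a minimal sketch — deleting the check reproduces the rejection:
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u32);
    __type(value, __u64);
} counts SEC(".maps");
SEC("kprobe/do_sys_openat2")
int count(void *ctx)
{
    __u32 key = 0;
    __u64 *val = bpf_map_lookup_elem(&counts, &key);
    /* removing this null check yields the classic verifier error:
     * "R0 invalid mem access 'map_value_or_null'" */
    if (!val)
        return 0;
    __sync_fetch_and_add(val, 1);
    return 0;
}
char _license[] SEC("license") = "GPL";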
A useful trick: bpftool -d prog load replays a load with the verifier log at full verbosity even when the program is accepted, which lets you compare an accepted variant's log against a failing one. (bpftool prog tracelog, despite the name, streams bpf_printk output from running programs, not verifier logs.)
The cost of BPF_PROG_TEST_RUN and why CI should run it
BPF_PROG_TEST_RUN is a bpf() syscall that runs your program against synthetic input data and returns the output. CI pipelines for serious eBPF projects (Cilium, Tetragon, libbpf-tools) use it for two reasons: (1) it verifies the program loads and runs without needing a real kernel hook (you can test a packet-filter without sending packets), and (2) it tests the program against multiple kernel versions in vmtest or qemu setups.
The rule for a production eBPF project: every program you ship has a BPF_PROG_TEST_RUN test that exercises at least the happy path on every kernel version you support, run in CI on PR. This catches verifier-rejection regressions before they hit production. The Cilium project runs this matrix across kernels 5.10, 5.15, 6.1, 6.6, and the latest LTS; smaller projects often skip it and pay for it later.
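What such a test looks like from the loader side, assuming an already-loaded packet program whose fd the loader holds — a sketch using libbpf's bpf_prog_test_run_opts():
#include <string.h>
#include <bpf/bpf.h>
/* Run a loaded packet program against a synthetic frame and return its
 * verdict (e.g. XDP_PASS), or a negative error if the run itself failed. */
int smoke_test(int prog_fd)
{
    char pkt[64];
    char out[256];
    memset(pkt, 0, sizeof(pkt));   /* a zeroed 64-byte "frame" */
    LIBBPF_OPTS(bpf_test_run_opts, opts,
        .data_in       = pkt,
        .data_size_in  = sizeof(pkt),
        .data_out      = out,
        .data_size_out = sizeof(out),
        .repeat        = 1,
    );
    int err = bpf_prog_test_run_opts(prog_fd, &opts);
    if (err)
        return err;         /* kernel refused the test run */
    return opts.retval;     /* the program's own return code */
}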
Where eBPF cannot reach: virtual-machine boundaries and userspace runtimes
eBPF runs in the kernel; it sees what the kernel sees. Workloads inside hardware-virtualised guests (VMware, KVM, Firecracker microVMs) are opaque to the host kernel's eBPF — the host sees disk I/O and network packets but not syscalls inside the guest. To get syscall observability inside a microVM you need an eBPF agent inside the microVM, which is often impossible (microVMs are typically minimal images with no DaemonSet infrastructure). This is why AWS Fargate, Google Cloud Run, and similar serverless-container platforms expose almost no eBPF surface to customers — the kernel that runs your code is not yours to instrument.
Userspace runtimes that bypass syscalls (DPDK, io_uring with a userspace poller, RDMA) are also opaque. eBPF kprobe/tracepoint hooks live on the syscall path; if the application has DMA-mapped a buffer and is reading it directly without syscalls, no eBPF hook fires. A Razorpay UPI payment service running on a standard kernel with read()/write() is fully observable; the same service moved to io_uring with a busy-poll loop becomes an opaque box from eBPF's perspective. This is a real trade-off: high-performance I/O sacrifices observability surface, and "we'll add eBPF later" is harder than it sounds.
Operational maturity: what running eBPF in production looks like in year three
In year one, a team adopts eBPF for one purpose (Cilium for networking, or Pixie for L7 visibility). It works. Demos are convincing.
In year two, the team adds two more eBPF agents (a runtime-security tool, a continuous profiler). Performance starts to feel slightly off. Capacity planning gets harder. Vendor-version pinning becomes a recurring discussion.
In year three, the team has 4-6 eBPF agents on every node, a dedicated platform engineer who owns "eBPF capacity", a quarterly audit of program memory and CPU consumption, a process for testing each new kernel image against every loaded eBPF agent, and an incident playbook for "agent X regressed on kernel image Y". The total operational cost is roughly one full-time engineer per 1,000 nodes — comparable to the cost of running the kernel itself. eBPF stops being free.
This is not unique to eBPF; every kernel-adjacent technology hits the same maturity curve. The thing the marketing does not tell you is that the curve exists at all, and that "deploy a DaemonSet" is the easy 5% of the path.
Reproduce this on your laptop
# Audit BPF state on your laptop (or a privileged container)
sudo apt install bpftool # Debian/Ubuntu
sudo dnf install bpftool # Fedora/RHEL
sudo bpftool feature probe # what your kernel supports
sudo bpftool prog show # currently loaded programs
sudo bpftool map show # currently loaded maps
# Run the resource-audit script (requires Python 3.11+ and root)
python3 -m venv .venv && source .venv/bin/activate
pip install pandas
sudo .venv/bin/python3 ebpf_resource_audit.py
# Try loading a deliberately-bad program to see the verifier log
cat > bad.c <<'EOF'
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
SEC("kprobe/do_sys_openat2")
int bad(void *ctx) {
    volatile int i = 0;
    for (;;) i++;        /* infinite loop: rejected on every kernel */
    return 0;
}
char _license[] SEC("license") = "GPL";
EOF
clang -O2 -target bpf -c bad.c -o bad.o
sudo bpftool prog load bad.o /sys/fs/bpf/bad type kprobe
# bpftool prints the verifier log to stderr on failure; re-run with -d
# for the full instruction-by-instruction trace:
sudo bpftool -d prog load bad.o /sys/fs/bpf/bad type kprobe 2>&1 | tail -100
Where this leads next
The next chapter — /wiki/comparing-ebpf-with-traditional-tools-tcpdump-strace — places eBPF in honest comparison with the older Linux observability tools. Once you accept that eBPF is bounded (verifier, kernel skew, operational cost), the question becomes "what does it actually do better than tcpdump, strace, ss, and bpftrace-as-a-front-end?". The answer, it turns out, is "fleet-scale, low-overhead, programmable observation", and the older tools win on "deep dive on a single host with no privileges to deploy an agent". They are complementary, not competitive.
After that, /wiki/the-future-of-ebpf-and-observability closes Part 8 by looking at where the technology is heading: schedulable BPF (the sched_ext framework), userspace eBPF (uBPF, rbpf), Windows eBPF, the kfunc transition that replaces the helper ABI. The eBPF that exists in 2026 is not the eBPF that will exist in 2030, and a curriculum that did not say so would be lying.
For the broader arc, this chapter is the skeptical pivot. Part 8's first chapters established what eBPF makes possible; this one bounds the claims so the reader's mental model stays calibrated when a vendor pitches "agentless observability for free".
References
- The Linux kernel BPF documentation, specifically Documentation/bpf/verifier.rst — the canonical description of every check the verifier performs.
- Andrii Nakryiko, "BPF CO-RE reference guide" (nakryiko.com/posts/bpf-core-reference-guide/) — the practitioner's guide to writing programs that survive kernel changes.
- Brendan Gregg, "BPF Performance Tools" (Addison-Wesley, 2019), Appendix A (BCC tool dependencies and kernel requirements) — the most comprehensive table of helper-by-kernel-version dependencies in print.
- Alexei Starovoitov, "BPF and kfuncs: extending the kernel through stable interfaces" (LPC 2022) — the design discussion behind the kfunc transition replacing the helper ABI.
- KubeCon EU 2024, "Five Years of Cilium in Production: What We Learned" — the case study that informs the year-three operational-cost framing.
- Lin Sun et al., "Tetragon: Efficient Runtime Security and Visibility through eBPF" (Cloud Native Rejekts 2023) — a concrete example of how a production runtime-security agent handles the kernel-skew and capability-probe problems.
- /wiki/agentless-observability-claims — the marketing-honest framing for what eBPF observability gives you and does not.
- /wiki/ebpf-for-network-observability-cilium-hubble — the previous chapter; the working demonstration that this chapter qualifies.