Studying eBPF: Linux Kernel Programming for Improved Security, Networking, and Observability
We'd like to remind you about one of the most interesting niche books on Linux that we have published in recent years - "Learning eBPF: Linux Kernel Programming for Improving Security, Network Functions, and Observability" by Liz Rice.
Below, we offer a translation of an article by Luca Cavallin that gives a detailed introduction to the features and capabilities of this "packet filter." In essence, eBPF is the de facto standard mechanism for running user-supplied code in the Linux kernel safely and efficiently. The article explains how to work with this powerful tool properly and what possibilities it opens up.
If you build systems, manage clusters, and keep hitting the limits of agents, iptables, or kernel modules, take note of eBPF. It is the safer, faster, and more dynamic alternative you have been looking for. eBPF lets you run small, verified programs inside the Linux kernel so that you can observe the system at runtime and influence its behavior. In practice, it powers a new wave of observability, security, and networking tools without the operational risk of custom kernel modules.
Why eBPF and why it’s so important today
eBPF grew out of the classic Berkeley Packet Filter (BPF), a tiny bytecode engine inside the kernel that tools like tcpdump have long used to filter packets. Modern eBPF generalizes the idea: you write a small function in restricted C or Rust, compile it to bytecode, and ask the kernel to load it. Before the program can run, the verifier symbolically executes it to check safety properties. Your code must never dereference invalid pointers, must bound its loops, must return a value that is valid for the chosen hook, and must always run to completion. If these conditions are met, the kernel can JIT-compile the bytecode into native instructions, which is largely why eBPF programs run so fast.
eBPF does not replace kernel modules; it gives you a way to extend kernel behavior without writing or shipping one. You load a program at runtime and attach it to an event, such as a network receive hook, a kernel function call, or a point where a security decision is made. When the task is done, you detach it. You can even pin programs and data structures in a dedicated virtual file system (bpffs at /sys/fs/bpf) so that they outlive the loader process. Compared with modules, this workflow is smoother and carries much lower operational risk. Add the performance gain from running inside the kernel without context switches, and you get a technology that fits the needs of modern cloud computing: high-volume telemetry, low-latency policy enforcement, and the ability to move quickly without compromising safety.
First eBPF Trial
The basic workflow with eBPF is simple: load, attach, observe, detach. To try this cycle in practice, attach a small demo to a hook that produces results immediately, for example one that tracks process execution.
Here is a bpftrace one-liner that counts execve() calls, grouped by the name of the calling process:
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_execve { @[comm] = count(); }'
If you want to explore BCC (the BPF Compiler Collection) in more depth, here is a short Python script that attaches a small in-kernel program to a kprobe. Per-PID counters are kept in a BPF hash map:
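import time

from bcc import BPF

# Kernel-side program: count execve() calls per process ID in a BPF hash map.
prog = r"""
BPF_HASH(exec_count, u32, u64);

int on_execve(void *ctx) {
    // The upper 32 bits of the return value hold the TGID (the user-visible PID).
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    exec_count.increment(pid);
    return 0;
}
"""

b = BPF(text=prog)
# Resolve the architecture-specific syscall symbol; internal functions such as
# do_execveat_common may be inlined or renamed and then cannot be probed.
b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="on_execve")
print("Counting execve() per PID... Ctrl-C to stop.")
try:
    # The program prints nothing itself, so just wait until interrupted.
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    pass

# Read the map from user space and print the collected counters.
for k, v in b.get_table("exec_count").items():
    print(f"PID {k.value}: {v.value}")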
These examples introduce BPF maps: key-value stores located inside the kernel that hold state shared between events. Maps are what make eBPF practical: they store counters and caches, carry configuration and policies, and provide the ring buffers through which events flow to user space.
From Source Code to Code Execution Inside the Kernel
An eBPF program is best thought of as a tiny specialized function. You choose a hook and write a function whose signature matches its context: xdp_md for XDP, a tracepoint record, the arguments of a kprobe or fentry probe, or a socket buffer for networking hooks. You compile it with clang -target bpf, producing an object file that contains instructions, map definitions, optional debug information, and often BTF (BPF Type Format) metadata. Utilities such as bpftool and llvm-objdump let you look inside the object, check exactly what you built, and get a rough idea of what the verifier will see.
Verification happens when the program is loaded. If it passes, the kernel can keep the original bytecode, the translated representation the verifier analyzed, and the JIT-compiled native image for your CPU. The program is then attached to an event. Examples: XDP on a network interface, early in the receive path; TC inside the stack, for classification and shaping; fentry/kprobe for tracing function entry. Programs can also attach to tracepoints, which serve as stable event positions, or to LSM hooks to make security decisions. In modern practice, a BPF link holds the attachment for its whole lifetime and, once pinned, keeps the program attached even after the loader exits. To detach later, drop the last reference to the link. If programs and maps need to outlive your process, pin them in bpffs at predictable paths. When writing XDP code, never skip the bounds checks: validate headers against the xdp_md->data and xdp_md->data_end pointers before you touch anything in the packet.
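The bounds-check discipline for XDP can be rehearsed in user space. The sketch below is a plain Python model, not an eBPF program: data and data_end stand in for xdp_md->data and xdp_md->data_end, and the drop_icmp policy is a hypothetical example. The verdict constants follow the values in the kernel's linux/bpf.h header.

```python
import struct

# XDP verdict codes, in the order they are defined in linux/bpf.h.
XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT = range(5)

ETH_HLEN = 14        # Ethernet header length in bytes
IPV4_MIN_HLEN = 20   # IPv4 header length without options
ETH_P_IP = 0x0800    # EtherType for IPv4
IPPROTO_ICMP = 1


def drop_icmp(packet: bytes) -> int:
    """Hypothetical policy: drop IPv4 ICMP packets, pass everything else."""
    # Stand-ins for ctx->data and ctx->data_end in a real XDP program.
    data, data_end = 0, len(packet)

    # Prove the Ethernet header fits before reading any of it.
    if data + ETH_HLEN > data_end:
        return XDP_PASS
    (ethertype,) = struct.unpack_from("!H", packet, data + 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS

    # Prove the fixed part of the IPv4 header fits before reading the protocol.
    ip = data + ETH_HLEN
    if ip + IPV4_MIN_HLEN > data_end:
        return XDP_PASS
    protocol = packet[ip + 9]
    return XDP_DROP if protocol == IPPROTO_ICMP else XDP_PASS
```

In a real XDP program the same two comparisons against data_end are what the verifier looks for; without them the load is rejected rather than failing at runtime.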
Maps, event streams, and discoverability
BPF maps come in many forms: hash tables, arrays, per-CPU variants, LRU caches, LPM (longest-prefix-match) tries, queues and stacks, and a dedicated ring buffer map. Older examples often rely on perf buffers, set up through perf_event_open(), to stream events to user space. Newer code prefers ring buffers, which simplify coordination: a single file descriptor and a straightforward producer/consumer model. The bpftool utility is invaluable for listing loaded programs, checking their immutable tags, dumping translated and JIT-compiled instructions, creating and inspecting maps, and reading the BTF data that describes types and function prototypes. Once programs and maps are pinned in bpffs under /sys/fs/bpf, other processes can find them easily, including after the loader restarts.
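To make one of these map types concrete, here is a small user-space model of longest-prefix-match lookup, the semantics an LPM map provides. It is only a sketch: the LpmTable class and its method names are hypothetical, and it uses a linear scan over prefixes rather than the trie the kernel uses.

```python
import ipaddress


class LpmTable:
    """Toy model of longest-prefix-match lookup (linear scan, not a trie)."""

    def __init__(self):
        self._entries = {}

    def update(self, cidr, value):
        # Store the value under the parsed network prefix.
        self._entries[ipaddress.ip_network(cidr)] = value

    def lookup(self, addr):
        # Return the value of the most specific prefix containing addr.
        ip = ipaddress.ip_address(addr)
        best = None
        for net, value in self._entries.items():
            if ip.version == net.version and ip in net:
                if best is None or net.prefixlen > best[0].prefixlen:
                    best = (net, value)
        return best[1] if best else None


table = LpmTable()
table.update("10.0.0.0/8", "allow")
table.update("10.1.0.0/16", "deny")
print(table.lookup("10.1.2.3"))   # matched by both prefixes, /16 wins
```

The more specific /16 entry overrides the /8 entry, which is the behavior that makes this map type a natural fit for routing tables and CIDR-based policies.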
Program types and where they attach
Every eBPF program has a type, and the type determines which context you receive, which helper functions you can call, and which return codes are valid. For tracing, you have kprobes and kretprobes, which follow kernel functions; tracepoints, which the kernel exposes as stable events; and BTF-enabled fentry/fexit hooks, which attach to function entry and exit with minimal overhead. User space can be instrumented with uprobes and uretprobes. BPF LSM programs attach to kernel security hooks, letting you make fine-grained decisions at precise points on the security path. At the network level, XDP sits in the driver's receive path, where you can parse headers and decide what to do with a packet: pass, drop, redirect, or forward. TC, meanwhile, is convenient for classifying and shaping traffic inside the stack. Socket-level and cgroup-level hooks place policy closer to the process, and the flow dissector and other subsystems provide more specialized entry points. For networking, the main benefit of this approach is predictable latency: instead of branching iptables chains, you get compiled data paths that implement service load balancing and network policy enforcement.
CO-RE, BTF, and libbpf: portability without per-host builds
Compiling a program on the target host is fine for research tasks, but in production it becomes a problem. CO-RE ("Compile Once, Run Everywhere") solves this with BTF type information, which lets a precompiled object adapt to different kernels at load time. On most modern distributions, the kernel exposes its canonical BTF at /sys/kernel/btf/vmlinux. Your eBPF object carries relocation records wherever it references kernel types or fields. At load time, libbpf resolves these relocations against the host's BTF, so differences in structure layout between hosts do not break your program. In a typical BTF-based workflow, you generate a vmlinux.h header and then compile your program with -O2 -g -target bpf so that the loader has all the information it needs. For example:
bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
clang -O2 -g -target bpf -c prog.c -o prog.bpf.o
Libbpf also generates "skeletons": small C headers that wrap your maps, programs, and attachments, making the user-space loader almost declarative: open, load, attach, read. In practice, this works very well: you can ship a single self-contained artifact that provides reliable observability across a fleet of heterogeneous kernels while keeping a small runtime footprint.
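The relocation idea behind CO-RE can be illustrated with a toy model. The sketch below is not libbpf and does not touch BTF; it only shows why resolving a field by name at load time survives layout changes. Both struct layouts and all names in it are hypothetical.

```python
import ctypes


# Two hypothetical layouts of the same kernel struct on different kernels.
class TaskOldKernel(ctypes.Structure):
    _fields_ = [("state", ctypes.c_uint64),
                ("pid", ctypes.c_uint32)]


class TaskNewKernel(ctypes.Structure):
    _fields_ = [("state", ctypes.c_uint64),
                ("flags", ctypes.c_uint64),   # extra field shifts pid
                ("pid", ctypes.c_uint32)]


def resolve_offset(host_layout, field_name):
    """Stand-in for a relocation: find the field by name in the host's types."""
    return getattr(host_layout, field_name).offset


def read_pid(raw, host_layout):
    """Read the pid field from raw struct bytes using the resolved offset."""
    offset = resolve_offset(host_layout, "pid")
    return ctypes.c_uint32.from_buffer_copy(raw, offset).value


old = bytes(TaskOldKernel(state=1, pid=4242))
new = bytes(TaskNewKernel(state=1, flags=7, pid=4242))
# The same reader works on both layouts although pid sits at different offsets.
print(read_pid(old, TaskOldKernel), read_pid(new, TaskNewKernel))
```

A hard-coded offset would read the wrong bytes on one of the two layouts; looking the field up by name against the host's type information is what keeps one compiled object valid on both.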
About the verifier in plain language
Think of the verifier as the safety gate that makes it acceptable to run your own code inside the kernel in production. It symbolically executes your program, tracks where every pointer came from, and proves that all memory accesses stay within bounds. It ensures that you check pointers before dereferencing them, that your loops are bounded or unrolled, that the arguments you pass to helper functions have the correct types, and that you return a value appropriate for the hook you attached to, for example XDP_PASS, XDP_DROP, XDP_REDIRECT, or XDP_ABORTED in the case of XDP. It also checks the license string in your object, since some helper functions are reserved for GPL-licensed programs. If verification fails, request a detailed log; it reads like a conversation with a meticulous reviewer and quickly teaches you the idioms the kernel accepts.
Real benefits for observability, security, and networking
For observability, eBPF can track system calls, file opens, credential changes, and kernel function calls, and it can enrich events with process lineage, cgroup and container identity, and namespaces. All of this is streamed to user space in near real time, so you get detailed insight without patching applications or changing their configuration. For security, BPF LSM places allow/deny decisions directly at designated kernel hook points, with knowledge of container and workload identity rather than just uid/gid. Tools like Tetragon combine deep tracing with policy enforcement, letting you detect and stop suspicious behavior proactively. For networking, XDP makes early fast-path decisions, such as dropping or redirecting packets, while TC applies classification and shaping inside the stack. Together they underpin eBPF-based CNIs that implement service-level load balancing and network policies and even coordinate inter-node encryption, with lower and more predictable latency than long iptables chains.
When Not to Use eBPF
eBPF is a powerful tool, but it is not suited to every problem. If your need can be met with a user-space hook or a library call, don't overcomplicate things. If the event rate is low and latency is not a concern, a kernel-space program may be an unnecessary investment. If you need long-running or blocking work that does not fit within eBPF constraints (only certain program types are allowed to sleep), move that logic to user space. If you are working with legacy kernels that lack headers or BTF, you will need to upgrade, install the BTF package, or provide minimal BTF data before CO-RE starts to pay off.
Pitfalls and Debugging Complex Cases
Most problems fall into a few categories. If the verifier rejects your program, check bounds more explicitly, reduce ambiguous control-flow paths, and bound your loops or make them smaller. Avoid pointer arithmetic on untrusted data without checks. If your system lacks BTF, install the BTF package for your kernel or provide a minimal BTF, then regenerate vmlinux.h. If you hit resource limits when creating maps, the cause may be RLIMIT_MEMLOCK or maps that are simply too large; choose appropriate sizes and consider LRU maps for caches. For attachment lifetime, prefer BPF links, which let programs stay attached even after the loader exits. And if performance seems suspiciously low, use bpftool feature to confirm that JIT compilation is enabled, and turn it on before benchmarking.
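Some of these checks can be scripted before anything is loaded. The sketch below assumes a Linux host and the conventional paths for BTF, the JIT sysctl, and bpffs; the preflight function name and the report keys are hypothetical. It only reads state and changes nothing.

```python
import os
import resource


def preflight():
    """Collect a few host facts that commonly explain eBPF load failures."""
    report = {}

    # CO-RE needs the kernel's BTF; most modern distributions expose it here.
    report["btf_available"] = os.path.exists("/sys/kernel/btf/vmlinux")

    # The JIT switch is a sysctl; it may be unreadable without privileges.
    try:
        with open("/proc/sys/net/core/bpf_jit_enable") as f:
            report["jit_enable"] = int(f.read().strip())
    except (OSError, ValueError):
        report["jit_enable"] = None

    # Soft locked-memory limit, relevant when map creation fails.
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    report["memlock_soft"] = "unlimited" if soft == resource.RLIM_INFINITY else soft

    # Pinning requires bpffs to be mounted at its usual location.
    report["bpffs_mounted"] = os.path.ismount("/sys/fs/bpf")
    return report


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```

A jit_enable value of None means the file could not be read, not that the JIT is off. On recent kernels map memory may be accounted through memory cgroups rather than RLIMIT_MEMLOCK, so treat that line as a hint for older hosts.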
bpf() system call and its tooling
Although libraries hide the details, everything goes through the bpf() system call. You usually reach it through libbpf or another wrapper, and it creates maps, loads programs, and attaches them. In older tracing code you may also encounter perf_event_open(), which connects perf events to your program. Modern programs manage attachment lifetime with BPF links and stream data through ring buffers. Reading results is simple: first consume events from the buffer, then look up the information you need in a map. Because programs and maps can be pinned in bpffs at /sys/fs/bpf, they are easy to discover and reuse across processes and process restarts. As your systems grow, BPF-to-BPF function calls help you factor logic into functions, and global variables in .rodata or .bss let you adjust behavior at load time without recompiling.
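On the user-space side, consuming an event usually means decoding a fixed-layout record. The sketch below shows one way to do that with ctypes; the ExecEvent layout is hypothetical and would have to match whatever struct the kernel-side program actually emits. It is independent of any particular loader library.

```python
import ctypes


class ExecEvent(ctypes.Structure):
    """Hypothetical event layout; must mirror the struct written by the BPF side."""
    _fields_ = [
        ("pid", ctypes.c_uint32),
        ("uid", ctypes.c_uint32),
        ("comm", ctypes.c_char * 16),   # NUL-padded process name
    ]


def decode_event(raw: bytes) -> dict:
    """Turn one raw record from the buffer into a plain dictionary."""
    # Reject short records instead of reading past the end of the buffer.
    if len(raw) < ctypes.sizeof(ExecEvent):
        raise ValueError("short event record")
    ev = ExecEvent.from_buffer_copy(raw)
    return {
        "pid": ev.pid,
        "uid": ev.uid,
        "comm": ev.comm.decode("utf-8", "replace"),
    }


# Simulate a record as it would arrive from the kernel.
sample = bytes(ExecEvent(1234, 1000, b"bash"))
print(decode_event(sample))
```

A ring buffer callback would call decode_event on each record it receives and then, if needed, use the decoded pid as a key for a map lookup.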
Summary
eBPF is a safe, high-performance mechanism for running user-defined logic inside the Linux kernel. You compile a small program, the verifier ensures it is safe, the kernel JIT-compiles it, and you attach it to events to change system behavior in real time. Maps hold shared state, and ring buffers stream events. BTF and CO-RE provide portability, while libbpf and skeletons keep loaders small and reliable. With hooks for tracing, networking, and security enforcement (XDP, TC, kprobes/fentry, and LSM), you can instrument applications without changing their code or configuration and enforce precise policies with rich context. eBPF is used everywhere from Kubernetes data planes to system call tracing and proactive security, and it has already grown from a niche technology into a mainstream platform. As a next step, build a small demo and practice the cycle of load, attach, observe, detach until you can apply these patterns with your eyes closed.