Heads up, this is a fairly detailed deep dive into some of what libbpf does and knowledge on BPF is required. Luckily I have you covered with this talk I gave on BPF.
libbpf is a C library that gives you a nice API for opening and loading BPF bytecode contained in Executable and Linkable Format (ELF) files. You declaratively lay out your components in a single compilation unit: BPF maps as global variables, programs as functions. This gets compiled into your ELF, and the loader does everything else for you, even attach programs to certain hook points.This is a huge improvement over BCC where you’d write a C based BPF program in a python string literal, as well as use python as a preprocessor.
We’re going to learn about one of the operations libbpf automates when loading your bytecode. BPF maps are used as general-purpose storage between userspace and executions of BPF programs. When creating a map we receive a file descriptor and that needs to be stored in the binary blob containing a program before that it’s loaded into the kernel. Luckily the information on how to do this is stored in the ELF file created from compiling your BPF code and libbpf knows how to parse this file so that it can orchestrate the above. Understanding these mechanics will enable developers to write their own loaders and improve their mental model of the BPF subsystem.
Furthermore, all code in this article will be written in Zig, a simple language with top notch error handling, compile-time capabilities, and a powerful build system. At the end, if you want to take a closer look at the code, you can find it in this repo. Don’t worry if you haven’t heard of Zig before, because you already know how to read it.
Challenges Learning BPF
One of the challenges of learning BPF, and libbpf is no exception to this, is that large swaths of it are undocumented (thought that is starting to improve). When I first started learning BPF this library constantly posed an opaque barrier to understanding. My goal was to learn how userspace should correctly instrument specific syscalls, so I dove into the source to find out. While this was going on I was also learning Zig, and discovered that it was able to produce decent bytecode out-of-the-box™ (that is, no hacks required).
There are a large number of other operations that libbpf does, for example, somehow loading the read-only section of the binary separate from everything else. It also expects programs to have their own section using a naming scheme that declares how/where the program needs to be loaded. Here you can find a massive table laying out this information I have yet to decipher, but I digress, let’s get to our BPF program.
Baby’s first Zig BPF Program
So this is a pretty innocuous example, all we’re doing is getting time through a BPF Helper function, and then writing it to a Perf Event Buffer, but the important part is that the program is referencing a BPF map. Building this is extremely simple, Zig build scripts are written in Zig (a nice resource on that), and all we have to do is target a freestanding BPF architecture, and I’ve set the endianness to match whatever we’re targeting for the main userspace program.
You’ll also notice that I’m outputting the compiled object file to the `src` directory. That’s because Zig has the `@embedFile()` builtin which will let me embed a file as an array of bytes within the main program, this way I have a single binary at the end of the build process.
We’ll be attaching the BPF program to a raw socket on the loopback device so that it’s run when packets are sent or received. Now let’s inspect the ELF (only showing what’s important):
Section `socket1` contains our program and can be seen as an array of instructions. `maps` similarly is an array of BPF map definitions (in our case and array of one map):
The loader can read the `maps` section directly and use `BPF_MAP_CREATE` with the `bpf()` syscall to create all of our maps. Next comes the cool part, we have that array of instructions that makes up our program:
I won’t go into detail for each instruction because the Cilium BPF Reference has a lot of great information that you can review yourself, but the one we’re interested in is #6. According to the signature of the `perf_event_output()` Helper we’re supposed to be loading a “map pointer” into register 2, and in the code we do give it the address of the map, but the instruction loads r2 with the value zero, or null. What we actually need to do for loading is replace the immediate value of zero with the file descriptor of the map — super funky. From the loader’s perspective it’s able to determine exactly which map goes where with the `.relsocket1` section which contains links between entries in the `.symtab` section (symbols) and offsets within the program code. Here is a chunk of code showing my simplified Zig runtime loader rewriting those immediate values:
Next Steps: Leveraging Zig’s comptime Features
Down the road I’m going to create a loader that takes advantage of Zig’s comptime abilities to parse the embedded object files. This would add compile time verification of the BPF environment’s contents, remove ELF parsing code from our executables and reduce the size of what’s being embedded — a full implementation of this equivalent is ways off, but I do have a proof-of-concept in our main program that asserts at compile-time that the embedded `probe.o` file contains the `socket1` section. If it didn’t contain the section we would get a compile error.
libbpf is becoming the standard in how BPF is used in production, and while it’s great to have well known, trusted tools it’s also important to understand how our tools work and where they may fall short. I’m going to take this knowledge and try to leverage Zig’s comptime abilities to improve communicating the BPF environment to userspace and others might also find novel new ways to do this as well. BPF tech is moving fast and I’m excited for the future.