Book Review: Learning eBPF
- Learning eBPF
- Liz Rice
- 234 pages
- O’Reilly (2023)
- ISBN: 978-1098135126
The extended Berkeley Packet Filter, or eBPF for short, is a plugin architecture for the Linux kernel. Using eBPF, it is possible to load (short) programs into the kernel at runtime and have the kernel execute them. As the name suggests, the technology started out as a method to build custom filters for network packets, but it has since been extended and has become much more general.
In “Learning eBPF”, Liz Rice provides a short (about 200 pages) introduction to eBPF, its capabilities, and the tools and ecosystem supporting it.
The book can be purchased from any bookseller, but a PDF version can also be downloaded for free from the Isovalent company website. Seeing this made me somewhat apprehensive: might this book be little more than an extended marketing whitepaper?
But my concerns turned out to be unfounded. This is a “real” book — more than that, it is both a valuable and pleasant read.
The content follows a sensible outline. After a brief introduction to eBPF and its history, the book introduces the BCC Python library, which can be used to load eBPF programs into the kernel with a minimum of fuss. Next, we get a somewhat in-depth look at the virtual machine that executes eBPF programs within the kernel, and dissect the bytecode for some simple programs. That’s possibly more detail than one needs, but it is actually quite helpful in bringing some of the otherwise unfamiliar concepts to life. The book then moves on to the bpf() system call and various tools and libraries supporting eBPF. We learn how to get data into and out of eBPF programs, and how to attach them to various events in the kernel.
The eBPF verifier is treated in a short, separate chapter. To ensure that eBPF programs do not crash the kernel, they all must pass a “verifier” step that ensures, for example, that the program only accesses memory it has permission to access, that loops will terminate, and that the program returns a valid return code. This is an interesting topic, because it is one of the few examples of formal code verification in “real” (that is, non-academic, non-research) code. What makes the eBPF verifier feasible is, at least in part, the fact that eBPF programs must be relatively short and cannot contain arbitrary instructions. (As a side note, I wonder whether the eBPF verifier might ultimately prove more important than eBPF itself, as proof that formal, automated program verification can work in practice!)
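To make the termination guarantee a little more concrete, here is a toy sketch in Python. It is my own illustration, not code from the book or from the kernel. It models a program as a list of instructions from a hypothetical, heavily simplified instruction set (the opcode names and the `Insn` type are assumptions) and accepts it only if every opcode is known, every jump goes forward and stays inside the program, and the last instruction is `exit`. Under those rules execution can only move forward, so every path reaches an `exit` within as many steps as there are instructions. The real verifier does far more (it tracks register types and memory bounds, for example), and newer kernels also accept loops whose bounds it can prove instead of rejecting every backward jump.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical, heavily simplified instruction set (NOT real eBPF opcodes).
JUMPS = {"ja", "jeq"}  # unconditional and conditional jumps
KNOWN = {"mov", "add", "exit"} | JUMPS


@dataclass(frozen=True)
class Insn:
    op: str
    off: int = 0  # jump offset, relative to the next instruction (as in eBPF)


def toy_verify(prog: List[Insn]) -> Tuple[bool, str]:
    """Accept a program only if every path is guaranteed to end in an exit."""
    if not prog:
        return False, "empty program"
    n = len(prog)
    for pc, insn in enumerate(prog):
        if insn.op not in KNOWN:
            return False, f"unknown opcode {insn.op!r} at {pc}"
        if insn.op in JUMPS:
            if insn.off < 0:
                # A backward jump could form a loop: this toy rejects it outright.
                return False, f"back-edge at {pc}"
            if pc + 1 + insn.off >= n:
                # The jump target would lie outside the program.
                return False, f"jump out of range at {pc}"
    if prog[-1].op != "exit":
        # Execution must not fall off the end of the program.
        return False, "last instruction is not exit"
    return True, "ok"


if __name__ == "__main__":
    good = [Insn("mov"), Insn("jeq", 1), Insn("add"), Insn("exit")]
    looping = [Insn("mov"), Insn("add"), Insn("ja", -2), Insn("exit")]
    print(toy_verify(good))     # (True, 'ok')
    print(toy_verify(looping))  # (False, 'back-edge at 2')
```

The first program only ever jumps forward and ends in `exit`, so it passes. The second jumps back two instructions and is rejected, even though a smarter analysis might show that it terminates. That conservatism is the price a simple checker pays for being able to decide quickly.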
The book concludes with some application ideas (networking and security) and a general outlook.
As I said before, the book makes for a pleasant read. The pace is brisk; the content is clear, relevant, and useful. The basic ideas are presented well, and one comes away with a clear picture of what eBPF is and what it can do, as well as of the tools and techniques for using it.
Not everything is perfect. The presentation can be uneven (on the one hand, we are given an explanation of what Unix file descriptors or C header files are, but are then left with some cryptic Makefile options, or passing references to the DWARF debugging format), but overall the book is easy to follow. More problematic is the fact that external references are not collected anywhere, but are dispersed throughout the text. Given how rapidly the field develops, much of the documentation exists only in personal blogs, kernel sources, and individual company or project websites. The text provides many valuable pointers to these, but does not make it easy to find them all in one place.
More broadly, the book reflects a trend that I have observed in recent years. As detailed reference documentation has moved from print to the Web, books have increasingly taken on the role of “warm-ups” or “motivators”. They describe what a technology can do, and how it might be used, at a rather high level, often by way of a series of small tutorial case studies. The presentation usually does not go into much depth: after all, if you need more, there is the reference documentation.
The problem is that the leap from tutorial case studies to the reference documentation can be quite large! Reference documentation, by its nature, focuses on the individual function call and does not tend to connect the dots. Moreover, reference documentation can easily be overwhelming: it is often difficult to pick out the relevant information among all the details. This is where, even today, I think it makes sense for a book to include some “core” reference material: the bits and pieces without which one cannot use the technology effectively. (Since they are foundational, these bits also do not tend to change frequently, so there is little risk of a printed version going out of date.)
The other problem is that a collection of case studies does not teach me how to solve problems in general: it only teaches me how to solve comparable problems by analogy. It does not teach me how to approach a new problem. With regard to the book reviewed here: I think I now have a good sense of the pieces that make up eBPF, but I feel somewhat at a loss as to how I might apply any of them to a problem of my choosing.
Nevertheless, I found “Learning eBPF” not only a valuable, but also a pleasant (entertaining and stimulating) read. As a starting point on the technology, I recommend it whole-heartedly!