Julia Evans

Writing eBPF tracing tools in Rust

tl;dr: I made an experimental Rust repository that lets you write BPF tracing tools from Rust! It’s at https://github.com/jvns/rust-bcc or https://crates.io/crates/bcc, and has a couple of hopefully easy to understand examples. It turns out that writing BPF-based tracing tools in Rust is really easy (in some ways easier than doing the same things in Python). In this post I’ll explain why I think this is useful/important.

For a long time I’ve been interested in the BPF compiler collection, a C -> BPF compiler, C library, and Python bindings to make it easy to write tools like:

  • opensnoop (spies on which files are being opened)
  • tcplife (track length of TCP connections)
  • cpudist (count how much time every program spends on- and off-CPU)

and a lot more. The list of available tools in the /tools directory is really impressive and I could write a whole blog post about that. If you’re familiar with dtrace – the idea is that BCC is a little bit like dtrace, and in fact there’s a dtrace-like language named ply implemented with BPF.

This blog post isn’t about ply or the great BCC tools though – it’s about what tools we need to build more complicated/powerful BPF-based programs.

What does the BPF compiler collection let you do?

Here’s a quick overview of what BCC lets you do:

  • compile BPF programs from C into eBPF bytecode.
  • attach this eBPF bytecode to a userspace function or kernel function (as a “uprobe” / “kprobe”) or install it as XDP
  • communicate with the eBPF bytecode to get information with it

A basic example of using BCC is this strlen_count.py program and I think it’s useful to look at this program to understand how BCC works and how you might be able to implement more advanced tools.

First, there’s an eBPF program. This program is going to be attached to the strlen function from libc (the C standard library) – every time we call strlen, this code will be run.

This eBPF program

  • gets the first argument to the strlen function (the address of a string)
  • reads the first 80 characters of that string (using bpf_probe_read)
  • increments a counter in a hashmap (basically counts[str] += 1)

The result is that you can count every call to strlen. Here’s the eBPF program:

struct key_t {
    char c[80];
};
BPF_HASH(counts, struct key_t);
int count(struct pt_regs *ctx) {
    if (!PT_REGS_PARM1(ctx))
        return 0;
    struct key_t key = {};
    u64 zero = 0, *val;
    bpf_probe_read(&key.c, sizeof(key.c), (void *)PT_REGS_PARM1(ctx));
    val = counts.lookup_or_init(&key, &zero);
    (*val)++;
    return 0;
};

After that program is compiled, there’s a Python part which does b.attach_uprobe(name="c", sym="strlen", fn_name="count") – it tells the Linux kernel to actually attach the compiled BPF to the strlen function so that it runs every time strlen runs.

The really exciting thing about eBPF is what comes next – there’s no use keeping a hashmap of string counts if you can’t access it! BPF has a number of data structures that let you share information between BPF programs (that run in the kernel / in uprobes) and userspace.

So in this case the Python program accesses this counts data structure.

BPF data structures: hashmaps, buffers, and more!

There’s a great list of available BPF data structures in the BCC reference guide.

There are basically 2 kinds of BPF data structures – data structures suitable for storing statistics (BPF_HASH, BPF_HISTOGRAM etc), and data structures suitable for storing events (like BPF_PERF_MAP) where you send a stream of events to a userspace program which then displays them somehow.

There are a lot of interesting BPF data structures (like a trie!) and I haven’t fully worked out what all of the possibilities are with them yet :)

What I’m interested in: BPF for profiling & tracing

Okay!! We’re done with the background, let’s talk about why I’m interested in BCC/BPF right now.

I’m interested in using BPF to implement profiling/tracing tools for dynamic programming languages, specifically tools to do things like “trace all memory allocations in this Ruby program”. I think it’s exciting that you can say “hey, run this tiny bit of code every time a Ruby object is allocated” and get data back about ongoing allocations!

Rust: a way to build more powerful BPF-based tools

The issue I see with the Python BPF libraries (which are GREAT, of course) is that while they’re perfect for building tools like tcplife which track tcp connnection lengths, once you want to start doing more complicated experiments like “stream every memory allocation from this Ruby program, calculate some metadata about it, query the original process to find out the class name for that address, and display a useful summary”, Python doesn’t really cut it.

So I decided to spend 4 days trying to build a BCC library for Rust that lets you attach + interact with BPF programs from Rust!

Basically I worked on porting https://github.com/iovisor/gobpf (a go BCC library) to Rust.

The easiest and most exciting way to explain this is to show an example of what using the library looks like.

Rust example 1: strlen

Let’s start with the strlen example from above. Here’s strlen.rs from the examples!

Compiling & attaching the strlen code is easy:

let mut module = BPF::new(code)?;
let uprobe_code = module.load_uprobe("count")?;
module.attach_uprobe("/lib/x86_64-linux-gnu/libc.so.6", "strlen", uprobe_code, -1 /* all PIDs */)?;
let table = module.table("counts");

This table contains a hashmap mapping strings to counts. So we need to iterate over that table and print out the keys and values. This is pretty simple: it looks like this.

let iter = table.into_iter();
for e in iter {
    // key and value are each a Vec<u8> so we need to transform them into a string and 
    // a u64 respectively
    let key = get_string(&e.key);
    let value = Cursor::new(e.value).read_u64::<NativeEndian>().unwrap();
    println!("{:?} {:?}", key, value);
}

Basically all the data that comes out of a BPF program is an opaque Vec<u8> right now, so you need to figure out how to decode them yourself. Luckily decoding binary data is something that Rust is quite good at – the byteorder crate lets you easily decode u64s, and translating a vector of bytes into a String is easy (I wrote a quick get_string helper function to do that).

I thought this was really nice because the code for this program in Rust is basically exactly the same as the corresponding Python version. So it very pretty approachable to start doing experiments and seeing what’s possible.

Reading perf events from Rust

The next thing I wanted to do after getting this strlen example to work in rust was to handle events!!

Events are a little different / more complicated. The way you stream events in a BCC program is – it uses perf_event_open to create a ring buffer where the events get stored.

Dealing with events from a perf ring buffer normally is a huge pain because perf has this complicated data structure. The C BCC library makes this easier for you by letting you specify a C callback that gets called on every new event, and it handles dealing with perf. This is super helpful. To make this work with Rust, the rust-bcc library lets you pass in a Rust closure to run on every event.

Rust example 2: opensnoop.rs (events!!)

To make sure reading BPF events actually worked, I implemented a basic version of opensnoop.py from the iovisor bcc tools: opensnoop.rs.

I won’t walk through the C code in this case because there’s a lot of it but basically the eBPF C part generates an event every time a file is opened on the system. I copied the C code verbatim from opensnoop.py.

Here’s the type of the event that’s generated by the BPF code:

#[repr(C)]
struct data_t {
    id: u64, // pid + thread id
    ts: u64,
    ret: libc::c_int,
    comm: [u8; 16], // process name
    fname: [u8; 255], // filename
}

The Rust part starts out by compiling BPF code & attaching kprobes (to the open system call in the kernel, do_sys_open). I won’t paste that code here because it’s basically the same as the strlen example. What happens next is the new part: we install a callback with a Rust closure on the events table, and then call perf_map.poll(200) in a loop. The design of the BCC library is a little confusing to me still, but you need to repeatedly poll the perf reader objects to make sure that the callbacks you installed actually get called.

let table = module.table("events");
let mut perf_map = init_perf_map(table, perf_data_callback)?;
loop {
    perf_map.poll(200);
}

This is the callback code I wrote, that gets called every time. Again, it takes an opaque Vec<u8> event and translates it into a data_t struct to print it out. Doing this is kind of annoying (I actually called libc::memcpy which is Not Encouraged Rust Practice), I need to figure out a less gross/unsafe way to do that. The really nice thing is that if you put #[repr(C)] on your Rust structs it represents them in memory the exact same way C will represent that struct. So it’s quite easy to share data structures between Rust and C.

fn perf_data_callback() -> Box<Fn(Vec<u8>)> {
    Box::new(|x| {
        // This callback
        let data = parse_struct(&x);
        println!("{:-7} {:-16} {}", data.id >> 32, get_string(&data.comm), get_string(&data.fname));
    })
}

You might notice that this is actually a weird function that returns a callback – this is because I needed to install 4 callbacks (1 per CPU), and in stable Rust you can’t copy closures yet.

output

Here’s what the output of that opensnoop program looks like!

This is kind of meta – these are the files that were being opened on my system when I saved this blog post :). You can see that git is looking at some files, vim is saving a file, and my static site generator Hugo is opening the changed file so that it can update the site. Neat!

PID     COMMAND          FILENAME
   8519 git              /home/bork/work/homepage/.gitmodules
   8519 git              /home/bork/.gitconfig
   8519 git              .git/config
  22877 vim              content/post/2018-02-05-writing-ebpf-programs-in-rust.markdown
  22877 vim              .
   7312 hugo             /home/bork/work/homepage/content/post/2018-02-05-writing-ebpf-programs-in-rust.markdown
   7312 hugo             /home/bork/work/homepage/content/post/2018-02-05-writing-ebpf-programs-in-rust.markdown

using rust-bcc to implement Ruby experiments

Now that I have this basic library that I can use I can get counts + stream events in Rust, I’m excited about doing some experiments with making BCC programs in Rust that talk to Ruby programs!

The first experiment (that I blogged about last week) is count-ruby-allocs.rs which prints out a live count of current allocation activity. Here’s an example of what it prints out: (the numbers are counts of the number of objects allocated of that type so far).

      RuboCop::Token 53
      RuboCop::Token 112
           MatchData 246
Parser::Source::Rang 255
                Proc 323
          Enumerator 328
                Hash 475
               Range 1210
                 ??? 1543
              String 3410
               Array 7879
Total allocations since we started counting: 16932
Allocations this second: 954

Geoffrey Couprie is interested in building more advanced BPF tracing tools with Rust too and wrote a great blog post with a cool proof of concept: Compiling to eBPF from Rust.

I think the idea of not requiring the user to compile the BPF program is exciting, because you could imagine distributing a statically linked Rust binary (which links in libcc.so) with a pre-compiled BPF program that the binary just installs and then uses to do cool stuff.

Also there’s another Rust BCC library at https://bitbucket.org/photoszzt/rust-bpf/ at which has a slightly different set of capabilities than jvns/rust-bcc (going to spend some time looking at that one later, I just found about it like 30 minutes ago :)).

that’s it for now

This crate is still extremely sketchy and there are bugs & missing features but I wanted to put it on the internet because I think the examples of what you can do with it are really exciting!!

Spying on a Ruby process's memory allocations with eBPF Profiler week 5: Mac support, experiments profiling memory allocations