I have a long-standing project to try to learn to use ftrace, a Linux kernel tracing tool. As usual when I want to learn more, I turned to one of Brendan Gregg's tools -- the perf-tools repo on Github. There's a whole delightful directory of examples. It's the best.
He has a lot of tools that are ready to use and easy to understand -- for instance, you can use execsnoop to see every program that's being executed. Those are awesome.
Then there are some that require... more work. I was interested in ftrace, and I was using his kprobe script to trace system calls and their arguments.
I started out by running
sudo kernel/kprobe 'p:SyS_read' to tell me every time the
read system call happened. This gives me output like
But WHAT FILE DID IT READ? This is not good at all.
In the example, he says, on the open system call.
Here I guessed that the mode was in register %cx, and cast it as a 16-bit unsigned integer (":u16"). Your platform and kernel may be different, and the mode may be in a different register. If fiddling with such registers becomes too painful or unreliable for you, consider installing kernel debuginfo and using the named variables with perf_events "perf probe".
This was wizardry. How could he guess that the mode of a file was in the register
%cx? What even are the registers? This makes no sense.
I partly figured this out and got more information about the
read system calls, so I will now tell you!
what even is a register
I know that registers are what the CPU uses to store data in when calculating things. But how many even are there? How do I guess which one is right?
First, I found this page describing x86 registers. It tells me that there are
From the description, the segment registers seem safe to ignore! Awesome. The instruction pointer and the stack pointer tell me what instruction is running right now and where the stack is. I also don't care about that. So that leaves me with only 7 registers to worry about (eax, ebx, ecx, edx, esi, edi, and ebp). That's way better.
printing the registers
So before we were running
sudo kernel/kprobe 'p:SyS_read'. We can also print the registers for the read system call! Here goes. For some reason we need to take off the
Let's compare this to the output of strace:
Ok, awesome! In the output of
strace. I know that
9 is the file descriptor, 2 is the length to read, and the middle value is the string. This must mean that
%di is the file descriptor, and
%dx is the amount of data to read!
I can label those now and be a register-guessing-wizard like Brendan Gregg!
So now I know which file descriptors are being read!
The advantage of using ftrace instead of strace is that the overhead is way lower: when I strace
find it makes it 20x slower, but with ftrace it's totally okay. I'm still not sure where the string that we read is (I think it's in
Now I am one step closer to being able to trace system calls with less overhead. Guessing registers is really tedious but it seems to be totally possible!
update: turns out you don't have to guess at all! the registers used for system calls are always the same :D. here is a table with all the answers ❤