How does gdb call functions?

(previous gdb posts: how does gdb work? (2016) and three things you can do with gdb (2014))

I discovered this week that you can call C functions from gdb! I thought this was cool because I’d previously thought of gdb as mostly a read-only debugging tool.

I was really surprised by that (how does that WORK??). As I often do, I asked on Twitter how that even works, and I got a lot of really useful answers! My favorite answer was Evan Klitzke’s example C code showing a way to do it. Code that works is very exciting!

I believe (through some stracing & experiments) that that example C code is different from how gdb actually calls functions, so I’ll talk about what I’ve figured out about what gdb does in this post and how I’ve figured it out.

There is a lot I still don’t know about how gdb calls functions, and very likely some things in here are wrong.

What does it mean to call a C function from gdb?

Before I get into how this works, let’s talk quickly about why I found it surprising / nonobvious.

So, you have a running C program (the “target program”). You want to run a function from it. To do that, you need to basically:

pause the program (because it is already running code!)
find the address of the function you want to call (using the symbol table)
convince the program (the “target program”) to jump to that address
when the function returns, restore the instruction pointer and registers to what they were before

Using the symbol table to figure out the address of the function you want to call is pretty straightforward – here’s some sketchy (but working!) Rust code that I’ve been using on Linux to do that. This code uses the elf crate. If I wanted to find the address of the foo function in PID 2345, I’d run elf_symbol_value("/proc/2345/exe", "foo").

fn elf_symbol_value(file_name: &str, symbol_name: &str) -> Result<u64, Box<std::error::Error>> {
    // open the ELF file
    let file = elf::File::open_path(file_name).ok().ok_or("parse error")?;
    // loop over all the sections & symbols until you find the right one!
    let sections = &file.sections;
    for s in sections {
        for sym in file.get_symbols(&s).ok().ok_or("parse error")? {
            if sym.name == symbol_name {
                return Ok(sym.value);
            }
        }
    }
    None.ok_or("No symbol found")?
}

This won’t totally work on its own, you also need to look at the memory maps of the file and add the symbol offset to the start of the place that file is mapped. But finding the memory maps isn’t so hard, they’re in /proc/PID/maps.

Anyway, this is all to say that finding the address of the function to call seemed straightforward to me but that the rest of it (change the instruction pointer? restore the registers? what else?) didn’t seem so obvious!

You can’t just jump

I kind of said this already but – you can’t just find the address of the function you want to run and then jump to that address. I tried that in gdb (jump foo) and the program segfaulted. Makes sense!

How you can call C functions from gdb

First, let’s see that this is possible. I wrote a tiny C program that sleeps for 1000 seconds and called it test.c:

#include <unistd.h>

int foo() {
    return 3;
}
int main() {
    sleep(1000);
}

Next, compile and run it:

$ gcc -o test  test.c
$ ./test

Finally, let’s attach to the test program with gdb:

$ sudo gdb -p $(pgrep -f test)
(gdb) p foo()
$1 = 3
(gdb) quit

So I ran p foo() and it ran the function! That’s fun.

Why is this useful?

a few possible uses for this:

it lets you treat gdb a little bit like a C REPL, which is fun and I imagine could be useful for development
utility functions to display / navigate complex data structures quickly while debugging in gdb (thanks @invalidop)
set an arbitrary process’s namespace while it’s running (featuring a not-so-surprising appearance from my colleague nelhage!)
probably more that I don’t know about

How it works

I got a variety of useful answers on Twitter when I asked how calling functions from gdb works! A lot of them were like “well you get the address of the function from the symbol table” but that is not the whole story!!

One person pointed me to this nice 2 part series on how gdb works that they’d written: Debugging with the natives, part 1 and Debugging with the natives, part 2. Part 1 explains approximately how calling functions works (or could work – figuring out what gdb actually does isn’t trivial, but I’ll try my best!).

The steps outlined there are:

Stop the process
Create a new stack frame (far away from the actual stack)
Save all the registers
Set the registers to the arguments you want to call your function with
Set the stack pointer to the new stack frame
Put a trap instruction somewhere in memory
Set the return address to that trap instruction
Set the instruction pointer register to the address of the function you want to call
Start the process again!

I’m not going to go through how gdb does all of these (I don’t know!) but here are a few things I’ve learned about the various pieces this evening.

Create a stack frame

If you’re going to run a C function, most likely it needs a stack to store variables on! You definitely don’t want it to clobber your current stack. Concretely – before gdb calls your function (by setting the instruction pointer to it and letting it go), it needs to set the stack pointer to… something.

There was some speculation on Twitter about how this works:

i think it constructs a new stack frame for the call right on top of the stack where you’re sitting!

and:

Are you certain it does that? It could allocate a pseudo stack, then temporarily change sp value to that location. You could try, put a breakpoint there and look at the sp register address, see if it’s contiguous to your current program register?

I did an experiment where (inside gdb) I ran:`

(gdb) p $rsp
$7 = (void *) 0x7ffea3d0bca8
(gdb) break foo
Breakpoint 1 at 0x40052a
(gdb) p foo()
Breakpoint 1, 0x000000000040052a in foo ()
(gdb) p $rsp
$8 = (void *) 0x7ffea3d0bc00

This seems in line with the “gdb constructs a new stack frame for the call right on top of the stack where you’re sitting” theory, since the stack pointer ($rsp) goes from being ...bca8 to ..bc00 – stack pointers grow downward, so a bc00 stack pointer is after a bca8 pointer. Interesting!

So it seems like gdb just creates the new stack frames right where you are. That’s a bit surprising to me!

change the instruction pointer

Let’s see whether gdb changes the instruction pointer!

(gdb) p $rip
$1 = (void (*)()) 0x7fae7d29a2f0 <__nanosleep_nocancel+7>
(gdb) b foo
Breakpoint 1 at 0x40052a
(gdb) p foo()
Breakpoint 1, 0x000000000040052a in foo ()
(gdb) p $rip
$3 = (void (*)()) 0x40052a <foo+4>

It does! The instruction pointer changes from 0x7fae7d29a2f0 to 0x40052a (the address of the foo function).

I stared at the strace output and I still don’t understand how it changes, but that’s okay.

aside: how breakpoints are set!!

Above I wrote break foo. I straced gdb while running all of this and understood almost nothing but I found ONE THING that makes sense to me!!

Here are some of the system calls that gdb uses to set a breakpoint. It’s really simple! It replaces one instruction with cc (which https://defuse.ca/online-x86-assembler.htm tells me means int3 which means send SIGTRAP), and then once the program is interrupted, it puts the instruction back the way it was.

I was putting a breakpoint on a function foo with the address 0x400528.

This PTRACE_POKEDATA is how gdb changes the code of running programs.

// change the 0x400528 instructions
25622 ptrace(PTRACE_PEEKTEXT, 25618, 0x400528, [0x5d00000003b8e589]) = 0
25622 ptrace(PTRACE_POKEDATA, 25618, 0x400528, 0x5d00000003cce589) = 0
// start the program running
25622 ptrace(PTRACE_CONT, 25618, 0x1, SIG_0) = 0
// get a signal when it hits the breakpoint
25622 ptrace(PTRACE_GETSIGINFO, 25618, NULL, {si_signo=SIGTRAP, si_code=SI_KERNEL, si_value={int=-1447215360, ptr=0x7ffda9bd3f00}}) = 0
// change the 0x400528 instructions back to what they were before
25622 ptrace(PTRACE_PEEKTEXT, 25618, 0x400528, [0x5d00000003cce589]) = 0
25622 ptrace(PTRACE_POKEDATA, 25618, 0x400528, 0x5d00000003b8e589) = 0

put a trap instruction somewhere

When gdb runs a function, it also puts trap instructions in a bunch of places! Here’s one of them (per strace). It’s basically replacing one instruction with cc (int3).

5908  ptrace(PTRACE_PEEKTEXT, 5810, 0x7f6fa7c0b260, [0x48f389fd89485355]) = 0
5908  ptrace(PTRACE_PEEKTEXT, 5810, 0x7f6fa7c0b260, [0x48f389fd89485355]) = 0
5908 ptrace(PTRACE_POKEDATA, 5810, 0x7f6fa7c0b260, 0x48f389fd894853cc) = 0

What’s 0x7f6fa7c0b260? Well, I looked in the process’s memory maps, and it turns it’s somewhere in /lib/x86_64-linux-gnu/libc-2.23.so. That’s weird! Why is gdb putting trap instructions in libc?

Well, let’s see what function that’s in. It turns out it’s __libc_siglongjmp. The other functions gdb is putting traps in are __longjmp, ____longjmp_chk, dl_main, and _dl_close_worker.

Why? I don’t know! Maybe for some reason when our function foo() returns, it’s calling longjmp, and that is how gdb gets control back? I’m not sure.

how gdb calls functions is complicated!

I’m going to stop there (it’s 1am!), but now I know a little more!

It seems like the answer to “how does gdb call a function?” is definitely not that simple. I found it interesting to try to figure a little bit of it out and hopefully you have too!

I still have a lot of unanswered questions about how exactly gdb does all of these things, but that’s okay. I don’t really need to know the details of how this works and I’m happy to have a slightly improved understanding.