How does gdb work?

Hello! Today I was working a bit on my ruby stacktrace project and I realized that now I know a couple of things about how gdb works internally.

Lately I’ve been using gdb to look at Ruby programs, so we’re going to be running gdb on a Ruby program. This really means the Ruby interpreter. First, we’re going to print out the address of a global variable: ruby_current_thread:

getting a global variable

Here’s how to get the address of the global ruby_current_thread:

$ sudo gdb -p 2983
(gdb) p & ruby_current_thread
$2 = (rb_thread_t **) 0x5598a9a8f7f0 <ruby_current_thread>

There are a few places a variable can live: on the heap, the stack, or in your program’s text. Global variables are part of your program! You can think of them as being allocated at compile time, kind of. It turns out we can figure out the address of a global variable pretty easily! Let’s see how gdb came up with 0x5598a9a8f7f0.

We can find the approximate region this variable lives in by looking at a cool file in /proc called /proc/$pid/maps.

$ sudo cat /proc/2983/maps | grep bin/ruby
5598a9605000-5598a9886000 r-xp 00000000 00:32 323508                     /home/bork/.rbenv/versions/2.1.6/bin/ruby
5598a9a86000-5598a9a8b000 r--p 00281000 00:32 323508                     /home/bork/.rbenv/versions/2.1.6/bin/ruby
5598a9a8b000-5598a9a8d000 rw-p 00286000 00:32 323508                     /home/bork/.rbenv/versions/2.1.6/bin/ruby

So! There’s this starting address 5598a9605000 That’s like 0x5598a9a8f7f0, but different. How different? Well, here’s what I get when I subtract them:

(gdb) p/x 0x5598a9a8f7f0 - 0x5598a9605000
$4 = 0x48a7f0

“What’s that number?”, you might ask? WELL. Let’s look at the symbol table for our program with nm.

sudo nm /proc/2983/exe | grep ruby_current_thread
000000000048a7f0 b ruby_current_thread

What’s that we see? Could it be 0x48a7f0? Yes it is! So!! If we want to find the address of a global variable in our program, all we need to do is look up the name of the variable in the symbol table, and then add that to the start of the range in /proc/whatever/maps, and we’re done!

So now we know how gdb does that. But gdb does so much more!! Let’s skip ahead to…

dereferencing pointers

(gdb) p ruby_current_thread
$1 = (rb_thread_t *) 0x5598ab3235b0

The next thing we’re going to do is dereference that ruby_current_thread pointer. We want to see what’s in that address! To do that, gdb will run a bunch of system calls like this:

ptrace(PTRACE_PEEKTEXT, 2983, 0x5598a9a8f7f0, [0x5598ab3235b0]) = 0

You remember this address 0x5598a9a8f7f0? gdb is asking “hey, what’s in that address exactly”? 2983 is the PID of the process we’re running gdb on. It’s using the ptrace system call which is how gdb does everything.

Awesome! So we can dereference memory and figure out what bytes are at what memory addresses. Some useful gdb commands to know here are x/40w variable and x/40b variable which will display 40 words / bytes at a given address, respectively.

describing structs

The memory at an address looks like this. A bunch of bytes!

(gdb) x/40b ruby_current_thread
0x5598ab3235b0:	16	-90	55	-85	-104	85	0	0
0x5598ab3235b8:	32	47	50	-85	-104	85	0	0
0x5598ab3235c0:	16	-64	-55	115	-97	127	0	0
0x5598ab3235c8:	0	0	2	0	0	0	0	0
0x5598ab3235d0:	-96	-83	-39	115	-97	127	0	0

That’s useful, but not that useful! If you are a human like me and want to know what it MEANS, you need more. Like this:

(gdb) p *(ruby_current_thread)
$8 = {self = 94114195940880, vm = 0x5598ab322f20, stack = 0x7f9f73c9c010,
	stack_size = 131072, cfp = 0x7f9f73d9ada0, safe_level = 0,    raised_flag = 0,
	last_status = 8, state = 0, waiting_fd = -1, passed_block = 0x0,
	passed_bmethod_me = 0x0, passed_ci = 0x0,    top_self = 94114195612680,
	top_wrapper = 0, base_block = 0x0, root_lep = 0x0, root_svar = 8, thread_id =
	140322820187904,

GOODNESS. That is a lot more useful. How does gdb know that there are all these cool fields like stack_size? Enter DWARF. DWARF is a way to store extra debugging data about your program, so that debuggers like gdb can do their job better! It’s generally stored as part of a binary. If I run dwarfdump on my Ruby binary, I get some output like this:

(I’ve redacted it heavily to make it easier to understand)

DW_AT_name                  "rb_thread_struct"
DW_AT_byte_size             0x000003e8
DW_TAG_member
  DW_AT_name                  "self"
  DW_AT_type                  <0x00000579>
  DW_AT_data_member_location  DW_OP_plus_uconst 0
DW_TAG_member
  DW_AT_name                  "vm"
  DW_AT_type                  <0x0000270c>
  DW_AT_data_member_location  DW_OP_plus_uconst 8
DW_TAG_member
  DW_AT_name                  "stack"
  DW_AT_type                  <0x000006b3>
  DW_AT_data_member_location  DW_OP_plus_uconst 16
DW_TAG_member
  DW_AT_name                  "stack_size"
  DW_AT_type                  <0x00000031>
  DW_AT_data_member_location  DW_OP_plus_uconst 24
DW_TAG_member
  DW_AT_name                  "cfp"
  DW_AT_type                  <0x00002712>
  DW_AT_data_member_location  DW_OP_plus_uconst 32
DW_TAG_member
  DW_AT_name                  "safe_level"
  DW_AT_type                  <0x00000066>

So. The name of the type of ruby_current_thread is rb_thread_struct. It has size 0x3e8 (or 1000 bytes), and it has a bunch of member items. stack_size is one of them, at an offset of 24, and it has type 31. What’s 31? No worries! We can look that up in the DWARF info too!

< 1><0x00000031>    DW_TAG_typedef
                      DW_AT_name                  "size_t"
                      DW_AT_type                  <0x0000003c>
< 1><0x0000003c>    DW_TAG_base_type
                      DW_AT_byte_size             0x00000008
                      DW_AT_encoding              DW_ATE_unsigned
                      DW_AT_name                  "long unsigned int"

So! stack_size has type size_t, which means long unsigned int, and is 8 bytes. That means that we can read the stack size!

How that would break down, once we have the DWARF debugging data, is:

Read the region of memory that ruby_current_thread is pointing to
Add 24 bytes to get to stack_size
Read 8 bytes (in little-endian format, since we’re on x86)
Get the answer!

Which in this case is 131072 or 128 kb.

To me, this makes it a lot more obvious what debugging info is for – if we didn’t have all this extra metadata about what all these variables meant, we would have no idea what the bytes at address 0x5598ab3235b0 meant.

This is also why you can install debug info for a program separately from your program – gdb doesn’t care where it gets the extra debug info from.

DWARF is confusing

I’ve been reading a bunch of DWARF info recently. Right now I’m using libdwarf which hasn’t been the best experience – the API is confusing, you initialize everything in a weird way, and it’s really slow (it takes 0.3 seconds to read all the debugging data out of my Ruby program which seems ridiculous). I’ve been told that libdw from elfutils is better.

Also, I casually remarked that you can look at DW_AT_data_member_location to get the offset of a struct member! But I looked up on Stack Overflow how to actually do that and I got this answer. Basically you start with a check like:

dwarf_whatform(attrs[i], &form, &error);
    if (form == DW_FORM_data1 || form == DW_FORM_data2
        form == DW_FORM_data2 || form == DW_FORM_data4
        form == DW_FORM_data8 || form == DW_FORM_udata) {

and then it keeps GOING. Why are there 8 million different DW_FORM_data things I need to check for? What is happening? I have no idea.

Anyway my impression is that DWARF is a large and complicated standard (and possibly the libraries people use to generate DWARF are subtly incompatible?), but it’s what we have, so that’s what we work with!

I think it’s really cool that I can write code that reads DWARF and my code actually mostly works. Except when it crashes. I’m working on that.

unwinding stacktraces

In an earlier version of this post, I said that gdb unwinds stacktraces using libunwind. It turns out that this isn’t true at all!

Someone who’s worked on gdb a lot emailed me to say that they actually spent a ton of time figuring out how to unwind stacktraces so that they can do a better job than libunwind does. This means that if you get stopped in the middle of a weird program with less debug info than you might hope for that’s done something strange with its stack, gdb will try to figure out where you are anyway. Thanks <3

other things gdb does

The few things I’ve described here (reading memory, understanding DWARF to show you structs) aren’t everything gdb does – just looking through Brendan Gregg’s gdb example from yesterday, we see that gdb also knows how to

disassemble assembly
show you the contents of your registers

and in terms of manipulating your program, it can

set breakpoints and step through a program
modify memory (!! danger !!)

Knowing more about how gdb works makes me feel a lot more confident when using it! I used to get really confused because gdb kind of acts like a C REPL sometimes – you type ruby_current_thread->cfp->iseq, and it feels like writing C code! But you’re not really writing C at all, and it was easy for me to run into limitations in gdb and not understand why.

Knowing that it’s using DWARF to figure out the contents of the structs gives me a better mental model and have more correct expectations! Awesome.