Profiler adventures: resolving symbol addresses is hard!
The other day I posted How does gdb call functions?. In that post I said:
Using the symbol table to figure out the address of the function you want to call is pretty straightforward
Unsurprisingly, it turns out that figuring out the address in memory corresponding to a given symbol is not really that straightforward. This is something I’ve been doing in my profiler, and I think it’s interesting, so I thought I’d write about it!
Basically the problem I’ve been trying to solve is this: I have a symbol (like ruby_api_version), and I want to figure out which address that symbol is mapped to in my target process’s memory (so that I can get the data in it, like the Ruby process’s Ruby version). So far I’ve run into (and fixed!) 3 issues when trying to do this:
- When binaries are loaded into memory, they’re loaded at a random address (so I can’t just read the symbol table)
- The symbol I want isn’t necessarily in the “main” binary (/proc/PID/exe); sometimes it’s in some other dynamically linked library
- I need to look at the ELF program header to adjust which address I look at for the symbol
I’ll start with some background, and then explain these 3 things! (I actually don’t know what gdb does)
what’s a symbol?
Most binaries have functions and variables in them. For instance, Perl has a global variable called PL_bincompat_options and a function called Perl_sv_catpv_mg.
Sometimes binaries need to look up functions from another binary (for example, if the binary is a dynamically linked library, you need to look up its functions by name). Also sometimes you’re debugging your code and you want to know what function an address corresponds to.
Symbols are how you look up functions / variables in a binary. They’re in a section called the “symbol table”. The symbol table is basically an index for your binary! Sometimes they’re missing (“stripped”). There are a lot of binary formats, but this post is just about the usual binary format on Linux: ELF.
how do you get the symbol table of a binary?
A thing that I learned today (or at least learned and then forgot) is that there are 2 possible sections symbols can live in: .symtab and .dynsym. .dynsym is the “dynamic symbol table”.
According to this page, the dynsym is a smaller version of the symtab that only contains global symbols.
There are at least 3 ways to read the symbol table of a binary on Linux: you can use nm, objdump, or readelf.
- read the .symtab: nm $FILE, objdump --syms $FILE, readelf -a $FILE
- read the .dynsym: nm -D $FILE, objdump --dynamic-syms $FILE, readelf -a $FILE
readelf -a is the same in both cases because readelf -a just shows you everything in an ELF file. It’s my favorite because I don’t need to guess where the information I want is, I can just print out everything and then use grep.
Here’s an example of some of the symbols in /usr/bin/perl. You can see that each symbol has a name, a value, and a type. The value is basically the offset of the code/data corresponding to that symbol in the binary. (Except some symbols have value 0. I think that has something to do with dynamic linking but I don’t understand it so we’re not going to get into it.)
$ readelf -a /usr/bin/perl
...
Num: Value Size Type Ndx Name
523: 00000000004d6590 49 FUNC 14 Perl_sv_catpv_mg
524: 0000000000543410 7 FUNC 14 Perl_sv_copypv
525: 00000000005a43e0 202 OBJECT 16 PL_bincompat_options
526: 00000000004e6d20 2427 FUNC 14 Perl_pp_ucfirst
527: 000000000044a8c0 1561 FUNC 14 Perl_Gv_AMupdate
...
the question we want to answer: what address is a symbol mapped to?
That’s enough background!
Now, suppose I’m a debugger, and I want to know what address the ruby_api_version symbol is mapped to. Let’s use readelf to look at the relevant Ruby binary!
readelf -a ~/.rbenv/versions/2.1.6/bin/ruby | grep ruby_api_version
365: 00000000001f9180 12 OBJECT GLOBAL DEFAULT 15 ruby_api_version
Neat! The offset of ruby_api_version is 0x1f9180. We’re done, right? Of course not! :)
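If you'd rather pull that value out in a script than by eye, here's a minimal Python sketch. It assumes the 8-column symbol line that readelf printed above (Num, Value, Size, Type, Bind, Vis, Ndx, Name). SymbolEntry and parse_symbol_line are names made up for this illustration, not part of any tool.

```python
from typing import NamedTuple, Optional


class SymbolEntry(NamedTuple):
    value: int  # the "Value" column, parsed as hex
    size: int
    type: str   # e.g. FUNC or OBJECT
    name: str


def parse_symbol_line(line: str) -> Optional[SymbolEntry]:
    """Parse one symbol-table line of readelf output, or return None.

    Assumption: columns are Num, Value, Size, Type, Bind, Vis, Ndx, Name,
    separated by whitespace, like the ruby_api_version line above.
    """
    parts = line.split()
    if len(parts) < 8 or not parts[0].endswith(":"):
        return None
    try:
        value = int(parts[1], 16)  # Value is printed as bare hex
        size = int(parts[2], 0)    # Size is decimal, or 0x-prefixed when large
    except ValueError:
        return None                # header lines and other noise end up here
    return SymbolEntry(value, size, parts[3], parts[7])
```

Feeding it the ruby_api_version line gives a value of 0x1f9180 and a size of 12. Scraping text output is fragile, so treat this as a convenience for poking around rather than a real ELF parser.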
Problem 1: ASLR (Address space layout randomization)
Here’s the first issue: when Linux loads a binary into memory (like ~/.rbenv/versions/2.1.6/bin/ruby), it doesn’t just load it at the 0 address. Instead, it usually adds a random offset. Wikipedia’s article on ASLR explains why:
Address space layout randomization (ASLR) is a memory-protection process for operating systems (OSes) that guards against buffer-overflow attacks by randomizing the location where system executables are loaded into memory.
We can see this happening in practice: I started /home/bork/.rbenv/versions/2.1.6/bin/ruby 3 times and every time the process gets mapped to a different place in memory (0x56121c86f000, 0x55f440b43000, 0x56163334a000).
Here we’re meeting our good friend /proc/$PID/maps: this file contains a list of memory maps for a process. The memory maps tell us every address range in the process’s virtual memory (it turns out virtual memory isn’t contiguous! Instead processes get a bunch of possibly-disjoint memory maps!). This file is so useful! You can find the address of the stack, the heap, every dynamically loaded library, anonymous memory maps, and probably more.
$ cat /proc/(pgrep -f 2.1.6)/maps | grep 'bin/ruby'
56121c86f000-56121caf0000 r-xp 00000000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
56121ccf0000-56121ccf5000 r--p 00281000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
56121ccf5000-56121ccf7000 rw-p 00286000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
$ cat /proc/(pgrep -f 2.1.6)/maps | grep 'bin/ruby'
55f440b43000-55f440dc4000 r-xp 00000000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
55f440fc4000-55f440fc9000 r--p 00281000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
55f440fc9000-55f440fcb000 rw-p 00286000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
$ cat /proc/(pgrep -f 2.1.6)/maps | grep 'bin/ruby'
56163334a000-5616335cb000 r-xp 00000000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
5616337cb000-5616337d0000 r--p 00281000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
5616337d0000-5616337d2000 rw-p 00286000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
Okay, so in the last example we see that our binary is mapped at 0x56163334a000. If we combine this with the knowledge that ruby_api_version is at 0x1f9180, then that means that we just need to look at the address 0x1f9180 + 0x56163334a000 to find our variable, right?
Yes! In this case, that works. But in other cases it won’t! So that brings us to problem 2.
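Before moving on, here's a rough Python sketch of that "map start plus symbol value" calculation. It is not the profiler's actual code. It assumes the usual 6-column /proc/PID/maps layout (address range, permissions, file offset, device, inode, path), and the names MemoryMap, parse_maps, load_base and naive_symbol_address are all made up for illustration.

```python
from typing import List, NamedTuple


class MemoryMap(NamedTuple):
    start: int
    end: int
    perms: str
    file_offset: int
    path: str  # empty string for anonymous maps


def parse_maps(text: str) -> List[MemoryMap]:
    """Parse the contents of /proc/PID/maps into a list of MemoryMap."""
    maps = []
    for line in text.splitlines():
        # Assumed columns: range, perms, offset, dev, inode, [path]
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        maps.append(MemoryMap(start, end, fields[1], int(fields[2], 16), path))
    return maps


def load_base(maps: List[MemoryMap], binary_path: str) -> int:
    """Start of the map that holds file offset 0 of binary_path."""
    starts = [m.start for m in maps
              if m.path == binary_path and m.file_offset == 0]
    if not starts:
        raise LookupError(f"{binary_path} is not mapped at file offset 0")
    return min(starts)


def naive_symbol_address(maps: List[MemoryMap], binary_path: str,
                         symbol_value: int) -> int:
    # Naive on purpose: this is only the "base + value" guess from this section.
    return load_base(maps, binary_path) + symbol_value
```

You'd feed it something like open(f"/proc/{pid}/maps").read(). With the third set of maps above and a symbol value of 0x1f9180, naive_symbol_address returns 0x561633543180.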
Problem 2: dynamically loaded libraries
Next up, I tried running system Ruby: /usr/bin/ruby. This binary has basically no symbols at all! Disaster! In particular it does not have a ruby_api_version symbol.
But when I tried to print the ruby_api_version variable with gdb, it worked!!! Where was gdb finding my symbol? I found the answer with the help of our good friend: /proc/PID/maps.
It turns out that /usr/bin/ruby dynamically loads a library called libruby-2.3. You can see it in the memory maps here:
$ cat /proc/(pgrep -f /usr/bin/ruby)/maps | grep libruby
7f2c5d789000-7f2c5d9f1000 r-xp 00000000 00:14 /usr/lib/x86_64-linux-gnu/libruby-2.3.so.2.3.0
7f2c5d9f1000-7f2c5dbf0000 ---p 00268000 00:14 /usr/lib/x86_64-linux-gnu/libruby-2.3.so.2.3.0
7f2c5dbf0000-7f2c5dbf6000 r--p 00267000 00:14 /usr/lib/x86_64-linux-gnu/libruby-2.3.so.2.3.0
7f2c5dbf6000-7f2c5dbf7000 rw-p 0026d000 00:14 /usr/lib/x86_64-linux-gnu/libruby-2.3.so.2.3.0
And if we read it with readelf, we find the address of that symbol!
readelf -a /usr/lib/x86_64-linux-gnu/libruby-2.3.so.2.3.0 | grep ruby_api_version
374: 00000000001c72f0 12 OBJECT GLOBAL DEFAULT 13 ruby_api_version
So in this case the address of the symbol we want is 0x7f2c5d789000 (the start of the libruby-2.3 memory map) plus 0x1c72f0. Nice! But we’re still not done. There is (at least) one more mystery!
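One way to sketch the "find the library's map" step in Python: collect the lowest start address for every file that shows up in the maps, then pick the one whose file name matches. This is a guess at a reasonable approach, not what gdb does. mapped_file_bases and find_library_base are hypothetical names, and the sketch assumes the path is the last whitespace-separated field on each line (so paths containing spaces, or ones marked "(deleted)", get skipped).

```python
import os
from typing import Dict, Tuple


def mapped_file_bases(maps_text: str) -> Dict[str, int]:
    """Map each file-backed path in /proc/PID/maps to its lowest start address."""
    bases: Dict[str, int] = {}
    for line in maps_text.splitlines():
        fields = line.split()
        # Skip anonymous maps and pseudo-entries like [heap] or [stack].
        if len(fields) < 2 or not fields[-1].startswith("/"):
            continue
        start = int(fields[0].split("-")[0], 16)
        path = fields[-1]
        bases[path] = min(start, bases.get(path, start))
    return bases


def find_library_base(maps_text: str, name_fragment: str) -> Tuple[str, int]:
    """Return (path, base) for the single mapped file whose name contains name_fragment."""
    matches = [(path, base)
               for path, base in mapped_file_bases(maps_text).items()
               if name_fragment in os.path.basename(path)]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one mapped file matching {name_fragment!r}, "
            f"found {len(matches)}")
    return matches[0]
```

Called with the maps above and "libruby", this returns the libruby-2.3.so.2.3.0 path and 0x7f2c5d789000. Adding the symbol value 0x1c72f0 gives 0x7f2c5d9502f0.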
Problem 3: the vaddr offset in the ELF program header
This one I just figured out today so it’s the one I have the shakiest understanding of. Here’s what happened.
I was running system ruby on Ubuntu 14.04: Ruby 1.9.3. And my usual code (find the libruby map, get its address, get the symbol offset, add them up) wasn’t working!!! I was confused.
But I’d asked Julian if he knew of any weird stuff I need to worry about a while back and he said “well, you should read the code for dlsym, you’re trying to do basically the same thing”. So I decided to, instead of randomly guessing, go read the code for dlsym.
The man page for dlsym says “dlsym, dlvsym - obtain address of a symbol in a shared object or executable”. Perfect!!
Here’s the dlsym code from musl I read. (musl is like glibc, but, different. Maybe easier to read? I don’t understand it that well.)
The dlsym code says (on line 1468) return def.dso->base + def.sym->st_value; That sounds like what I’m doing!! But what’s dso->base? It looks like base = map - addr_min;, and addr_min = ph->p_vaddr;. (There’s also some stuff that makes sure addr_min is aligned with the page size which I should maybe pay attention to.)
So the code I want is something like map_base - ph->p_vaddr + sym->st_value.
I looked up this vaddr thing in the ELF program header, subtracted it from my calculation, and voilà! It worked!!!
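Here's a hedged Python sketch of that adjusted calculation. It reads the program headers straight out of the ELF file with struct, takes the lowest p_vaddr among the PT_LOAD entries, rounds it down to a page boundary, and subtracts it from the map base. It assumes a little-endian 64-bit ELF and a 4096-byte page size, and min_load_vaddr and symbol_address are made-up names. It's my reading of the musl logic quoted above, not a copy of it.

```python
import struct

PT_LOAD = 1
PAGE_SIZE = 4096  # assumption: 4 KiB pages

# ELF64 file header and program header layouts (little-endian).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
ELF64_PHDR = struct.Struct("<IIQQQQQQ")


def min_load_vaddr(elf_bytes: bytes) -> int:
    """Lowest p_vaddr of any PT_LOAD segment, rounded down to a page boundary."""
    fields = ELF64_HEADER.unpack_from(elf_bytes, 0)
    ident = fields[0]
    # ident[4] == 2 means 64-bit, ident[5] == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64 files")
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]

    vaddrs = []
    for i in range(e_phnum):
        p_type, _flags, _offset, p_vaddr, *_rest = ELF64_PHDR.unpack_from(
            elf_bytes, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            vaddrs.append(p_vaddr)
    if not vaddrs:
        raise ValueError("no PT_LOAD segments found")
    return min(vaddrs) & ~(PAGE_SIZE - 1)


def symbol_address(map_base: int, addr_min: int, st_value: int) -> int:
    # Same shape as map_base - ph->p_vaddr + sym->st_value from the text.
    return map_base - addr_min + st_value
```

To use it you'd read the library file with open(path, "rb").read(), pass the bytes to min_load_vaddr, and combine the result with the map start from /proc/PID/maps and the symbol value from the symbol table. When the lowest PT_LOAD vaddr is 0, this reduces to the plain "base + value" sum from the earlier sections, which would explain why that sum worked there.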
there are probably more problems!
I imagine I will discover even more ways that I am calculating the symbol address wrong. It’s interesting that such a seemingly simple thing (“what’s the address of this symbol?”) is so complicated!
It would be nice to be able to just call dlsym and have it do all the right calculations for me, but I think I can’t because the symbol is in a different process. Maybe I’m wrong about that though! I would like to be wrong about that. If you know an easier way to do all this I would very much like
to know! Someone on Reddit linked to this interesting gist they wrote which seems to also reimplement dlsym.