Some things about getaddrinfo that surprised me
Hello! Here are some things you may or may not have noticed about DNS:
- when you resolve a DNS name in a Python program, it checks /etc/hosts, but when you use dig, it doesn’t.
- switching Linux distributions can sometimes change how your DNS works, for example if you use Alpine Linux instead of Ubuntu it can cause problems.
- Mac OS has DNS caching, but Linux doesn’t necessarily unless you use systemd-resolved or something
To understand all of these, we need to learn about a function called getaddrinfo which is responsible for doing DNS lookups.
There are a bunch of surprising-to-me things about getaddrinfo, and once I learned about them, it explained a bunch of the confusing DNS behaviour I’d seen in the past.
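To make this a bit more concrete, here is a minimal sketch of calling getaddrinfo from Python through the standard socket module. The helper name lookup_addresses is made up for this example, and what it returns depends on how your machine is configured.

```python
import socket

def lookup_addresses(host, port=80):
    """Return the sorted, de-duplicated IP addresses getaddrinfo reports for host.

    lookup_addresses is a hypothetical helper name, not part of any library.
    """
    results = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string for both IPv4 and IPv6.
    return sorted({sockaddr[0] for _, _, _, _, sockaddr in results})
```

Calling lookup_addresses("localhost") is a nice first test: on most machines that name comes from /etc/hosts, so no DNS query is needed to answer it.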
where does getaddrinfo come from?
getaddrinfo is part of a library called libc which is the standard C library. There are at least 3 versions of libc:
- glibc (GNU libc)
- musl libc
- the Mac OS version of libc (I don’t know if this has a name)
There are definitely more (I assume FreeBSD and OpenBSD each have their own version for example), but those are the 3 I know about.
Each of those has its own version of getaddrinfo.
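If you want a rough idea of which libc your Python is using, the standard library has platform.libc_ver(). This is only a sketch: the detection is best-effort, it can return empty strings (for example on Mac OS), and I am not sure what it reports on musl. The describe_libc name is made up for this example.

```python
import platform

def describe_libc():
    """Return a short description of the libc Python thinks it is linked against."""
    # platform.libc_ver() returns a (name, version) tuple of strings.
    # Both may be empty when detection fails, for example on Mac OS.
    name, version = platform.libc_ver()
    if not name:
        return "unknown libc"
    return f"{name} {version}".strip()

print(describe_libc())
```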
not all programs use getaddrinfo for DNS
The first thing I found surprising is that getaddrinfo is very widely used but not universally used.
Every program has basically 2 options:
- use getaddrinfo. I think that Python, Ruby, and Node use getaddrinfo, as well as Go sometimes. Probably many more languages too but I did not have the time to go hunting through every language’s DNS library.
- use a custom DNS resolver function. Examples of this:
  - dig. I think this is because dig needs more control over the DNS query than getaddrinfo supports so it implements its own DNS logic.
  - Go also has a pure-Go DNS resolver if you don’t want to use CGo
  - There’s a Ruby gem with a custom DNS resolver that you can use to replace getaddrinfo.
  - getaddrinfo doesn’t support DNS over HTTPS, so I assume that browsers that use DoH are not using getaddrinfo for those DNS lookups
  - probably lots more that I’m not aware of
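To give a feel for what the “custom DNS resolver” option involves, here is a small sketch that builds the bytes of a DNS query by hand, following the standard DNS message format. It is not how dig or any of the resolvers above are actually implemented; build_dns_query and the default query ID are made up for this example.

```python
import struct

def build_dns_query(hostname, query_id=0x1234):
    """Build the bytes of a DNS query asking for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question,
    # 0 answers, 0 authority records, 0 additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending in a zero byte.
    labels = hostname.rstrip(".").split(".")
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN (internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question
```

A program going this route would then send those bytes over UDP to port 53 of whatever nameserver it picked and parse the response itself. Nothing in this code looks at /etc/hosts, which fits with a tool that does its own lookups behaving differently from one that calls getaddrinfo.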
you’ll sometimes see getaddrinfo in your DNS error messages
Because getaddrinfo is so widely used, you’ll often see it in error messages related to DNS.
For example, if I run this Python program, which looks up a nonexistent domain name:
import requests
requests.get("http://xyxqqx.com")
I get this error message:
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/usr/lib/python3.10/site-packages/urllib3/util/connection.py", line 72, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
I think socket.getaddrinfo is calling libc getaddrinfo somewhere under the hood, though I did not read all of the source code to check.
Before you learn what getaddrinfo is, it’s not at all obvious that socket.gaierror: [Errno -2] Name or service not known means “that domain doesn’t exist”. It doesn’t even say the words “DNS” or “domain” in it anywhere!
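If you want a friendlier message in your own code, one option is to catch socket.gaierror and compare its error code against the EAI_* constants in the socket module. This is a minimal sketch; explain_lookup and the wording of its messages are made up for this example.

```python
import socket

def explain_lookup(host, flags=0):
    """Try to resolve host and describe the outcome in plain language."""
    try:
        socket.getaddrinfo(host, None, flags=flags)
    except socket.gaierror as err:
        # err.errno holds the EAI_* code from getaddrinfo; the numeric
        # values differ between platforms, so compare against the constant.
        if err.errno == socket.EAI_NONAME:
            return f"{host}: no such name ({err.strerror})"
        return f"{host}: lookup failed ({err.strerror})"
    return f"{host}: resolved"
```

The “no such name” branch corresponds to the “Name or service not known” error above. Other failures, like the resolver being unreachable, come back with different EAI_* codes and end up in the generic branch.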
getaddrinfo on Mac doesn’t use /etc/resolv.conf
I used to use a Mac for work, and I always felt vaguely unsettled by DNS on Mac. I could tell that something was different from how it worked on my Linux machine, but I couldn’t figure out what it was.
I still don’t totally understand this, and it’s hard for me to investigate because I don’t currently have access to a Mac, but here’s what I’ve gathered so far.
On Linux systems, getaddrinfo
decides which DNS resolver to talk to using a
file called /etc/resolv.conf
. (there’s apparently some additional
complexity with /etc/nsswitch.conf
but I have never looked at
/etc/nsswitch.conf
so I’m going to ignore it).
For example, this is the contents of my /etc/resolv.conf
right now:
# Generated by NetworkManager
nameserver 192.168.1.1
nameserver fd13:d987:748a::1
This means that to make DNS queries, getaddrinfo makes a request to 192.168.1.1 on port 53. That’s my router’s DNS resolver.
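Here is a small sketch of pulling the nameserver entries out of a file like that. It only looks at nameserver lines and ignores everything else (search domains, options), so it is not a description of how any libc parses the file. parse_nameservers is a made-up name.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments, which start with "#" or ";".
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers

example = """# Generated by NetworkManager
nameserver 192.168.1.1
nameserver fd13:d987:748a::1
"""
print(parse_nameservers(example))
```

To try it on a real Linux machine you could pass in the contents of /etc/resolv.conf instead of the example string.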
I assumed that getaddrinfo on Mac also just used /etc/resolv.conf, but I was wrong.
Instead, getaddrinfo makes a request to a program called mDNSResponder which is a Mac thing.
I don’t know much about mDNSResponder except that it does DNS caching and that apparently you can clear the cache with dscacheutil. This explains one of the mysteries at the beginning of the post: why Macs have DNS caching and Linux machines don’t always.
musl libc getaddrinfo is different from glibc’s version
You might think ok, Mac OS getaddrinfo is different, but the two versions of getaddrinfo in glibc and musl libc must be mostly the same, right?
But they have some pretty significant differences. The main difference I know about is that musl libc does not support TCP DNS. I couldn’t find anything in the documentation about it, but it’s mentioned in this tweet.
I talked a bit more about this TCP DNS thing in ways DNS can break.
Some more differences:
- the way search domains (in /etc/resolv.conf) are handled is slightly different (discussed here)
- this post mentions that musl doesn’t support nsswitch.conf. I have never used nsswitch.conf and I’m not sure why it’s useful but I think there are reasons I don’t know about.
more weird things: nscd?
When looking up getaddrinfo I also found this interesting post about getaddrinfo from James Fisher that straces glibc getaddrinfo and discovers that it apparently calls some program called nscd, which is supposed to do DNS caching. That blog post describes nscd as “unstable” and “badly designed”, and it’s not clear to me how widely used it is.
I don’t know anything about nscd but I checked and apparently it’s on my computer. I tried it out and this is what happened:
$ nscd
child exited with status 4
My impression is that people who want to do DNS caching on Linux are more likely to use a DNS forwarder like dnsmasq or systemd-resolved instead of something like nscd; that’s what I’ve seen in the past.
that’s all!
When I first learned about all of this I found it really surprising that such a widely used library function has such different behaviour on different platforms.
I mean, it makes sense that the people who built Mac OS would want to handle DNS caching in a different way than it’s handled on Linux, so it’s reasonable that they implemented getaddrinfo differently. And it makes sense that some programs choose not to use getaddrinfo to make DNS queries.
But it definitely makes DNS a bit more difficult to reason about.