How do HTTP requests get sent to the right place?
10 years ago I configured the Apache webserver for the first time. I remember having to set up a “virtual host” to have more than one site running on the same machine, and I remember that I totally got it to work, but did not understand at all what was happening.
Fast forward 8 years. One day I was reading High Performance Browser Networking, which is a great book and you should read it (in particular, there’s a really clear and fascinating explanation of WebRTC).
And then I was talking to someone about it and they were like “hmm, so how do apache virtual hosts work?” and I was like “hmmmmmm.” AND SUDDENLY I KNEW AND IT WAS OBVIOUS.
how to make an HTTP request from scratch
The difference between 17-year-old Julia and 25-year-old Julia is that 25-year-old Julia knew how HTTP works.
When I’m experimenting with HTTP, sometimes I like to make HTTP requests manually (without using any tools like curl). It turns out this is easy!
First, run curl -v http://ask.metafilter.com > /dev/null
This will tell you at the beginning what request it sent. Then you put that into a file (the blank line at the end is important, because it tells the server that the headers are finished):
$ cat request.txt
GET / HTTP/1.1
Host: ask.metafilter.com
User-Agent: curl/7.47.0
Then you use netcat to send the request to ask.metafilter.com!
cat request.txt | netcat ask.metafilter.com 80
# This also works: this is the IP for ask.metafilter.com
cat request.txt | netcat 54.186.13.33 80
If you remove the Host: part from this request, it does not work at all. Apache is all “400 Bad Request”. 400 means it is your fault (HTTP/1.1 requires a Host header).
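Here is the same experiment as a rough Python sketch, in case netcat isn't handy. The function names build_request and send_raw are made up for this example, and I added a "Connection: close" header (not in the request above) so the server hangs up when it's done and the read loop ends.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "User-Agent: curl/7.47.0",
        "Connection: close",  # assumption: added so the server closes the socket
        "",  # the two empty strings produce the blank line
        "",  # that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def send_raw(request: bytes, address: str, port: int = 80) -> bytes:
    """Send raw bytes over TCP and return everything the server sends back."""
    with socket.create_connection((address, port), timeout=10) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty read means the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling send_raw(build_request("ask.metafilter.com"), "ask.metafilter.com") should behave like the netcat command above. The address you connect to and the Host header are two separate arguments, which is the whole point: you can connect to an IP and still say which site you want.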
the Host header
So! Suppose you’re a web server, and you have some configuration like
server {
name 'julia.com';
... awesome perl site configuration ...
}
server {
name 'bork.com';
... awesome python site configuration ...
}
Then an HTTP request comes into the box! Perhaps like this
curl -v http://julia.com
> GET / HTTP/1.1
> Host: julia.com
> User-Agent: curl/7.47.0
> Accept: */*
So there’s this string in the request – “Host: julia.com”. This is what the web server uses to decide whether your request is for julia.com or bork.com or what. That’s it!
I like this because once you know what an HTTP request looks like, it becomes kind of obvious how a web server would decide where to send the request. There are only 4 things in it and only one of them is julia.com!
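To make that concrete, here is a toy version of the lookup in Python. This is a sketch of the idea, not how Apache or nginx actually implement it: SITES, host_from_request and pick_site are names I made up, unknown hosts just fall through to a pretend default site, and it doesn't handle things like IPv6 addresses in the Host header.

```python
# Hypothetical table of sites, mirroring the made-up config above.
SITES = {
    "julia.com": "awesome perl site",
    "bork.com": "awesome python site",
}


def host_from_request(raw: bytes):
    """Return the value of the Host header in a raw request, or None."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:  # skip the "GET / HTTP/1.1" line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().lower()
    return None


def pick_site(raw: bytes, sites=SITES) -> str:
    """Decide which site a request is for, using only the Host header."""
    host = host_from_request(raw)
    if host is None:
        return "400 Bad Request"
    host = host.split(":")[0]  # drop a port like ":8080" if present
    return sites.get(host, "default site")
```

Feeding it the request curl printed above (with Host: julia.com) picks the perl site, and swapping the header to bork.com picks the python one.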
next level: SNI
Once last year someone told me they were working on setting up SNI at work. I was like – what’s that? I had literally no idea what those letters meant.
We used to live in a world where SSL was uncommon and difficult and a huge production and expensive. But now we have Let’s Encrypt and we want it to be easy and cheap and on by default. I’m going to use SSL and TLS interchangeably through this discussion.
We still want to put many secure websites on the same server. How do we do it? At first this seems easy – just use the Host header again, right? But!
- The contents of an HTTP request for julia.com (including the Host header) are encrypted.
- With a certificate.
- Which certificate? The certificate for julia.com!
- How did I know to pick the certificate for julia.com?
So the problem is that you need to choose which certificate you’re going to use before you see the Host header. There are basically 2 ways to deal with this:
- Just put every site on your server on the same certificate (make a certificate for all of julia.com bork.com cookies.com awesome.com pandas.com)
- Use SNI. (“Server Name Indication”)
SNI is a standard where, when getting a website, you say “hey I’m gonna want julia.com” and the server is like “ok encryption time! I will use the cert for julia.com!” and then you say, secretly: “Host: julia.com. here’s more about that.”
More technically speaking, the very first packet in a TLS negotiation is called the “Client Hello” packet. This packet has a hostname like julia.com in it, and it is sent before any encryption has been set up! So the server can get the hostname out of that packet and then continue the negotiation with the right certificate.
Paul Tagliamonte, who is great, wrote a parser that will let you extract the SNI hostname from a TLS packet! This is neat because you can see that it is only 150 lines of code or so.
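To show how little there is to it, here is a rough Python sketch of the same idea. This is not Paul's parser, just my own minimal version, and it assumes the whole Client Hello arrives in a single TLS record. The names extract_sni and client_hello_for are made up; client_hello_for uses Python's ssl module with in-memory buffers to produce a real Client Hello without touching the network.

```python
import ssl
import struct


def extract_sni(record: bytes):
    """Return the SNI hostname in a TLS Client Hello record, or None."""
    try:
        # Record header: content type (1 byte), version (2), length (2).
        if record[0] != 0x16:  # 0x16 = handshake record
            return None
        pos = 5
        # Handshake header: message type (1 byte), length (3).
        if record[pos] != 0x01:  # 0x01 = Client Hello
            return None
        pos += 4
        pos += 2 + 32  # client version + random bytes
        pos += 1 + record[pos]  # session id (length-prefixed)
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len  # cipher suites
        pos += 1 + record[pos]  # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:  # extension 0 = server_name
                # Layout: list length (2), name type (1), name length (2), name.
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                if record[pos + 2] == 0:  # name type 0 = host_name
                    name = record[pos + 5 : pos + 5 + name_len]
                    return name.decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        pass  # truncated or malformed input
    return None


def client_hello_for(hostname: str) -> bytes:
    """Ask Python's ssl module for the first bytes a client would send."""
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    conn = ssl.create_default_context().wrap_bio(
        incoming, outgoing, server_hostname=hostname
    )
    try:
        conn.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: nobody is on the other end to answer
    return outgoing.read()
```

extract_sni(client_hello_for("julia.com")) should hand back the hostname, read straight out of bytes that were never encrypted.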
a story about nginx & SNI
nginx is another web server! Let’s imagine you configure it this way. This is an experiment my awesome coworker Ray ran when we were trying to understand how nginx works.
# this is not exactly how nginx is configured but it's close
server {
listen 443 ssl;
server_name 'julia.com';
ssl_certificate julia.com;
return 200 "I'm julia.com";
}
server {
listen 443 ssl;
server_name 'bork.com';
ssl_certificate bork.com;
return 200 "I'm bork.com";
}
So, what happens if you set an SNI of julia.com, but then a Host header of bork.com? You can do this with curl https://julia.com -H "Host: bork.com"
Then you will get
- an SSL certificate for julia.com
- and it will say “I’m bork.com”
Really. Now this makes some sense – if I open up a TCP connection to the server, I might make a lot of different HTTP requests inside for different websites (like julia.com, bork.com, cookies.com), and it needs to deal with that somehow! It can’t change SSL certificates in the middle.
You might wonder “wait, is it secure to send requests for bork.com to julia.com?”. It’s okay! If you have a secure connection set up with a computer (you believe that computer is who you think it is), then it seems reasonable to send many requests to that computer (for instance for both images.google.com and google.com).
But it’s a little weird, right? I wanted to make extra sure that I understood how this worked, so I went and read some of the nginx source code. It turns out that there is a big file called ngx_http_request.c, and in it there is a function called ngx_http_find_virtual_server.
In that file, you’ll see that ngx_http_find_virtual_server is called twice per TLS request: once to find the server block matching the SNI host (julia.com), to pick the right certificate, and a second time when you get a Host header, to pick which server block to actually serve the response from.
So that’s how we get a response saying bork.com but a certificate for julia.com!
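Here is a toy Python model of that two-step lookup. It is only a sketch of the behaviour described above, not a translation of the nginx source: SERVERS, DEFAULT, find_virtual_server and handle_connection are names I made up, and falling back to julia.com for unknown names is an assumption for the example.

```python
# Hypothetical server blocks, mirroring the nginx-ish config above.
SERVERS = {
    "julia.com": {"certificate": "julia.com", "body": "I'm julia.com"},
    "bork.com": {"certificate": "bork.com", "body": "I'm bork.com"},
}
DEFAULT = "julia.com"  # assumption: unknown names fall back to this block


def find_virtual_server(name: str) -> dict:
    """Look up a server block by name (a stand-in, not nginx's real function)."""
    return SERVERS.get(name, SERVERS[DEFAULT])


def handle_connection(sni_name: str, host_headers: list):
    """Pick one certificate per connection, then one server block per request."""
    # First lookup: happens once, during the TLS handshake, using the SNI name.
    certificate = find_virtual_server(sni_name)["certificate"]
    # Second lookup: happens for every request, using that request's Host header.
    bodies = [find_virtual_server(host)["body"] for host in host_headers]
    return certificate, bodies
```

handle_connection("julia.com", ["bork.com"]) gives the julia.com certificate with an "I'm bork.com" response, which matches what the curl experiment showed. Passing several Host values models several requests reusing one connection: the certificate stays fixed while the responses change.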
<3 fundamentals
Learning how HTTP works was an a+ move. It would be totally impossible for me to configure webservers at work if I didn’t know how it worked!