A small website mystery
Hello! For half of today, my website was broken! I like debugging stories, so I thought I’d tell this one. Someone tweeted at me this morning saying “hey your website has an issue”. They very kindly sent me a screenshot:
Yep. That looks like an issue to me! I asked them to run curl -i http://jvns.ca
and they sent me the output.
Let’s take a look at the HTTP headers.
HTTP/1.1 200 OK
Date: Wed, 10 May 2017 13:12:18 GMT
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=d79dd5269ee9d191c6eb32a5ab5277a391494421938;
expires=Thu, 10-May-18 13:12:18 GMT; path=/; domain=.jvns.ca; HttpOnly
ETag: W/"3715-54e836f53861d-gzip"
Vary: Accept-Encoding
CF-Cache-Status: HIT
Expires: Thu, 11 May 2017 13:12:18 GMT
Cache-Control: public, max-age=86400
X-Content-Type-Options: nosniff
Server: cloudflare-nginx
CF-RAY: 35cd2639b2c36920-CDG
why are these HTTP headers wrong?
There are two things to notice here: first, it says ETag: W/"3715-54e836f53861d-gzip"
. This is a great clue. I was like “oh, is
it gibberish because it’s gzipped??”
How do you check if a file is gzipped? The easiest way is probably to
try to unzip it and see if it works. In this case the gzipped data was in the same file as the headers
though, so I ran hexdump -c file.txt
. I looked at the bytes at the
beginning of the binary data and it said 1f 8b
. I happen to know that those are the 2 bytes every gzip stream starts with!
So, it was gzipped. That’s fine though, browsers can handle gzipped
data! The second thing to notice is, well, something that isn’t there.
When a site sends gzipped data, it’s meant to send a Content-Encoding: gzip
header to say “hey, this content is gzipped, unzip it before
displaying it!” So we have our first mystery!
mystery 1: why is the Content-Encoding: gzip header missing?
What happened to the Content-Encoding: gzip header?
I tried running curl -I http://jvns.nfshost.com
(which is the backend
for my webhost, https://jvns.ca uses Cloudflare) to look at the HTTP headers. It was returning a
Content-Encoding: gzip
header! Here are the headers:
HTTP/1.1 200 OK
Date: Thu, 11 May 2017 01:51:56 GMT
Server: Apache
Upgrade: h2c
Connection: Upgrade
Last-Modified: Tue, 02 May 2017 05:01:38 GMT
ETag: "1fcdd-54e836f527c7c"
Accept-Ranges: bytes
Age: 118
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 14327
Content-Type: text/html; charset=UTF-8
This is also weird though! You might say – “okay, it says
Content-Encoding: gzip, that’s good”. But normally in order to get
gzipped content, you have to send an Accept-Encoding: gzip
header to
say “I understand gzip!”. But I wasn’t sending that header with curl, and my site
was returning gzipped content anyway. Weird, right?
So we haven’t solved our mystery, but we’ve found a SECOND mystery:
mystery 2: why does my site send gzipped content even when I didn’t ask it to??
the secret of the surprise gzipped content
I could think of an answer to the second mystery, though! A few years ago, I felt like I was spending too much money on bandwidth, and I wanted to save some money. I have a static site, so I gzipped every page on my site, and set up this Apache configuration:
RewriteEngine on
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule ^(.*)$ $1.gz [L]
This tells Apache “hey, always send gzipped replies no matter what!!”.
So we’ve solved Mystery 2 – I deleted that .htaccess
file, and
jvns.nfshost.com started behaving normally again.
Today my web host (nearlyfreespeech, which I like a lot) will automatically gzip content when asked to, but it didn’t in the past! (here’s the post announcing it)
Also, when I cleared my Cloudflare cache my site started behaving normally again, which I think means the problem is fixed. Maybe my weird Apache rule’s aberrant behavior was causing Cloudflare to break somehow? Not clear!
why did it take me half a day to fix it?
Normally if something is wrong with the Cloudflare version of my site but the non-CDN version of my site seems ok, I could just turn off Cloudflare for a bit to see if that fixes it. Hilariously, I turned on Strict-Transport-Security last week, which means my site only works if it’s served over HTTPS. And my normal webhost isn’t set up with HTTPS yet, so I can’t just turn off Cloudflare. That’s ok though, if a few pages on this blog are broken for a few hours the world won’t end.
What happened to the Content-Encoding: gzip header though?
I still don’t know where the Content-Encoding: gzip header went! Did Cloudflare remove it? Did my webhost stop serving it for some reason? I have no idea! Anyway, my site seems to work again (I think/hope?) and I thought this was kind of a fun excursion into HTTP headers.