Day 42: Writing a Go program to manage Firecracker VMs
Hello! On Tuesday I spent more time figuring out how to run VMs with Firecracker for my SSH game project. They still start super fast and I’m really excited about them.
I got through 3 main things:
- learned 1 new thing about how linux bridges work
- figured out how to make my Ubuntu VMs boot fast
- wrote a small Go server to manage Firecracker VMs
a linux bridge isn’t just a bridge
Every single time I say I’m confused about bridges, someone tells me “well, you see julia, a bridge is like a virtual switch”. This has never made any sense to me because I’ve never used a switch either. I also felt like there was something off about that explanation: it didn’t explain the behavior I was seeing, though I couldn’t articulate why.
I think I finally learned something concrete about why bridges are confusing though! On Linux, when you create a bridge you get a network interface (like docker0 for the Docker bridge). That network interface has an IP address, and you can use that network interface/IP address as a gateway for containers/VMs you’re running.
But switches don’t have IP addresses! So if “a bridge is like a switch”, what’s going on? A bridge doesn’t really seem like it’s a switch! This analogy really seems to be breaking down. Yesterday someone on Twitter finally explained to me that when you create a bridge on Linux, by default you actually get 2 things:
- a bridge (the kind that’s “like a switch”, that doesn’t have an ip address and just forwards packets blindly)
- a network interface with the same name as the bridge, which has an IP address that you can use as a gateway.
They said that if you want, you can delete the network interface part of the bridge. I still haven’t experimented enough to work this out but I feel really good about this piece of information and like I can use it to properly understand what a Linux bridge is later.
Also I feel kind of vindicated in disbelieving this “a bridge is like a switch” explanation because I guess it’s technically true but it’s definitely missing a key piece of information for Linux bridges.
fixed my Ubuntu VMs taking 2 minutes to boot
My Ubuntu Firecracker VMs had been taking 2-3 minutes to boot. They were hanging on a systemd step called “Load/Save Random Seed”, which apparently has something to do with kernel entropy.
I googled this and tried a lot of different things to fix it. Here are all the things that I tried that did not work:
- add random.trust_cpu=on to kernel boot args
- set SYSTEMD_RANDOM_SEED_CREDIT=true in systemd-random-seed.service
- set SYSTEMD_RANDOM_SEED_CREDIT=force in systemd-random-seed.service
- install & enable haveged
- install rng-tools
- systemctl disable systemd-random-seed (though this really SHOULD have worked, I think I did something wrong there)
Finally I changed the timeout in the systemd-random-seed.service file to 2 seconds, which worked! Now my VMs start fast. It’s extremely possible that I actually need this entropy generation for some reason (maybe to give sshd enough entropy so that it can generate session keys securely?) but I’ll cross that bridge when I come to it.
So now I can start an Ubuntu virtual machine in like 5 seconds! It’s really amazing. It’s probably possible to bring the boot time down a bit more but I’m happy with that.
wrote a Go program to manage Firecracker VMs
So far I’ve been starting VMs with the DigitalOcean API. So I wanted to write my own little API to create Firecracker VMs. It was pretty straightforward because I mostly just copied a bunch of code from this Firecracker command line tool called firectl: https://github.com/firecracker-microvm/firectl
Here’s a gist with my (pretty messy) code so far: firecracker-manager.go.
what my API looks like so far
It totally works! I can start a VM with:
echo '{
"root_image_path": "/images/ubuntu.ext4",
"kernel_path": "/images/vmlinux"
}' | http post http://localhost:8080/create
and I can stop it with:
echo '{"id": "DE52E8A0-C624-18CB-F948-0B50C77C8F4A"}' | http post localhost:8080/delete
It’s still missing some things, like:
- I should probably use the firecracker jailer for better security (like firectl does)
- right now I’m still writing the VM’s serial output to stdout
- I might make it REST-y and use a DELETE request to stop a VM
and probably lots more things I’m not thinking of right now