Julia Evans

How is a binary executable organized? Let’s explore it!

I used to think that executables were totally impenetrable. I’d compile a C program, and then that was it! I had a Magical Binary Executable that I could no longer read.

It is not so! Executable file formats are regular file formats that you can understand. I’ll explain some simple tools to start! We’ll be working on Linux, with ELF binaries. (Binaries are kind of the definition of platform-specific, so this is all platform-specific.) We’ll be using C, but you could just as easily look at output from any compiled language.

Let’s write a simple C program, hello.c:

#include <stdio.h>

int main() {
    printf("Penguin!\n");
}

Then we compile it (gcc -o hello hello.c), and we have a binary called hello. At first this seems impenetrable (how do we even binary?!), but let’s see how we can investigate it! We’re going to learn what symbols, sections, and segments are. At a high level:

  • symbols are like function names, and are used to answer “If I call printf and it’s defined somewhere else, how do I find it?”
  • symbols are organized into sections – code lives in one section (.text), and data in another (.data, .rodata)
  • sections are organized into segments
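To make the "regular file format" point concrete, here is a minimal sketch in Python (not C, and not one of the tools from the post) that reads a few fields from the ELF header at the start of a binary. The function name parse_elf_header and the dictionary keys are my own; the field layout is the standard ELF header, where e_phnum counts program headers (segments) and e_shnum counts section headers.

```python
import struct

# Layout of the fields that follow the 16-byte e_ident block.
ELF32_FIELDS = "HHIIIIIHHHHHH"
ELF64_FIELDS = "HHIQQQIHHHHHH"


def parse_elf_header(data):
    """Pull a few fields out of the ELF header at the start of `data`."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file (bad magic bytes)")
    bits = {1: 32, 2: 64}[data[4]]          # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    byte_order = {1: "<", 2: ">"}[data[5]]  # EI_DATA: 1 = little, 2 = big endian
    fmt = byte_order + (ELF64_FIELDS if bits == 64 else ELF32_FIELDS)
    fields = struct.unpack_from(fmt, data, 16)
    return {
        "bits": bits,
        "little_endian": byte_order == "<",
        "entry_point": fields[3],  # e_entry
        "segments": fields[9],     # e_phnum: number of program headers
        "sections": fields[11],    # e_shnum: number of section headers
    }
```

Calling it as parse_elf_header(open("hello", "rb").read(64)) should agree with what readelf -h hello reports for the same binary.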

What happens if you write a TCP stack in Python?

in hackerschool, networking

During Hacker School, I wanted to understand networking better, and I decided to write a miniature TCP stack as part of that. I was much more comfortable with Python than C and I’d recently discovered the scapy networking library which made sending packets really easy.

So I started writing teeceepee!

The basic idea was

  1. open a raw network socket that lets me send TCP packets
  2. send an HTTP request to GET google.com
  3. get and parse a response
  4. celebrate!

I didn’t care much about proper error handling or anything; I just wanted to get one webpage and declare victory :)
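For a feel of what "sending a TCP packet" by hand involves, here is a rough sketch of building the very first packet of a connection, a SYN, using only the standard library. This is not teeceepee's code (that used scapy, which does all of this for you); the function names, addresses, and ports are made up for illustration. It only builds the bytes; actually sending them needs a raw socket and root privileges.

```python
import socket
import struct


def internet_checksum(data):
    """16-bit one's-complement checksum used by TCP and IP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:   # fold any carry bits back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip, dst_ip, tcp_length):
    """The extra IP-level fields that the TCP checksum also covers."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_TCP, tcp_length))


def build_syn(src_ip, dst_ip, src_port, dst_port, seq):
    """Build a 20-byte TCP header with only the SYN flag set."""
    offset_and_flags = (5 << 12) | 0x02  # header is 5 words long; SYN = 0x02
    header = struct.pack("!HHIIHHHH", src_port, dst_port, seq, 0,
                         offset_and_flags, 65535, 0, 0)
    checksum = internet_checksum(pseudo_header(src_ip, dst_ip, len(header)) + header)
    # The checksum field sits at bytes 16-17 of the header.
    return header[:16] + struct.pack("!H", checksum) + header[18:]
```

A handy sanity check: running internet_checksum over the pseudo-header plus the finished segment gives 0, which is how a receiver verifies the packet.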

Pair programming is amazing! Except… when it’s not.

in pairing

I wrote a blog post in March about why I find pair programming useful as a tool and why I enjoy it. There are entire companies like Pivotal that do pair programming 100% of the time, and they find it useful.

To get our terms straight, by “pair programming”, I mean “two people are trying to accomplish a task by sitting at a single computer together”.

Some people mentioned after I wrote that blog post that they disliked pair programming, sometimes strongly! Obviously these people aren’t wrong to not like it. So I asked people about their experiences:

People responded wonderfully. You can see about 160 thoughtful tweets about what people find hard in this Storify, “What do you find hard about pair programming?”. I learned a ton, and my view that “pair programming is great and you totally should try it!!!” got tempered a little bit :)

Open sourced talks!

in talks

The wonderful Sumana Harihareshwara recently tweeted that she released her talk “A few Python Tips” as CC-BY. I thought this was a super cool idea!

After all, if you’ve put in a ton of work to put a talk or workshop together, it’s wonderful if other people can benefit from that as much as possible. And none of us have an unlimited amount of time to give talks.

Stephanie Sy, a developer in the Philippines, emailed me recently to tell me that she used parts of my pandas cookbook to run a workshop. IN THE PHILIPPINES. How cool is that? She put her materials online, too!

Ruby Rogues podcast: systems programming tricks!

If you listen to the Ruby Rogues podcast this week, you will find me! We talked about using systems programming tools (like strace) to debug your regular pedestrian code, building an operating system in Rust, but also other things I didn’t expect, like how asking stupid questions is an amazing way to learn.

Ruby Rogues also has a transcript of the entire episode, an index, and links to everything anyone referenced during the episode, including apparently 13 posts from this blog (!). I don’t even understand how this is possible, but apparently it is! It was a fun time, and apparently it is totally okay to spend a Ruby podcast discussing Rust, statistics, strace, and, well… not Ruby :)

Fun with stats: How big of a sample size do I need?

in statistics

[There’s a version of this post with calculations on nbviewer!]

I asked some people on Twitter what they wanted to understand about statistics, and someone asked:

“How do I decide how big of a sample size I need for an experiment?”

Flipping a coin

I’ll do my best to answer, but first let’s do an experiment! Let’s flip a coin ten times.

> flip_coin(10)
heads    7
tails    3

Oh man! 70% were heads! That’s a big difference.

NOPE. This was a random result! 10 as a sample size is way too small to decide that. What about 20?
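To see why 10 flips can’t tell us much, here is a small sketch you can play with. The post doesn’t show how flip_coin is written, so this version is a hypothetical stand-in, and standard_error is a helper I’m adding: it is the usual formula for how much an observed proportion wobbles from sample to sample, sqrt(p(1-p)/n).

```python
import math
import random
from collections import Counter


def flip_coin(n, rng=None):
    """Flip a fair coin n times and count heads and tails."""
    if rng is None:
        rng = random.Random()
    return Counter(rng.choice(["heads", "tails"]) for _ in range(n))


def standard_error(n, p=0.5):
    """Standard error of an observed proportion over n flips."""
    return math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for n in (10, 20, 100, 1000):
        counts = flip_coin(n)
        share = counts["heads"] / n
        print(n, dict(counts), "heads share:", round(share, 2),
              "standard error:", round(standard_error(n), 3))
```

For 10 flips the standard error is about 0.16, so 70% heads is only a bit more than one standard error away from 50%: entirely ordinary for a fair coin. The error shrinks like 1/sqrt(n), so you need 100 times as many flips to get 10 times the precision.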

!!Con talks are up

in bangbangcon

The talk recordings and transcripts for the amazing talks at !!Con have been posted! Go learn about EEG machines, how to stay in love with programming, type theory, dancing robots, hacking poetry, and more!

Here they are!!

Erty Seidel did pretty much 100% of the work for the talk recordings. Super pleased with the results.

Machine learning isn’t Kaggle competitions

in machinelearning

I write about strace and kernel programming on this blog, but at work I actually mostly work on machine learning, and it’s about time I started writing about it! Disclaimer: I work on a data analysis / engineering team at a tech company, so that’s where I’m coming from.

When I started trying to get better at machine learning, I went to Kaggle (a site where you compete to solve machine learning problems) and tried out one of the classification problems. I used an out-of-the-box algorithm, messed around a bit, and definitely did not make the leaderboard. I felt sad and demoralized – what if I was really bad at this and never got to do math at work?! I still don’t think I could win a Kaggle competition. But I have a job where I do (among other things) machine learning! What gives?

To back up from Kaggle for a second, let’s imagine that you have an awesome startup idea. You’re going to predict flight arrival times for people! There are a ton of decisions you’ll need to make before you even start thinking about support vector machines:

Asking questions is a superpower

There are all kinds of things that I think I “should” know and don’t. A few things that I don’t understand as well as I’d like to:

  • Database replication and sharding (seriously how does replication even work)
  • How fast a computer can process data (should I expect more or less than 6GB/s if it’s a simple CPU-bound program where the data is already in RAM?)
  • How do system calls work, reeeeally? (I do not understand context switching nearly as well as I could!)
  • A truly embarrassing amount of basic statistics, even though I have a math degree.

There are lots of much more embarrassing things that I just can’t think of right now.

I’ve started trying to ask questions any time I don’t understand something, instead of worrying about whether people will think I’m dumb for not knowing it. This is magical, because it means I can then learn those things!