Day 9: Bytecode is made of bytes! CPython isn't scary!

Today I paired with one of the fantastic Hacker School facilitators, Allison, on fixing some bugs in a bytecode interpreter. byterun is a pure-Python interpreter for the bytecode that CPython generates, written for learning & fun times.

Allison has a great blog post about how to use the dis module to look at the bytecode for a function, which you should totally read.
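Here is a small sketch of what that looks like, using a made-up subtract function (the function name is just for illustration). The exact listing depends on your Python version: up to 3.10 the subtraction shows up as BINARY_SUBTRACT, while from 3.11 on the binary arithmetic opcodes were merged into a single BINARY_OP instruction whose argument dis displays as (-).

```python
import dis


def subtract(a, b):
    return a - b


# Print a human-readable listing of subtract's bytecode.
dis.dis(subtract)

# The same information as data: one Instruction object per opcode.
opnames = [ins.opname for ins in dis.get_instructions(subtract)]
print(opnames)
```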

A few things I learned

The CPython interpreter is mostly in one 3,500-line file called ceval.c (see it on GitHub!). The main part of this file is a 2,000-line switch statement: switch(opcode) {.... Ack.
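To get a feel for the shape of that giant switch, here is a toy version in Python: a loop that reads one opcode byte at a time and branches on it. This is only a sketch of the general fetch-and-dispatch idea; the three opcodes and their numbers are hypothetical and are not CPython's real values.

```python
# Hypothetical opcode numbers for this toy VM; they are NOT CPython's real values.
LOAD_CONST = 1
BINARY_ADD = 2
RETURN_VALUE = 3


def run(code: bytes, consts: list):
    """Run toy bytecode: a loop around one big 'switch' on the current opcode."""
    stack = []
    pc = 0  # index of the next byte to read
    while True:
        opcode = code[pc]  # indexing a bytes object gives an int in range(256)
        pc += 1
        if opcode == LOAD_CONST:
            # the next byte is an index into consts
            stack.append(consts[code[pc]])
            pc += 1
        elif opcode == BINARY_ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == RETURN_VALUE:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode}")


program = bytes([LOAD_CONST, 0, LOAD_CONST, 1, BINARY_ADD, RETURN_VALUE])
print(run(program, [2, 3]))  # prints 5
```

The real loop in ceval.c has far more cases, but each case follows this same pattern: take values off the stack, do the work, push the result, move on.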

But! This file is surprisingly not-scary. Or Allison is just amazing at making things seem not scary. So for example there’s a BINARY_SUBTRACT opcode which, well, subtracts things.

Here’s the actual, for-serious C code that handles it:

```c
TARGET(BINARY_SUBTRACT) {
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *diff = PyNumber_Subtract(left, right);
    Py_DECREF(right);
    Py_DECREF(left);
    SET_TOP(diff);
    if (diff == NULL)
        goto error;
    DISPATCH();
}
```

So, what does this do?

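Get the two operands off the stack: POP() removes right, and TOP() peeks at left, which stays on the stack
Subtract them with PyNumber_Subtract, which calls left.__sub__(right) (or right.__rsub__(left) as a fallback)
Decrement the reference counts of left and right, since CPython frees an object once its reference count drops to zero
Put the result on the stack in left’s old slot (SET_TOP)
If the subtraction failed (PyNumber_Subtract returns NULL when an exception has been raised), jump to the error-handling code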
DISPATCH(), which basically just means “go to the next instruction”

I could TOTALLY WRITE THAT.
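And here is roughly what that could look like in Python, in the spirit of byterun. This is a sketch under my own assumptions: the Frame class and its pop/top/set_top methods are hypothetical stand-ins for the C macros, not byterun’s actual API.

```python
import operator


class Frame:
    """Hypothetical minimal frame with a value stack (not byterun's real class)."""

    def __init__(self, *values):
        self.stack = list(values)

    def pop(self):  # like POP(): remove and return the top value
        return self.stack.pop()

    def top(self):  # like TOP(): look at the top value without removing it
        return self.stack[-1]

    def set_top(self, value):  # like SET_TOP(): overwrite the top value
        self.stack[-1] = value


def binary_subtract(frame):
    right = frame.pop()
    left = frame.top()
    # operator.sub(left, right) is the same as left - right: Python tries
    # left.__sub__(right) and falls back to right.__rsub__(left).
    # If that fails, the exception simply propagates (the C code's "goto error").
    diff = operator.sub(left, right)
    # No Py_DECREF calls: the Python runtime manages memory for us.
    frame.set_top(diff)


frame = Frame(10, 3)
binary_subtract(frame)
print(frame.stack)  # [7]
```

One small difference: the C version calls SET_TOP before checking for an error, while this sketch only touches the stack when the subtraction succeeds.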

We spent some time reading the C code that deals with exception handling in Python. It was pretty confusing, but I learned that you can write raise ValueError from Exception to set the cause of an exception (Python stores it on the new exception’s __cause__ attribute).
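A quick sketch of raise ... from in action. The parse_port helper is hypothetical, made up just to have something that fails: it wraps the low-level int() error in a friendlier one and keeps the original around as the cause.

```python
def parse_port(text):
    """Hypothetical helper: turn text into an int, keeping the original error as the cause."""
    try:
        return int(text)
    except ValueError as exc:
        # "from exc" stores exc on the new exception's __cause__ attribute
        raise ValueError(f"not a valid port: {text!r}") from exc


try:
    parse_port("http")
except ValueError as err:
    print(err)                  # not a valid port: 'http'
    print(repr(err.__cause__))  # ValueError("invalid literal for int() with base 10: 'http'")
```

If the exception is not caught, the traceback shows both errors, joined by the line "The above exception was the direct cause of the following exception:".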

Basically, the lessons here are:

Allison is the best. Pairing with her on byterun is the most fun thing.
It’s actually possible to read the C code that runs Python!
Bytecode is made of bytes. Like, there are fewer than 256 opcodes, and each one is stored as a single byte. I did not realize this until today. Laugh all you want =D
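You can check this yourself: a function’s compiled bytecode lives in its __code__.co_code attribute, and it really is a plain bytes object. The sketch below (the sub and opcode_bytes names are made up for illustration) looks up the byte sitting at each instruction’s offset and translates it back into a name with dis.opname. One caveat: since Python 3.6 each instruction occupies two bytes (the opcode plus a one-byte argument), and from 3.11 some instructions are followed by inline CACHE entries, but the opcode itself is still one byte.

```python
import dis


def sub(a, b):
    return a - b


def opcode_bytes(func):
    """Return (offset, opcode byte, opname) for each instruction in func."""
    raw = func.__code__.co_code  # the compiled bytecode: a plain bytes object
    rows = []
    for ins in dis.get_instructions(func):
        byte = raw[ins.offset]  # the opcode is the byte at the instruction's offset
        rows.append((ins.offset, byte, dis.opname[byte]))
    return rows


print(type(sub.__code__.co_code))  # <class 'bytes'>
for offset, byte, name in opcode_bytes(sub):
    print(f"{offset:3d}  {byte:3d}  {name}")
```

The numbers you see will differ between Python versions, since opcode numbering is an implementation detail of CPython that changes from release to release.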