
What happens if you write a TCP stack in Python?

During Hacker School, I wanted to understand networking better, and I decided to write a miniature TCP stack as part of that. I was much more comfortable with Python than C and I’d recently discovered the scapy networking library which made sending packets really easy.

So I started writing teeceepee!

The basic idea was

  1. open a raw network socket that lets me send TCP packets
  2. send an HTTP request to GET google.com
  3. get and parse a response
  4. celebrate!

I didn’t care much about proper error handling or anything; I just wanted to get one webpage and declare victory :)

Step 1: the TCP handshake

I started out by doing a TCP handshake with Google! (this won’t necessarily run correctly, but illustrates the principles). I’ve commented each line.

The way a TCP handshake works is:

  • me: SYN
  • google: SYNACK!
  • me: ACK!!!

Pretty simple, right? Let’s put it in code.

# These examples use the scapy library
from scapy.all import IP, TCP, sr1, send

# My local network IP
src_ip = "192.168.0.11"
# Google's IP
dest_ip = "96.127.250.29"
# IP header: this is coming from me, and going to Google
ip_header = IP(dst=dest_ip, src=src_ip)
# Specify a large random port number for myself (59333),
# and port 80 for Google. The "S" flag means this is
# a SYN packet
syn = TCP(dport=80, sport=59333,
          ack=0, flags="S")
# Send the SYN packet to Google and wait for its SYNACK
# scapy uses '/' to stack headers (and payloads) together
response = sr1(ip_header / syn)
# Acknowledge the SYNACK: our ack number is Google's sequence
# number plus one, and our own sequence number picks up where
# the SYNACK's ack field says it should
ack = TCP(dport=80, sport=59333,
          seq=response.ack, ack=response.seq + 1, flags="A")
# Reply with the ACK
send(ip_header / ack)

Wait, sequence numbers?

What’s all this about sequence numbers? The whole point of TCP is to make sure you can resend packets if some of them go missing. Sequence numbers are a way to check if you’ve missed packets. So let’s say that Google sends me 4 packets, size 110, 120, 200, and 500 bytes. Let’s pretend the initial sequence number is 0. Then those packets will have sequence numbers 0, 110, 230, and 430.

So if I suddenly got a 100-byte packet with a sequence number of 2000, that would mean I missed a packet! The next sequence number should be 930!
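Here’s a tiny sketch of that arithmetic in Python (using the made-up packet sizes from above):

# Each packet's sequence number is the number of bytes sent before it
packet_sizes = [110, 120, 200, 500]
seq = 0  # pretend the initial sequence number is 0
for size in packet_sizes:
    print("packet of %d bytes has sequence number %d" % (size, seq))
    seq += size
print("the next packet should start at sequence number %d" % seq)  # 930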

How can Google know that I missed the packet? Every time I receive a packet from Google, I need to send an ACK (“I got the packet with sequence number 230, thanks!”). If the Google server notices I haven’t ACKed a packet, then it can resend it!

The TCP protocol is extremely complicated and has all kinds of rate limiting logic in it, but we’re not going to talk about any of that. This is all you’ll need to know about TCP for this post!

For a more in-depth explanation, including how SYN packets affect sequence numbers, I found Understanding TCP sequence numbers very clear.

Step 2: OH NO I already have a TCP stack

So I ran the code above, and I had a problem. IT DIDN’T WORK.

But in a kind of funny way! I just didn’t get any responses. I looked in Wireshark (a wonderful tool for spying on your packets) and it looked like this:

me: SYN
google: SYNACK
me: RST

Wait, what? I never sent a RST packet?! RST means STOP THE CONNECTION IT’S OVER. That is not in my code at all!

This is when I remembered that I already have a TCP stack on my computer, in my kernel. So what was actually happening was:

my Python program: SYN
google: SYNACK
my kernel: lol wtf I never asked for this! RST!
my Python program: ... :(

So how do we bypass the kernel? I talked to the delightful Jari Takkala about this, and he suggested using ARP spoofing to pretend I had a different IP address (like 192.168.0.129).
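With scapy, the basic move is a gratuitous ARP reply, roughly like this (a sketch, not the exact teeceepee code; keeping the router convinced over time is the finicky part):

from scapy.all import ARP, Ether, sendp

# The IP address I'm pretending to have (nobody on my network is using it)
fake_ip = "192.168.0.129"

# A gratuitous ARP reply broadcast to the whole network:
# "192.168.0.129 is at <this machine's MAC address>"
# (hwsrc defaults to the MAC of the interface we send on)
gratuitous_arp = Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(
    op=2,          # 2 means "is-at", i.e. an ARP reply
    psrc=fake_ip,  # the IP address we're claiming
    pdst=fake_ip,  # in a gratuitous ARP, sender and target IP are the same
)
sendp(gratuitous_arp)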

The new exchange was like this:

me: hey router! send packets for 192.168.0.129 to my MAC address
router: (does it silently)
my Python program: SYN (from 192.168.0.129)
google: SYNACK
kernel: this isn't my IP address! <ignore>
my Python program: ACK YAY

And it worked! Okay, awesome, we can now send packets AND GET RESPONSES without my kernel interfering! AWESOME.

Step 3: get a webpage!

There was an intervening step here where I fixed tons of irritating bugs that were preventing Google from sending me the HTML for http://google.com, but I eventually fixed them all and emerged victorious!

I needed to

  • put together a packet containing an HTTP GET request (there’s a rough sketch of this after the list)
  • make sure I can listen for lots of packets in response, not just one
  • spend a lot of time fixing bugs with sequence numbers
  • try to close the connection properly
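Roughly, the GET request step looks like this (a sketch that reuses ip_header, port 59333, and the SYNACK response from the handshake code above; the real code also has to collect and ACK all the response packets):

from scapy.all import IP, TCP, sr1

# The HTTP request itself, sent as the TCP payload
http_get = "GET / HTTP/1.1\r\nHost: google.com\r\n\r\n"
# Pick up the connection where the handshake left off: our next
# sequence number is the ack from Google's SYNACK, and we're still
# acknowledging Google's initial sequence number plus one
request = TCP(dport=80, sport=59333,
              seq=response.ack, ack=response.seq + 1,
              flags="PA")  # PSH + ACK: "here's some data for you"
# scapy stacks the payload on with '/' as well
reply = sr1(ip_header / request / http_get)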

Step 4: realize Python is slow

Once I had everything working, I used Wireshark again to look at what packets were being sent back and forth. It looked something like this:

me/google: <tcp handshake>
me: GET google.com
google: 100 packets
me: 3 ACKs
google: <starts resending packets>
me: a few more ACKs
google: <reset connection>

The sequence of packets from Google (P) and ACKs from me (A) looked something like this: P P P A P P P P P A P P A P P P P A. Google was sending me packets way faster than my program could keep up with and ACK. Then, hilariously, Google’s server would assume that there were network problems causing me not to ACK its packets.

And it would eventually reset the connection because it would decide there were connection problems.

But the connection was fine! My program was totally responding! It was just that my Python program was way too slow to respond to packets in the millisecond times it expected.

(edit: this diagnosis seems to be incorrect :) you can read some discussion about what may actually be going on here)

life lessons

If you’re actually writing a production TCP stack, don’t use Python. (surprise!)

I was really happy that it actually worked, though! The ARP spoofing was extremely finicky, but I wrote a version of curl using it which worked about 25% of the time. You can see all the absurd code at https://github.com/jvns/teeceepee/.

I think this was actually way more fun and instructive than trying to write a TCP stack in an appropriate language like C :)

Pair programming is amazing! Except… when it’s not.

I wrote a blog post in March about why I find pair programming useful as a tool and why I enjoy it. There are entire companies like Pivotal that do pair programming 100% of the time, and they find it useful.

To get our terms straight, by “pair programming”, I mean “two people are trying to accomplish a task by sitting at a single computer together”.

Some people mentioned after I wrote that blog post that they disliked pair programming, sometimes strongly! Obviously these people aren’t wrong to not like it. So I asked people on Twitter about their experiences.

People responded wonderfully. You can see about 160 thoughtful tweets in the Storify “What do you find hard about pair programming?”. I learned a ton, and my view that “pair programming is great and you totally should try it!!!” got tempered a little bit :)

If you’re not up to reading all that, here are the broad categories that the difficulties fell into. Thanks very much to everyone who responded for giving permission for me to post their comments!

“I’m completely drained after an hour or two”

Pair programming is really intense. You concentrate really hard, don’t take a lot of breaks, and it’s very mentally taxing. Tons of people brought this up. And this seems to be true for everyone, even people who find it a useful tool.

  • “it can be very stressful and draining for an introvert, both productivity killers in the long run.” - @hoxworth
  • “I used to work at Pivotal (100% pairing). IME pairing makes everything go faster. Also exhausting.” - @shifrapr
  • “definitely would not like my entire project to be pair programmed though; even 2-3 days would be exhausting.” - @lojikil
  • “Downsides I hear a lot when teaching workshops on pairing: exhausting” - @moss
  • “I find it sometimes awesome & sometimes really frustrating, honestly. It can be exhausting,but also a way to discover unknown unknowns” - @DanielleSucher
  • “that being sad: pairing is great. All the time though would be exhausting (for me)” - @qrush
  • “It is hard sometimes because you need to be on the same wavelength as another person which can be tiring.” - @zmanji

“I can’t type when there’s somebody looking. I hate pairing.”

Anxiety around pairing is really common. Some people say that they found it easier as time went on. Some people also didn’t! It can be good to encourage someone to try something, but if someone’s tried and it just makes them super-anxious, respect that!

  • “I hate pairing because I can’t type when there’s somebody looking and I get anxious when I watch somebody else typing for long D:” - @seaandsailor
  • “I type somewhat slow and I always feel pressure (real or imagined) from the other person.” - @Torwegia
  • “I have seen seasoned vim users writhe in pain upon having to watch a normal user type at a typically glacial human speed :)” - @brandon_rhodes
  • “I suffer keyboard anxiety when I haven’t paired in a while.” - @meangrape
  • “anxiety, fear of being judged” - @qrush
  • “i get self-conscious, make dumb mistakes, confuse myself.. :( pairing is the worst” - @wirehead2501
  • “it’s something about having someone see my process, like when you’re writing an email with someone reading over your shoulder.” - @wirehead2501

“I only like pairing when my partner is a pleasure to work with”

This is pretty key. Pairing is a pretty intimate thing to do – you’re letting people see exactly how you work. If you don’t trust and respect the person that you’re pairing with, it doesn’t work. There also seems to be some mystical magical pairing juice where with some people it just doesn’t work, and with some people it’s amazing.

  • “once you’re pairing with an asshole, you might as well stop. There’s no point.” - @hsjuju2
  • “I only like pairing when my partner is a pleasure to work with. So I try to be too.” - @rkulla
  • “if you feel like someone will see you as less competent for voicing your thoughts, I’d rather code by myself” - @hsjuju2
  • “I think the social rules of [Hacker School] make pairing a lot more helpful and fun.” - @hsjuju2
  • “yeah it really has to be a safe space. Done among people who trust and respect one another. It also builds trust and respect.” - @gigachurch

“Talking through something doesn’t help me think”

A lot of the reason that I like pairing is that talking helps me work through problems. People are different! Some people hate talking about things to think. Something to be aware of.

  • “personally I only make progress on problems when talking to someone.” - @cartazio
  • “I am not someone who thinks out loud, and i feel like that’s one reason pairing is hard for me.” - @wirehead2501
  • “like, not only do i not understand by talking, but trying to talk through something before i think = more confused” - @wirehead2501
  • “I’m someone who thinks out loud, and understands by talking, whereas some people take that as bad” - @hsjuju2

This is also relevant to interviewing: advice like “try to talk through your issue!” works really well for some people, and badly for others.

“It’s bad when one person dominates”

My first pairing experience (years ago) was with someone who was a much better programmer than me, and basically bulldozed through the problem and left me no room to contribute. This really undermined my confidence and was awful.

When pairing with people with significantly less experience than me, I try to be really careful about this. One good trick that I learned from Zach Allaun at Hacker School is to always pair on the less experienced person’s project and/or let the newer person drive. If you’re working on their project then they’re at least the expert on how their project works, which helps a lot.

“I love pair debugging, not pair programming”

Variations on this were pretty common. A few people said that they like working together, but not for producing code. It’s totally okay to use pairing in specific ways (for teaching or for debugging), and not for other things.

  • “+1 for loving code reviews, pair programming, not do much. Pair debugging on the other hand can be excellent.” - @pphaneuf
  • “i actually find it really useful as a “let’s get to know how each other’s brain works” & a shortcut for coming up to speed on a codebase or a new language. otherwise–i haven’t had really awesome experiences with it.” - @zmagg
  • “I’m not sold on always pairing, but being able to debug or design w/ a second pair of eyes is often useful, & it helps share skills.” - @silentbicycle
  • “Can be a good way to learn. I was pretty much taught perl via pair programming years ago by a very patient coworker.” - @wendyck
  • “I spend half my day staring into space letting solutions pop into my head. Hard to do that with a partner there.” - @aconbere

Pair programming is amazing… sometimes

Pair programming can be a super useful tool. If you understand why people (such as yourself, maybe!) might find it hard or stressful, you can have more productive pairing sessions, and decide when pair programming is a good way to get a task done!

Open sourced talks!

The wonderful Sumana Harihareswara recently tweeted that she released her talk A few Python Tips as CC-BY. I thought this was a super cool idea!

After all, if you’ve put in a ton of work to put a talk or workshop together, it’s wonderful if other people can benefit from that as much as possible. And none of us have an unlimited amount of time to give talks.

Stephanie Sy, a developer in the Philippines, emailed me recently to tell me that she used parts of my pandas cookbook to run a workshop. IN THE PHILIPPINES. How cool is that? She put her materials online, too!

So if you want to give a talk about how to do data analysis with Python, you too can reuse these materials in any way you see fit! You can get materials for talks I’ve given on this page of talks. Just attribute me, and maybe tell me about it because THAT WOULD BE COOL :)

In other open source talks news, Software Carpentry also has MIT-licensed lesson materials! Want to give a novice introduction to git? Go to the SWC bootcamp repository and look in novice/git! They even take pull requests.

Ruby Rogues podcast: systems programming tricks!

If you listen to the Ruby Rogues podcast this week, you will find me! We talked about using systems programming tools (like strace) to debug your regular pedestrian code and about building an operating system in Rust, but also about other things I didn’t expect, like how asking stupid questions is an amazing way to learn.

Ruby Rogues also has a transcript of the entire episode, an index, and links to everything anyone referenced during the episode, including apparently 13 posts from this blog (!). I don’t even understand how this is possible, but apparently it is! It was a fun time, and apparently it is totally okay to spend a Ruby podcast discussing Rust, statistics, strace, and, well… not Ruby :)

Fun with stats: How big of a sample size do I need?

[There’s a version of this post with calculations on nbviewer!]

I asked some people on Twitter what they wanted to understand about statistics, and someone asked:

“How do I decide how big of a sample size I need for an experiment?”

Flipping a coin

I’ll do my best to answer, but first let’s do an experiment! Let’s flip a coin ten times.
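(flip_coin here is a little helper; a minimal sketch of one way to write it:)

import random
from collections import Counter

def flip_coin(n):
    # Flip a fair coin n times and print how many heads and tails came up
    flips = [random.choice(["heads", "tails"]) for _ in range(n)]
    for side, count in Counter(flips).most_common():
        print(side, count)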

> flip_coin(10)
heads    7
tails    3

Oh man! 70% were heads! That’s a big difference.

NOPE. This was a random result! 10 as a sample size is way too small to decide that. What about 20?

> flip_coin(20)
heads    13
tails     7

65% were heads! That is still a pretty big difference! NOPE. What about 10000?

> flip_coin(10000)
heads    5018
tails    4982

That’s very close to 50%.

So what we’ve learned already, without even doing any statistics, is that if you’re doing an experiment with two possible outcomes, and you’re doing 10 trials, that’s terrible. If you do 10,000 trials, that’s pretty good, and if you see a big difference, like 80% / 20%, you can almost certainly rely on it.

But if you’re trying to detect a small difference like 50.3% / 49.7%, that’s not a big enough difference to detect with only 10,000 trials.

So far this has all been totally handwavy. There are a couple of ways to formalize our claims about sample size. One really common way is by doing hypothesis testing. So let’s do that!

Let’s imagine that our experiment is that we’re asking people whether they like mustard or not. We need to make a decision now about our experiment.

Step 1: make a null hypothesis

Let’s say that we’ve talked to 10 people, and 7/10 of them like mustard. We are not fooled by small sample sizes and we ALREADY KNOW that we can’t trust this information. But your brother is arguing “7/10 seems like a lot! I like mustard! I totally believe this!”. You need to argue with him with MATH.

So we’re going to make what’s called a “null hypothesis”, and try to disprove it. In this case, let’s make the null hypothesis “there’s a 50/50 chance that a given person likes mustard”.

So! What’s the probability of seeing an outcome like 7/10 if the null hypothesis is true? We could calculate this, but we have a computer and I think it’s more fun to use the computer.

So let’s pretend we ran this experiment 10,000 times, and the null hypothesis was true. We’d expect to sometimes get 10/10 mustard likers, sometimes 0/10, but mostly something in between. Since we can program, let’s run the asking-10-people experiment 10,000 times!
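Here’s one way to do that (a sketch using numpy’s binomial sampling; not necessarily the exact code from the notebook):

import numpy as np

# Ask 10 people, 10,000 times over, under the null hypothesis that
# each person independently likes mustard with probability 0.5
num_surveys = 10000
people_per_survey = 10
mustard_likers = np.random.binomial(n=people_per_survey, p=0.5, size=num_surveys)

# Count how many surveys had 0, 1, ..., 10 mustard-likers
counts = np.bincount(mustard_likers, minlength=people_per_survey + 1)
for num_likers, count in enumerate(counts):
    print(num_likers, count)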

I programmed it, and here are the results:

0        7
1      102
2      444
3     1158
4     2002
5     2425
6     2094
7     1176
8      454
9      127
10      11

Or, on a pretty graph:

Okay, amazing. The next step is:

Step 2: Find out the probability of seeing an outcome this unlikely or more if the null hypothesis is true

The “this unlikely or more” part is key: we don’t want to know the probability of seeing exactly 7/10 mustard-likers, we want to know the probability of seeing 7/10 or 8/10 or 9/10 or 10/10.

So if we add up all the times when 7/10 or more people liked mustard by looking at our table, that’s about 1700 times, or 17% of the time.

We could also calculate the exact probabilities, but this is pretty close so we won’t. The way this kind of hypothesis testing works is that you only reject the null hypothesis if the probability of seeing this data if it’s true is really low. So here the probability of seeing this data if the null hypothesis is true is 17%. 17% is pretty high (about 1 in 6!), so we won’t reject it. This value (0.17) is called a p-value by statisticians. We won’t say that word again here though. Usually you want this to be more like 1% or 5%.
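(If you do want the exact number, scipy will happily compute it; this is just a sanity check of the 17%, not part of the original analysis.)

from scipy.stats import binom

# Probability of 7 or more mustard-likers out of 10 if the null
# hypothesis (a 50/50 chance per person) is true
p_value = 1 - binom.cdf(6, 10, 0.5)
print(p_value)  # about 0.17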

We’ve really quickly arrived at

Step 3: Decide whether or not to reject the null hypothesis

If we see that 7/10 people like mustard, we can’t reject it! If we’d instead seen that 10/10 of our survey respondents liked mustard, that would be a totally different story! The probability of seeing that is only about 10/10000, or 0.1%. So it would be actually very reasonable to reject the null hypothesis.

What if we’d used a bigger sample size?

So asking 10 people wasn’t good enough. What if we asked 10,000 people? Well, we have a computer, so we can simulate that!

Let’s flip a coin 10,000 times and count the number of heads. We’ll get a number (like 5,001). Then we’ll repeat that experiment 10,000 times and graph the results. This is like running 10,000 surveys of 10,000 people each.
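In code, that’s the same simulation as before with a bigger survey size (again a sketch with numpy):

import numpy as np

# 10,000 surveys of 10,000 people each, under the null hypothesis
# that each person likes mustard with probability 0.5
big_surveys = np.random.binomial(n=10000, p=0.5, size=10000)
print(big_surveys.min(), big_surveys.max())  # almost always within ~200 of 5000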

That’s pretty narrow, so let’s zoom in to see better.

So in this graph we ran 10,000 surveys of 10,000 people each, and in about 100 of them exactly 5,000 people said they liked mustard.

There are two neat things about this graph. The first neat thing is that it looks like a normal distribution, or “bell curve”. That’s not a coincidence! It’s because of the central limit theorem! MATH IS AMAZING.

The second is how tightly centred it is around 5,000. You can see that the probability of seeing more than 52% or less than 48% is really low. This is because we’ve done a lot of samples.

This also helps us understand how people could have calculated these probabilities back when we did not have computers but still needed to do statistics – if you know that your distribution is going to be approximately the normal distribution (because of the central limit theorem), you can use normal distribution tables to do your calculations.

In this case, “the number of heads you get when flipping a coin 10,000 times” is approximately normally distributed, with mean 5000.

So how big of a sample size do I need?

Here’s a way to think about it:

  1. Pick a null hypothesis (people are equally likely to like mustard or not)
  2. Pick a sample size (10000)
  3. Pick a test (do at least 5200 people say they like mustard?)
  4. What would the probability of your test passing be if the null hypothesis was true? (less than 1%! see the quick check below)
  5. If that probability is low, it means that you can reject your null hypothesis! And your less-mathematically-savvy brother is wrong, and you have PROOF.
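Here’s the quick check of that step-4 probability (a sketch with scipy):

from scipy.stats import binom

# If people really are 50/50 on mustard, how likely is it that at
# least 5,200 out of 10,000 say they like it?
print(1 - binom.cdf(5199, 10000, 0.5))  # about 0.00003, way below 1%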

Some things that we didn’t discuss here, but could have:

  • independence (we’re implicitly assuming all the samples are independent)
  • trying to prove an alternate hypothesis as well as trying to disprove the null hypothesis

I was also going to do a Bayesian analysis of this same data but I’m going to go biking instead. That will have to wait for another day. Later!

(Thanks very much to the fantastic Alyssa Frazee for proofreading this and fixing my terrible stats mistakes. And Kamal for making it much more understandable. Any remaining mistakes are mine.)

How I did Hacker School: ignoring things I understand and doing the impossible

Hacker School is a 12 week workshop where you work on becoming a better programmer. But when you have 12 weeks of uninterrupted time to spend on whatever you want, what do you actually do? I wrote down what I worked on every day of Hacker School, but I always have trouble articulating advice about what to work on. So this isn’t advice, it’s what I did.

One huge part of the way I ended up approaching Hacker School was to ignore a ton of stuff that goes on there. For example! I find all these things kind of interesting:

  • machine learning
  • web development
  • hardware projects
  • games
  • new programming languages

But I’d been working as a web developer / in machine learning for a couple of years, and I wasn’t scared by them. I don’t feel right now like learning more programming languages is going to make me a better programmer.

And there were tons of interesting-sounding workshops where Mary would live code a space invaders game in Javascript (!!!), or Zach would give an intermediate Clojure workshop, or people would work together on a fun hardware project. People were building neural networks, which looked fun!

I mostly did not go to these workshops. It turned out that I was interested in all those things, but more interested in something else entirely.

I wanted to work on things that seemed impossible to me, and writing an operating system seemed impossible. I didn’t know anything about operating systems. This was amazing.

This meant sometimes saying no to requests to pair on things that weren’t on my roadmap, even if they seemed super interesting! I also learned that if I wanted something to exist, I could just make it.

I ran a kernel development workshop for a while in my first two weeks. Jari and Pierre and Brian came, and they answered “what is a kernel? what are its responsibilities?”. This was hugely helpful to me, and I learned a ton of the basics of kernel programming. Nobody I talked to had built an operating system from scratch, so I learned how! Filippo answered a lot of my security questions and helped when I was confused about assembly. Daphne was working on a shell and I paired with her and learned a ton.

People at Hacker School know an amazing amount of stuff. There is so much to learn from them.

So I don’t have advice, but for me some of the most important things to remember about Hacker School were that other people have different interests than me, and that’s okay, and that I can make Hacker School what I want it to be.

!!Con talks are up

The talk recordings and transcripts for the amazing talks at !!Con have been posted! Go learn about EEG machines, how to stay in love with programming, type theory, dancing robots, hacking poetry, and more!

Here they are!!

Erty Seidel did pretty much 100% of the work for the talk recordings. Super pleased with the results.

Machine learning isn’t Kaggle competitions

I write about strace and kernel programming on this blog, but at work I actually mostly work on machine learning, and it’s about time I started writing about it! Disclaimer: I work on a data analysis / engineering team at a tech company, so that’s where I’m coming from.

When I started trying to get better at machine learning, I went to Kaggle (a site where you compete to solve machine learning problems) and tried out one of the classification problems. I used an out-of-the-box algorithm, messed around a bit, and definitely did not make the leaderboard. I felt sad and demoralized – what if I was really bad at this and never got to do math at work?! I still don’t think I could win a Kaggle competition. But I have a job where I do (among other things) machine learning! What gives?

To back up from Kaggle for a second, let’s imagine that you have an awesome startup idea. You’re going to predict flight arrival times for people! There are a ton of decisions you’ll need to make before you even start thinking about support vector machines:

Understand the business problem

If you want to predict flight arrival times, what are you really trying to do? Some possible options:

  • Help the airline understand which flights are likely to be delayed, so they can fix it.
  • Help people buy flights that are less likely to be delayed.
  • Warn people if their flight tomorrow is going to be delayed

I’ve spent time on projects where I didn’t understand at all how the model was going to fit into business plans. If this is you, it doesn’t matter how good your model is. At all.

Understanding the business problem will also help you decide:

  • How accurate does my model really need to be? What kind of false positive rate is acceptable?
  • What data can I use? If you’re predicting flight delays for tomorrow, you can look at weather data, but if someone is buying a flight a month from now then you’ll have no clue.

Choose a metric to optimize

Let’s take our flight delays example. We first have to decide whether to do classification (“will this flight be delayed for at least an hour”) or regression (“how long will this flight be delayed for?”). Let’s say we pick regression.

People often optimize the sum of squares because it has nice statistical properties. But mispredicting a flight arrival time by 10 hours and by 20 hours are pretty much equally bad. Is the sum of squares really appropriate here?
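As a toy illustration (made-up numbers): squared error treats a 20-hour misprediction as four times worse than a 10-hour one, even though for a traveller both mostly mean “that prediction was useless”.

errors_in_hours = [1, 10, 20]
print([e ** 2 for e in errors_in_hours])   # squared error:  [1, 100, 400]
print([abs(e) for e in errors_in_hours])   # absolute error: [1, 10, 20]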

Decide what data to use

Let’s say I already have the airline, the flight number, departure airport, plane model, and the departure and arrival times.

  • Should I try to buy more specific information about the different plane models (age, what parts are in them, ...)? Really accurate weather data? The amount of information available to you isn’t fixed! You can get more!

Clean up your data

Once you have data, your data will be a mess. In this flight search example, there will likely be

  • airports that are inconsistently named
  • missing delay information all over the place
  • weird date formats
  • trouble reconciling weather data and airport location

Cleaning up data to the point where you can work with it is a huge amount of work. If you’re trying to reconcile a lot of sources of data that you don’t control like in this flight search example, it can take 80% of your time.
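Even one small cleanup step tends to look something like this (a sketch with pandas; the file and column names here are made up):

import pandas as pd

# Hypothetical flights data with messy airport names and missing delays
flights = pd.read_csv("flights.csv")

# Normalize airport codes like " jfk", "JFK ", "Jfk" to "JFK"
flights["origin"] = flights["origin"].str.strip().str.upper()

# How much of the delay information is missing?
print(flights["delay_minutes"].isnull().mean())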

Build a model!

This is the fun Kaggle part. Training! Cross-validation! Yay!

Now that we’ve built what we think is a great model, we actually have to use it:

Put your model into production

Netflix didn’t actually implement the model that won the Netflix competition because it was too complicated.

If you trained your model in Python, can you run it in production in Python? How fast does it need to be able to return results? Are you running a model that bids on advertising spots / does high frequency trading?

If we’re predicting flight delays, it’s probably okay for our model to run somewhat slowly.

Another surprisingly difficult thing is gathering the data to evaluate your model – getting historical weather data is one thing, but getting that same data in real time to predict flight delays right now is totally different.

Measure your model’s performance

Now that we’re running the model on live data, how do I measure its real-life performance? Where do I log the scores it’s producing? If there’s a huge change in the inputs my model is getting after 6 months, how will I find out?

Kaggle solves all of this for you.

With Kaggle, almost all of these problems are already solved for you: you don’t need to worry about the engineering aspects of running a model on live data, the underlying business problem, choosing a metric, or collecting and cleaning up data.

You won’t go through all these steps just once – maybe you’ll build a model and it won’t perform well so you’ll try to add some additional features and see if you can build a better model. Or maybe how useful the model is to your business depends on how good the results are.

Doing Kaggle problems is fun! It means you can focus on machine learning algorithm nerdery and get better at that. But it’s pretty far removed from my job, where I work on a team (hiring!) that thinks about all of these problems. Right now I’m looking at measuring models’ performance once they’re in production, for instance!

So if you look at Kaggle leaderboards and think that you’re bad at machine learning because you’re not doing well, don’t. It’s a fun but artificial problem that doesn’t reflect real machine learning work.

(to be clear: I don’t think that Kaggle misrepresents itself, or does a bad job – it specializes in a particular thing and that’s fine. But when I was starting out, I thought that machine learning work would be like Kaggle competitions, and it’s not.)

(thanks to the fantastic Alyssa Frazee for helping with drafts of this!)

Asking questions is a superpower

There are all kinds of things that I think I “should” know and don’t. A few things that I don’t understand as well as I’d like to:

  • Database replication and sharding (seriously how does replication even work)
  • How fast a computer can process data (should I expect more or less than 6GB/s if it’s a simple CPU-bound program where the data is already in RAM?)
  • How do system calls work, reeeeally? (I do not understand context switching nearly as well as I could!)
  • A truly embarrassing amount of basic statistics, even though I have a math degree.

There are lots of much more embarrassing things that I just can’t think of right now.

I’ve started trying to ask questions any time I don’t understand something, instead of worrying about whether people will think I’m dumb for not knowing it. This is magical, because it means I can then learn those things!

One of my very favorite examples of this is how I started learning about operating systems. At the beginning of Hacker School, I realized that I legitimately did not know what a kernel was or did more than “er, operating system stuff”.

This was super embarrassing! I’d been using Linux for 10 years, and I didn’t really understand at all what the basic responsibilities of the Linux kernel were. Oh no! Instead of hiding under a rock, I asked. And then people told me, and I wrote What does the Linux kernel even do?.

I don’t know how I would have learned without asking. Now I have given talks about getting started with understanding the Linux kernel! So fun!

One surprising thing about asking questions is that when I start digging into a problem, people who I respect and who know a lot will sometimes not know the answers at all! For instance, I’ll think that someone totally knows about the Linux kernel, but of course they don’t know everything, and if I’m trying to do something specific like write a rootkit they might not know all the details of how to do it.

aphyr is a really good example of someone who asks basic questions and gets unexpected answers. He does research into whether distributed systems are reliable (linearizable? consistent? available?). The results he finds are things like “RabbitMQ might lose 40% of your data”. Ooooops. If you don’t start asking questions about how RabbitMQ works from the beginning (in his case, by writing a program called Jepsen that automates this kind of reliability testing), then you’ll never find that out. (Be skeptical! Don’t believe what people say even if they’re using fancy words!)

“I don’t understand.”

Another hard thing is admitting that I don’t understand. I try to not be too judgemental about this – if someone is explaining something to me and it doesn’t make sense, it’s possible that they’re explaining it badly! Or that I’m tired! Or any number of other reasons. But if I don’t tell them I don’t understand, I’m never going to understand the damn thing.

So I try to take a deep breath and say cheerfully “Nope!”, figure out exactly which aspect of the thing I don’t understand, and ask a clarifying question.

As a side effect, I’ve acquired much less patience and respect for people who give talks that sound really smart but are difficult to understand, and somewhat more willingness to ask questions like “so what IS <basic concept that you did not explain>?”.

Avoiding mansplaining

A difficult thing about asking questions is that I have to be pretty careful about asking the right questions and making it clear which parts I know already. This is just good hygiene, and makes sure nobody’s time gets wasted.

For instance, I have sometimes said things like “I don’t know anything about statistics”, which is actually false and sometimes results in people trying to explain basic probability theory to me, or what an estimator is, or maybe the difference between a biased and unbiased estimator. It turns out these are actually things I know! So I need to be more specific, like “can we walk through some basic survival analysis?” (actually a thing I would like to understand!)

HUGE SUCCESS

So! Understanding and learning are more important than feeling smart. Probably the most important thing I learned at Hacker School was how to ask questions and admit when I don’t understand something. I know way more things now as a result! (see: this entire blog of things I have learned)

Working remote, 3 months in

I’ve been working remotely for Stripe for 3 months now.

I decided to do this because I interviewed at this place, and the people were thoughtful and friendly and interesting and knew things that I did not know! But they were all in San Francisco, and I didn’t want to move there at all. They convinced me that if I worked remote it might not be a disaster.

I was still pretty scared about working remote, though! So far it’s been hard, but I’m learning how to do it better. I’m somewhat extroverted, so it’s possible for me to go a bit stir-crazy sitting alone by myself all day.

I live on the east coast. The people I work with are mostly in San Francisco, three timezones away. So when I start work it’s usually around 6am in SF.

Let’s start with some things I have trouble with:

Hard things

  • Timezones are hard. If I start working at 8, there aren’t many people I can talk to BECAUSE IT’S 5AM. (however: it’s a really good time to focus! And I can be a wizard and finish tasks before everyone wakes up in the morning!)
  • I don’t know how to meet new people without visiting the physical office. A lot of people are just names on IRC to me. I do not know of any upside to this, or how to fix it.
  • I’m worried about the winter.
  • I didn’t realize how much I depended on synchronous communication (talking face-to-face!) to do things until it was taken away from me. This is thankfully getting easier.
  • It seems pretty difficult for me to know very much about the office culture.
  • I find building consensus about technical decisions hard to do remotely. (see: depending on synchronous communication)
  • A/V is hard. I often don’t try to participate in talks because I don’t expect the experience to be good.

Good things:

  • I get to work with people who I like and live where I want to live. And I’m learning a lot. This is why I decided to do this in the first place =)
  • I can work in my backyard in the sun.
  • I have more flexibility about when and where to work. I appreciate this more than I thought I would.
  • Thinking about working remote as “a cool possibility with some ups and downs” instead of “this enemy that means I HAVE TO SEE LESS PEOPLE OH NO” helps me be happy instead of grumpy.
  • My happiness seems to be proportional to the amount of time I spend talking to people. This is something I can measure and optimize!
  • I’m getting better at asynchronous communication.
  • If I ask someone to do something when I finish work, they’ll be working for 3 hours after me! It might be already done when I start the next day.
  • 2 people on my team are remote! (colin and avi). This is a huge deal. If I were the only one it would probably be a disaster and I would be way more sad. As far as I can tell Avi’s been working remote approximately forever and he has a lot of good things to say.
  • I like that Stripe actually changes things to accommodate remotes (for instance: the all-hands meeting switched times so that it’s not at 7:30pm on Friday on the east coast)
  • Basically all of the discussion on my team happens over IRC/email. This means that there is a lot of IRC to keep up with. This is harder than I expected.

Strategies

  • I changed my work computer’s clock to be the time in San Francisco. This helps more than I expected.
  • I made a short URL (http://go/julia) that links to a Google Hangout with me
  • Deciding to be happy this summer. There is no reason to be sad in the summer.
  • Talking to other people who work remote sometimes and learning about things they do!

That’s all! Maybe there will be further updates.