Julia Evans

Some easy statistics: Bootstrap confidence intervals

I am not actually on a plane to Puerto Rico, but I wrote this post when I was :)

Hey friends! I am on a plane to Puerto Rico right now. When is a better time to think about statistics?

We’ll start with a confession: I analyze data, and I rarely think about what the underlying distribution of my data is. When I tell my awesome stats professor friend this, she kind of sighs, laughs, and says some combination of

  • “oh, machine learning people…”
  • “well, you have a lot of data so it probably won’t kill you”
  • “but be careful of {lots of things that could hurt you}!”

So let’s talk about being careful! One way to be careful is, when you come up with a number, to build a confidence interval about how sure you are about that number. I think the normal way to do confidence intervals is that you use Actual Statistics and know what your distribution is. But we’re not going to do that because I’m on a plane and I don’t know what any of my distributions are. (the technical term for not knowing your distributions is “nonparametric statistics” :D)

So, let’s say I have some numbers like: 0, 1, 3, 2, 8, 2, 3, 4 describing the number of no-shows for flights from New York to Puerto Rico. And that I also have no idea what kind of distribution this number should have, but some Important Person is asking me how much it’s okay to oversell the plane by.

And let’s say I think it’s okay to have to kick people off the flight, say, 5% of the time. Great! Let’s take the 5th percentile!

> np.percentile([0, 1, 3, 2, 8, 2, 3, 4], 5)
0.35000000000000003

Uh, great. The 5th percentile says there will be 0.35 people who don’t make the plane. This is a problem because a) it’s not really something I can take to management, and b) I have no idea how much confidence I should have in that estimate, given that I only have 8 data points. And I have no distribution to use to reason about it.
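If you're wondering where a number like 0.35 even comes from: by default np.percentile interpolates linearly between the two nearest sorted values. Here is a small sketch that redoes that arithmetic by hand (the variable names are just mine, for illustration):

```python
import numpy as np

data = sorted([0, 1, 3, 2, 8, 2, 3, 4])  # [0, 1, 2, 2, 3, 3, 4, 8]

# Fractional position of the 5th percentile in the sorted list
position = 0.05 * (len(data) - 1)
lower = int(np.floor(position))
fraction = position - lower

# Interpolate between the two neighbouring values
by_hand = data[lower] + fraction * (data[lower + 1] - data[lower])

print(by_hand, np.percentile(data, 5))
```

So the 0.35 is just "35% of the way from the smallest value (0) to the second smallest (1)".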

Maybe I shouldn’t have switched to CS so I didn’t have to take statistics (true story). Or alternatively maybe I can BOOTSTRAP MY WAY TO A CONFIDENCE INTERVAL WITH COMPUTERS. If you’re paying close attention, this is like the A/A testing post I wrote a while back, but a more robust method.

The way you bootstrap is to sample with replacement from your data a lot of times (like 10000). So if you start with [1,2,3], you’d sample [1,2,2], [1,3,3], [3,3,1], [1,3,2], etc. Then you compute your target statistic on your new datasets. So if you were taking the maximum, you’d get 2,3,3,3, etc. This is great because you can use any statistic you want!

Here is some code to do that! n_bootstraps is intended to be a big number. I chose 10000 because I didn’t want to wait more than a few seconds. More is always better.

import numpy as np
import pandas as pd
from sklearn.utils import resample

def bootstrap_5th_percentile(data, n_bootstraps):
    bootstraps = []
    for _ in range(n_bootstraps):
        # Sample with replacement from data
        samples = resample(data)
        # Then we take the fifth percentile!
        bootstraps.append(np.percentile(samples, 5))
    return pd.Series(bootstraps)

So, let’s graph it

data = [0, 1, 3, 2, 8, 2, 3, 4]
bootstraps = bootstrap_5th_percentile(data, 10000)
bootstraps.hist()

This is actually way more useful! It’s telling me I can oversell by 0 - 2 people, and I don’t have enough data to decide which one. I don’t know if I’d take this graph to airline executives (though everyone loves graphs right?!?!), but it’s for sure more useful than just a 0.35.
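If you want actual numbers instead of eyeballing a histogram, one common recipe (the "percentile method", which is not the only way to do it) is to take the 2.5th and 97.5th percentiles of the bootstrapped values. A minimal sketch using only numpy; bootstrap_interval and its parameters are names I made up for this example:

```python
import numpy as np

def bootstrap_interval(data, q=5, n_bootstraps=10000, alpha=0.05, seed=0):
    """Percentile-method interval for the q-th percentile of data."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Each row is one resample, with replacement, the same size as data
    samples = rng.choice(data, size=(n_bootstraps, len(data)), replace=True)
    # The statistic we care about, computed once per resample
    stats = np.percentile(samples, q, axis=1)
    # Middle (1 - alpha) of the bootstrapped statistics
    low, high = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return low, high

low, high = bootstrap_interval([0, 1, 3, 2, 8, 2, 3, 4])
print(low, high)
```

With only 8 flights the interval will be wide, which is kind of the point.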

Thankfully in real life I would probably have more flights than just 8 to use to make this decision. Let’s say I actually had, like, 1000! Let’s start by generating some data:

data = np.random.normal(5, 2, 1000)
data = np.round(data[data >= 0]).astype(int)

Here’s a histogram of that data:

pd.Series(data).hist()

Now let’s take the 5th percentile!

np.percentile(data, 5)
2.0

Again, I don’t really feel good about this number. How do I know I can trust this more than the 0.35 from before? Let’s bootstrap it!

bootstraps = bootstrap_5th_percentile(data, 10000)
bootstraps.value_counts().sort_index().plot(kind='bar')

I feel a little better about calling it at 2 here.

The math

I have not explained ANY of the math behind why you should believe this is a reasonable approach, which if you are like me then you are super uncomfortable right now. For instance, obviously if I only have 1 data point, sampling with replacement isn’t going to help me build a confidence interval. But what if I have 2 points, or 5? Why should you take these histograms I’m building seriously at all? And what’s this business with not even caring about what distribution you’re using?

All worthwhile questions that we will not answer here today :).
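Not an answer, but one small thing that is easy to check: how many different resampled datasets can you even get? If you ignore order and assume the n values are all distinct, it's the number of multisets of size n drawn from n things, which is C(2n-1, n). A quick sketch (distinct_resamples is a made-up helper name):

```python
from math import comb

def distinct_resamples(n):
    # Number of multisets of size n drawn from n distinct values
    return comb(2 * n - 1, n)

for n in [1, 2, 5, 8]:
    print(n, distinct_resamples(n))
```

With 1 point there is exactly 1 possible resample, with 2 points there are 3, with 5 there are 126, and with 8 there are at most 6435 (fewer if some values repeat, like in the no-show data). So tiny samples give the bootstrap very little to work with.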

Be careful

If occasionally 100 people don’t make the flight because they’re all from the same group and that’s important and not represented in your sample, bootstrapping can’t save you.

This method is the bomb though. It is basically the only way I know to get error bars on my estimates and it works great.
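To make the caveat above concrete: a resample can only contain values that were already in your data, so no bootstrapped dataset will ever include a 100-person no-show if you never observed one. A small sketch with the 8-flight data (numpy only, fixed seed so it's repeatable):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([0, 1, 3, 2, 8, 2, 3, 4])

# 10000 resamples with replacement, one per row
resamples = rng.choice(data, size=(10000, len(data)), replace=True)

# The biggest value across every resample can never exceed the biggest observed value
largest_seen = resamples.max()
print(largest_seen, data.max())
```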

AdaCamp Montreal 2015

I went to AdaCamp these last couple of days. I want to talk about some of the awesome stuff that happened!

AdaCamp is an unconference, which means that people decide what the sessions will be about on the first day of the conference. Here are some things I’m thinking about!

Testing

I went to a really, really interesting session about software testing by someone who works as a software tester. I work as a developer, and I’ve never worked with a QA team! I didn’t know there were people who specialized in testing software and were really awesome at it who don’t write programs! This was super cool to learn. I still don’t know how to think about separating out the responsibilities of writing the software and verifying the software – obviously individual developers also need to be responsible for writing correct software, and it still feels strange to me to hand any of that off.

But Camille Fournier told me on twitter about user acceptance testing and how you can have a QA team that checks that the software makes sense to users and, like, talks to them and stuff, not just software that’s theoretically correct. So that’s pretty cool.

Awesome people

I met a lot of really interesting people! I met sysadmins and people who had been programming for a long time and software testers and people who know a lot about science fiction and ham radio and bikes and publishing and zines and Quebec and libraries and Wikipedia (someone wrote their dissertation on Wikipedia. Wow.). I learned SO MUCH about Wikipedia. And almost all of those people identified as women! A++ would meet delightful people again.

Codes of conduct

This session convinced me open spaces are a good idea.

Initially I didn’t want to go because I was interested in some very specific aspects of codes of conduct (deescalating situations + how to make CoCs less intimidating to people who are genuinely well-intentioned but not familiar with a given community + when to model behavior implicitly vs writing down explicit rules). And I told someone during a break that I didn’t want to go to the session because I thought people wouldn’t be discussing the thing I wanted to talk about.

And she said AWESOME. THOSE ARE AWESOME THINGS TO TALK ABOUT. COME WITH ME AND WE WILL TALK ABOUT THAT. And we did! And I don’t have answers about any of those things, but I got to hear some new perspectives and stories and now I know a couple more things. And the other people seemed to think the questions I had were interesting <3.

And it made me remember – when I think that I’m the only person who has a given concern or question or experience, I’m usually wrong :)

Moderation

There were a lot of unstructured discussion sessions at AdaCamp. This was really cool, because it means you can cover a lot of ground. I also was reminded again of how important good moderation + facilitation is, and how much I want to get better at it. I’m working on learning how to:

  • create some explicit structure around a session (“let’s discuss these 4 topics, and spend ~15 minutes on each one. does that sound good?”)
  • tell someone when they’ve said enough <3 (“thanks so much! I’d love to hear from some people who haven’t said as much yet”)
  • move the discussion back on track if it’s veered away (“okay awesome! Does anyone have anything else to say about $topic, or should we move on to $next_thing?”)

People take up really incredibly different amounts of space in discussions, and I really really want to get better at making sure people who are quieter get a chance to say their super interesting things. Interrupting people is hard for me!

After AdaCamp I felt like there are a lot of great people in the world who are trying their best to do what’s right and have a lot of good ideas about how to do that and want to have the same conversations that I want to have. A little more than usual =)

A zine about strace

As some of you might have heard, I wrote a zine to teach people about how to debug their programs using strace a while ago! I was originally going to mail it out to people, but it turns out I’m too lazy to mail anything.

So instead, you can download, print, fold, and staple it yourself today! It should work if you print it double-sided with short edge binding on letter paper. Also if you print an initial master copy, you can take it to a copy shop and get them to make many copies for you.

Give it to your friends/colleagues/students to teach them about strace! Send me pictures! Tell me what you think! <3

Here’s the pdf. Have fun. (there’s also a landscape version)

Learning at open source sprints (no preparation required)

I’m someone who isn’t heavily involved in contributing code to OSS, and I normally go to sprints just to learn something new, not with any particular goals. This has never worked out that well for me, but I had a new idea yesterday! Maybe if you’re like me it will help you.

I was talking to someone yesterday about contributing to a relatively complicated open source project during the sprints. And we worked out that they were super super interested in learning about the project’s internals, and didn’t necessarily need to contribute to the project!

Contributing to a project for the first time is hard. There’s a lot you need to know! And the area of the project you’re interested in might not necessarily need contributions right now! Or you might not be able to make the contribution you’re interested in on your first 3 days.

So we came up with an AWESOME IDEA. Instead of trying to write a contribution, change the goals! The next time I go to a sprint, I think I’ll just

  1. pick a project I’m interested in
  2. decide on a thing I’d like to learn about that project
  3. start digging into the project, running code, and learn about that thing
  4. not worry about contributing

For me, I think this could be way more fun, and I’d learn a lot more. And it would be fun to do this stuff near the team who works on the project, regardless of whether or not they have time to help me :)

And maybe, while exploring with no particular goal, I’d find a change that needs making! :) Or if I wanted to contribute to the project 6 months down the road, I’d be a little more prepared to do that.

A few spy tools for your operating system (other than strace!)

There are so many awesome tools you can use to find out what’s going on with your computer. Here are some that exist on Linux. They might exist on your OS too!

netstat

netstat tells you what ports are open on your computer. This is crazy useful if you want to know if the service that is supposed to be listening on port 8080 is actually listening on port 8080.

sudo netstat -tulpn
[sudo] password for bork: 
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address       Foreign Address     State       PID/Program name
tcp        0      0 127.0.0.1:631       0.0.0.0:*           LISTEN      1658/cupsd      
tcp        0      0 127.0.0.1:5432      0.0.0.0:*           LISTEN      1823/postgres   
tcp        0      0 127.0.0.1:6379      0.0.0.0:*           LISTEN      2516/redis-server

If you look at the Program Name column on the right, you’ll see that apparently I have cupsd (printing), postgres, and redis servers running on my machine, as well as some other stuff that I redacted. I actually have no idea why I had redis installed so uh yeah I uninstalled it.

I use netstat pretty often when I’m trying to debug “omg why is this thing not running IT IS SUPPOSED TO BE RUNNING”. netstat tells me the truth about whether it is running.
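netstat gets its answer from the kernel. If you're inside a program and just want a quick "is anything answering on this port?" check, a cruder option is to try connecting to it. Here's a minimal Python sketch of that idea; is_listening is a hypothetical helper I made up, and it only checks TCP on one host (it is not a replacement for netstat and won't tell you which program owns the port):

```python
import socket

def is_listening(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    print(is_listening(8080))
```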

Seeing system calls with perf instead of strace

I’m at a local hackerspace this evening, and I decided to get perf working on my computer again. You all know by now that I’m pretty into strace, but – strace is not always a good choice! If your program runs too many system calls, strace will slow it down. A lot.

Let’s try it and see:

$ time du -sh ~/work
0.04 seconds
$ time strace -o out du -sh ~/work
2.66 seconds

That’s about 65 times slower! This is because du needed to use 260,000 system calls, which is uh a lot. If you strace a program with fewer system calls it won’t be that big of a deal. But what if we still want to know what du is doing, and du is actually a Really Important Program like a database or something?
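Here's the back-of-the-envelope arithmetic behind that, using only the three numbers above (the timings and the syscall count). It's a rough estimate that assumes all the extra time is strace overhead spread evenly over the system calls:

```python
plain_seconds = 0.04    # time du -sh ~/work
traced_seconds = 2.66   # time strace -o out du -sh ~/work
syscalls = 260_000      # system calls du made

# How many times slower the traced run was
slowdown = traced_seconds / plain_seconds

# Extra time per system call, in microseconds
overhead_per_syscall_us = (traced_seconds - plain_seconds) / syscalls * 1e6

print(round(slowdown, 1), round(overhead_per_syscall_us, 1))
```

That works out to roughly 10 microseconds of extra time per system call, which is why it only hurts when there are a lot of them.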

WE’RE GOING TO USE PERF =D =D.

I’ve been eyeing Brendan Gregg’s page on perf and the kernel.org tutorial for almost a year now, and we learned in May last year that perf lets you count CPU cycles, which is cool! But perf is capable of way more stuff.

Here’s how we record what system calls du is using:

sudo perf record -e 'syscalls:sys_enter_*' du -sh ~/work

This finishes right away, except that perf takes a little extra time to write its recorded data to disk. Then we can see the system calls with sudo perf script, which shows us something like this:

du 25156 [003] 142769.540801: syscalls:sys_enter_newfstatat:
       dfd: 0x00000006, filename: 0x021b0b58, statbuf: 0x021b0ac8, flag: 0x0
du 25156 [003] 142769.540802: syscalls:sys_enter_close:
       fd: 0x00000006
du 25156 [003] 142769.540804: syscalls:sys_enter_newfstatat: 
       dfd: 0x00000005, filename: 0x021b4708, statbuf: 0x021b4678, flag: 0x0

This is showing us system calls! You can see the file descriptors – fd: 0x00000006. But it doesn’t give us the filename, just… the address of the filename? I don’t know how to get the actual filename out and that makes me sad.

It’s called perf script because you can write scripts with the output (like this flamegraph script!). Like maybe you could pretty it up and have a script that’s like strace but doesn’t slow your program down so much. Apparently perf script -g python will automatically generate boilerplate for a perf script in Python for me! But it doesn’t work because I need to recompile perf. So we’ll see about that :)
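As a tiny example of the "write scripts with the output" idea, here's a sketch that counts system calls by name from text shaped like the sample above. It assumes the event names look like syscalls:sys_enter_NAME: (the exact perf script format can differ between perf versions), and count_syscalls is a name I made up:

```python
import re
from collections import Counter

# Matches event names like "syscalls:sys_enter_close:"
EVENT_RE = re.compile(r"syscalls:sys_enter_(\w+):")

def count_syscalls(perf_script_output):
    """Count how often each system call appears in perf script text."""
    return Counter(EVENT_RE.findall(perf_script_output))

sample = """\
du 25156 [003] 142769.540801: syscalls:sys_enter_newfstatat:
       dfd: 0x00000006, filename: 0x021b0b58, statbuf: 0x021b0ac8, flag: 0x0
du 25156 [003] 142769.540802: syscalls:sys_enter_close:
       fd: 0x00000006
du 25156 [003] 142769.540804: syscalls:sys_enter_newfstatat:
       dfd: 0x00000005, filename: 0x021b4708, statbuf: 0x021b4678, flag: 0x0
"""

counts = count_syscalls(sample)
for name, n in counts.most_common():
    print(name, n)
```

In real life you'd feed it the saved output of sudo perf script instead of the hardcoded sample string.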

That’s all I have to say for now! Mostly I’m writing this up in the hopes that someone will either a) tell me how to get perf to give me the actual filename or b) tell me why it’s unreasonable to expect perf to do that.

Senior engineering & fantasy heroes

I was talking to someone at work this past week about what I’d want out of a senior engineer, and found myself inventing characters I’d like to work with (and I already work with people who remind me of all of these, of course! <3). Maybe someone will find this bit of silliness enjoyable :). It’s about how fortune tellers do not necessarily also need to be cattle wranglers.

(apparently I think gardeners are fantasy heroes)

In very related excellence, Camille Fournier posted Rent the Runway’s engineering ladder in this blog post and spreadsheet which lays out engineering qualities they value in terms of strength/dexterity/wisdom/charisma <3

The fortune teller

The fortune teller can tell the future about your engineering project. You tell her a design decision you’re making; she tells you the problems you’re going to run into in 3 months. She saves you an incredible amount of engineering effort in bad directions.

The cattle wrangler

You have a team, and you need to standardize how your programs do an Important Thing. Everyone wants to standardize, and nobody can agree on what the standard should be. The cattle wrangler is amazing at working through the pros and cons, and getting everyone to feel heard & agree on a standard.

The spring of knowledge

Your company uses a lot of Java, and sometimes you need to know some obscure internal JVM detail. And all of your internet searching is bringing up… nothing. When that happens, you go to the spring of Java knowledge, which tells you what you need to know.

(What you need to know is not always the answer to the question you asked)

The gardener

You built a project full of technical debt and spiky bits? You go to the gardener for help, and sheepishly ask them to help you clean it up a bit. They show you where the nastiest weeds are, suggest code that you could delete, and help you get to a better architecture in a reasonable amount of time. They’re great to have on your side at the beginning of a project, before you create the technical debt in the first place :)

If you have more characters you work with & love, tell me! @b0rk on Twitter.

Nancy Drew and the Case of the Slow Program

Yesterday I tweeted:

I specifically wanted programming-language-independent ways to investigate questions like this, and I guess people who follow me on twitter get me because I got SO MANY GREAT ANSWERS. I’ll give you a list of all the answers at the end, but first! We’re going to mount an investigation.

Let’s start! I wrote up 3 example mystery programs, and you can find them in this github repository.

Mystery Program #1

Let’s investigate our first mystery slow program!

You can choose who submits talks to your conference

Sometimes I see conference organizers say “well, we didn’t have a choice about the talk proposals we got!” or “we just picked the best ones!”. I think we all know by now that that’s bullshit, but just in case – it’s bullshit! =D

We have a choice about who submits talk proposals, and also about who submits the best talk proposals. I watched somebody I know get talk proposal feedback today, and their proposal started out good and got dramatically better. Now it’s great.

If you ask someone specifically to consider speaking at your conference, they’re WAY more likely to consider submitting a talk than if you don’t. If you then actively work with some talk submitters to help them focus and improve the talk they submit, their proposals will get better! And if you choose to focus your energies to work with (for instance) non-white people more than white people, then you’ll get more and better proposals from people who aren’t white.

You can see this with PyCon! About 30% of the talks at last year’s PyCon were given by women, because lots of people have done tons of individual outreach to encourage their friends to give talks and spent lots of time working with them to write good proposals. As Jessica McKellar says:

Hello from your @PyCon Diversity Outreach Chair. % PyCon talks by women: (2011: 1%), (2012: 7%), (2013: 15%), (2014: 33%). Outreach works.

This makes me really happy! It means that if I’m working on a conference (like !!Con), then I know I can help get more diverse participation by sending emails to individual people who I’d like to hear from. Telling people that I like their work and that I’d like for them to talk about it is super fun! (and true!)

If there’s something you find exciting about programming and you often find you’re part of an underrepresented group when you go to conferences, I’d love it if you submitted a talk to !!Con. Double especially if you live in NYC!

1:1 topic ideas

Danielle Sucher started this great thread on twitter asking for ideas for what to talk about in 1:1s with your manager. I’m writing some of them up here so I don’t forget.

  • What’s happening now that I would like to not be happening in a month? (@zmagg)
  • Am I having tension with any of my colleagues I want to resolve before it gets worse?
  • what are promotions for? where am I relative to that, and what should I be working on?
  • or: “This is how I would like promotions to work when they happen. How would I fit in to that if they did?”
  • turn it around: what are you thinking about right now? what’s your top priority? what’s worrying you about this team?
  • Am I happy with my current project?
  • Do I feel like I’m learning? Are there things I feel like I’m not learning that I would like to?
  • Are there things about the way the team is working together that feel bad to me?
  • periodically: where do I want to be with my career?

There’s this further list of 101 questions to try that I find really really helpful as an exhaustive grab bag of “oh no I don’t know what to talk about give me ideas please!!!”.