A few people have told me recently that they find the slides for my talks really helpful for getting started with pandas, a Python library for manipulating data. But slides get out of date, and it's tough to maintain slides for a talk that I gave a year ago.
So I was procrastinating packing to leave New York yesterday, and I started writing up some examples, with explanations! A lot of them are taken from talks I've given, but I also want to give some new examples, like
- how to deal with timestamps
- what is a pivot table and why would you ever want one?
- how to deal with "big" data
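To give a flavour of the pivot table question, here's a minimal sketch. The data and the column names (`borough`, `type`, `n`) are invented for illustration and are not taken from the cookbook. A pivot table takes long rows of records and spreads one column's values out as column headers, so you can compare groups side by side.

```python
import pandas as pd

# Made-up complaint records, one row per (borough, type) observation.
# These values are hypothetical and exist only for this example.
complaints = pd.DataFrame({
    "borough": ["Brooklyn", "Brooklyn", "Queens", "Queens", "Queens"],
    "type": ["Noise", "Heating", "Noise", "Noise", "Heating"],
    "n": [1, 1, 1, 1, 1],
})

# Rows become boroughs, columns become complaint types,
# and each cell holds the summed count for that pair.
table = complaints.pivot_table(
    index="borough",
    columns="type",
    values="n",
    aggfunc="sum",
    fill_value=0,
)

print(table)
```

The result has one row per borough and one column per complaint type, which is usually easier to read than the original long list of records.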
I've put it in a GitHub repository called pandas-cookbook. It's along the same lines as the pandas talks I've given -- take a real dataset or three, play around with it, and learn how to use pandas along the way.
Here's the table of contents as of right now. These links will probably break as I update the repository.
- Chapter 1: Reading from a CSV
- Chapter 2: Selecting data & finding the most common complaint type
- Chapter 3: Which borough has the most noise complaints? (or, more selecting data)
- Chapter 4: Find out on which weekday people bike the most with groupby and aggregate
- Chapter 5: Combining dataframes and scraping Canadian weather data
- Chapter 6: String operations! Which month was the snowiest?
- Chapter 7: Cleaning up messy data
- Chapter 8: Parsing Unix timestamps
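As a taste of the last topic, parsing Unix timestamps, here's a minimal sketch. It assumes the timestamps are whole seconds since the epoch, and the sample values are made up for illustration rather than taken from the cookbook's dataset.

```python
import pandas as pd

# Hypothetical Unix timestamps, in seconds since 1970-01-01 UTC.
stamps = pd.Series([0, 86400, 1356998400])

# unit="s" tells pandas the integers are seconds, not nanoseconds
# (nanoseconds is what it assumes for plain integers by default).
parsed = pd.to_datetime(stamps, unit="s")

print(parsed)
```

If the numbers were milliseconds instead, `unit="ms"` would be the equivalent choice.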