Welcome back to fun with databases! In Part 1 of this series, we learned that:
- SQLite databases are organized into fixed-size pages. I made an example database which had 1k pages.
- The pages are all part of a kind of tree called a btree.
- There are two kinds of pages: interior pages and leaf pages. Data is only stored in leaf pages.
I mentioned last time that I put in some print statements to tell me every time I read a page, like this:
Let's understand a little bit more about how these btrees work! First, some theory.
Normally when I think about tracking tree accesses, I think about it in terms of "how many times you need to jump to a new node". So in a binary tree with depth 10, you might need to jump up to 10 times.
One of the most important things in database optimization is disk I/O. Reading more data than you absolutely need to read is really expensive, because seeking to a new location in a file takes a long time. It takes way less CPU time to search through your data than it does to read the data into memory (see: Computers are fast!, Latency Numbers Every Programmer Should Know).
So for a simple database scan, reading more data than you need is a huge problem
So far we know that:
- Our database is organized into 1k pages
- We want to read as little of the database as possible to write a query, and
- My filesystem has a "block size" of 4k, which means that it's impossible to read less than 4k at a time.
We can conclude from this that we want to read as few pages as possible. btrees are organized so that each node has lots of children, to keep the depth small, and so that we won't have to read too many pages to find a row.
My 100,000 row SQLite database has a btree with depth 3, so to fetch a node I only need to read 3 pages. If I'd used a binary tree I would have needed to do log(100000) / log(2) = 16 seeks! That's more than five times as many. So these btrees seem like a pretty good idea.
The index and table btrees
So far we've been talking like there's only one btree. This isn't actually true at all! My database has one table, and two btrees.
Each table has a btree, made up of interior and leaf nodes. Leaf nodes contain all the data, and interior nodes point to other child nodes.
Every index for that table also has its own btree, where you can look up which row id a column value corresponds to. This is why maintaining lots of indexes is slow -- every time you insert a row you need to update as many trees as you have indexes.
Let's dive into a query a bit more deeply. It turns out SQLite does a binary search inside every page of every btree to find out what node to go to next. I've printed out the indexes it tries while doing the binary search.
Wow, that was a lot of work. There were actually two separate searches here, in two different btrees.
- Look up the rowid associated with the number
1000in the index btree --
99001. (some rowid docs).
- Look up
99001in the table btree.
It's really neat to see it doing the binary search for
each page! I couldn't quite figure out how to get it to print out the
comparisons it was doing when looking up 1000 in the index btree,
because it does some weird function pointer magic to do comparisons.
Kamal is parsing SQLite databases with the Python construct module next to me right now and it is AMAZING. There may be future instalments. Who knows!