Git's Core Data Model
It's not necessary to understand Git's data model to use Git, but it's very helpful when reading Git's documentation so that you know what it means when the documentation says "object" or "reference".
Your Git repository contains 4 main kinds of data:
- Objects: commits, trees, blobs, and tag objects
- References: branches, tags, remote-tracking branches, etc.
- The index, also known as the staging area
- Reflogs
Objects: commits, blobs, trees, and tag objects
Commits, trees, tags, and blobs are all stored in Git's object database. Every object has:
- An ID, which is a hash of the object's type, size, and contents (SHA-1 by default).
It's fast to look up a Git object using its ID.
The ID is usually represented in hexadecimal, like
1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a.
- A type. There are 4 types of objects: commits, trees, tags, and blobs.
- Contents. The structure of the contents depends on the type.
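Git doesn't hash an object's contents alone. It hashes a short header naming the object's type and size, followed by the contents. Here's a minimal Python sketch of that computation, assuming a default SHA-1 repository (repositories created with `git init --object-format=sha256` use SHA-256 instead). The function name `object_id` is just for illustration:

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    """Compute a SHA-1 Git object ID for `content` stored as `obj_type`.

    Git hashes a small header, "<type> <size>" plus a NUL byte,
    followed by the raw contents. Because the type is part of the
    hashed data, the same bytes stored as different object types
    get different IDs.
    """
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# A file containing "hello" followed by a newline, stored as a blob:
print(object_id("blob", b"hello\n"))
# ce013625030ba8dba906f756967f9e9ca394464a
```

That is the same ID `git hash-object` prints for a file with those contents, which makes it easy to check the sketch against a real repository.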
Once an object is created, it's never changed. Here are the 4 types of objects:
commits
A commit contains:
- Its parent commit ID(s). A root commit has 0 parents, a regular commit has 1 parent, and a merge commit has 2 or more parents.
- A commit message
- The ID of a tree object that records all the files in the commit
- An author and the time the commit was authored
- A committer and the time the commit was committed
For example, here's a commit:
tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647 <-- parent commit ID
author Maya <maya@example.com> 1759173425 -0400
committer Maya <maya@example.com> 1759173425 -0400

Add README
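Because a commit is stored as plain text, a rough parser is short. Here's a minimal Python sketch, assuming the raw commit text (for example, what `git cat-file commit <id>` prints) is already in a string. The name `parse_commit` is hypothetical:

```python
def parse_commit(raw: str) -> dict:
    """Split the text of a commit object into header fields and a message.

    Headers ("tree", "parent", "author", "committer") come first, one
    per line; a blank line separates them from the commit message.
    "parent" can appear any number of times, so it is kept as a list.
    This simple sketch does not handle multi-line headers such as a
    GPG signature ("gpgsig").
    """
    header_text, _, message = raw.partition("\n\n")
    commit = {"parents": [], "message": message}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)
        else:
            commit[key] = value
    return commit


example = (
    "tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a\n"
    "parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647\n"
    "author Maya <maya@example.com> 1759173425 -0400\n"
    "committer Maya <maya@example.com> 1759173425 -0400\n"
    "\n"
    "Add README\n"
)
commit = parse_commit(example)
print(commit["parents"])  # ['4ccb6d7b8869a86aae2e84c56523f8705b50c647']
```

Keeping "parent" as a list means root commits (empty list) and merge commits (2 or more entries) need no special cases.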
trees
A tree is how Git represents a directory. It lists, for each item in the tree:
- permissions, like 100644
- type, like blob
- object ID, like 8728a858d9d21a8c78488c8b4e70e531b659141f
- filename, like README.md
The files are blob objects and the directories are
tree objects.
For example, this is how a tree containing one directory (src) and one file
(README.md) is stored:
100644 blob 8728a858d9d21a8c78488c8b4e70e531b659141f README.md
040000 tree 89b1d2e0495f66d6929f4ff76ff1bb07fc41947d src
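NOTE: The permissions are in the same format as UNIX permissions, but the only allowed permissions for regular files are 644 (written 100644) and 755 (100755, for executable files). Other modes are used for directories (040000), symbolic links (120000), and submodules (160000).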
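The text above is how tools like `git ls-tree` display a tree. On disk, a tree is binary: each entry is the mode and filename, a NUL byte, then the 20 raw bytes of the object ID. Here's a hedged Python sketch of that encoding for a SHA-1 repository. The helper names are hypothetical, and the sort rule (directories compared as if their name ended in "/") reflects how Git orders tree entries:

```python
import hashlib


def tree_contents(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_id) entries into raw tree contents.

    Each entry is "<mode> <name>" plus a NUL byte, followed by the
    20 raw bytes of the object ID (not hex). Directory modes are
    written as "40000", without the leading zero that git ls-tree
    displays. Entries are sorted by name, comparing a directory as
    if its name ended in "/".
    """
    def sort_key(entry: tuple[str, str, str]) -> str:
        mode, name, _ = entry
        return name + "/" if mode == "40000" else name

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += f"{mode} {name}\0".encode() + bytes.fromhex(hex_id)
    return body


def object_id(obj_type: str, content: bytes) -> str:
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# One file and one directory, as in the example tree:
root = tree_contents([
    ("40000", "src", "89b1d2e0495f66d6929f4ff76ff1bb07fc41947d"),
    ("100644", "README.md", "8728a858d9d21a8c78488c8b4e70e531b659141f"),
])
print(object_id("tree", root))
```

The sort rule matters: a file named foo.txt sorts before a directory named foo, because "." comes before "/". Getting the order wrong would produce a different tree ID.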
blobs
A blob is how Git represents a file. A blob object contains the file's contents.
Storing a whole new blob for every version of a file can take up a lot of space, so
git gc periodically packs objects into compressed pack files in .git/objects/pack, where similar objects can be stored as deltas against each other.
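Before they're packed, new objects are usually written as individual "loose" files, each compressed with zlib. Here's a minimal Python sketch of that layout, assuming a SHA-1 repository. The function name is hypothetical, and pack files use a different, delta-based format that this sketch doesn't attempt:

```python
import hashlib
import zlib


def loose_object(obj_type: str, content: bytes) -> tuple[str, bytes]:
    """Return (path under .git, file bytes) for a loose object.

    The file holds the zlib-compressed header plus contents. Its path
    uses the first 2 hex digits of the ID as a directory name and the
    remaining 38 as the file name.
    """
    raw = f"{obj_type} {len(content)}\0".encode() + content
    oid = hashlib.sha1(raw).hexdigest()
    return f"objects/{oid[:2]}/{oid[2:]}", zlib.compress(raw)


path, data = loose_object("blob", b"hello\n")
print(path)  # objects/ce/013625030ba8dba906f756967f9e9ca394464a
print(zlib.decompress(data))  # b'blob 6\x00hello\n'
```

Reading goes the other way: zlib.decompress on a file under .git/objects gives back the header and the contents.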
tag objects
Tag objects (also known as "annotated tags") contain:
- The tagger and tag date
- A tag message, similar to a commit message
- The ID of the object (often a commit) that they reference
- (optionally) A GPG signature
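Like a commit, a tag object is plain text. In raw form it also records the type of the object it points to ("type") and the tag's own name ("tag"). Here's a hedged Python sketch that builds the text of an unsigned annotated tag. The function name and the tag name v1.0 are hypothetical. A signature, when present, goes after the message, and this sketch leaves it out:

```python
def tag_object_text(target_id: str, target_type: str, name: str,
                    tagger: str, message: str) -> str:
    """Build the text of an unsigned annotated tag object.

    Headers come first; a blank line separates them from the message.
    """
    return (
        f"object {target_id}\n"
        f"type {target_type}\n"
        f"tag {name}\n"
        f"tagger {tagger}\n"
        "\n"
        f"{message}\n"
    )


text = tag_object_text(
    "4ccb6d7b8869a86aae2e84c56523f8705b50c647",  # the commit being tagged
    "commit",
    "v1.0",  # hypothetical tag name
    "Maya <maya@example.com> 1759173425 -0400",
    "Release 1.0",
)
print(text)
```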
References: branches, tags, and more
References are a way to give a name to a commit. It's easier to remember
"the changes I'm working on are on the turtle branch" than "the changes
are in commit bb69721404348e".
Git often uses "ref" as shorthand for "reference". Refs are usually stored
as files in the .git/refs directory, though Git can also pack many of them
into a single .git/packed-refs file.
There are two kinds of references:
- References to an object ID, usually a commit ID
- References to another reference. This is called a "symbolic reference".
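Resolving a reference means following symbolic references until you reach an object ID. Here's a minimal Python sketch. It assumes a hypothetical in-memory `refs` dict standing in for the ref files, where a symbolic ref's text starts with "ref: ". The depth limit of 5 is this sketch's own choice, used to stop reference loops:

```python
def resolve_ref(refs: dict[str, str], name: str, max_depth: int = 5) -> str:
    """Follow symbolic references until reaching an object ID.

    `refs` is a hypothetical stand-in for the files under .git:
    it maps a ref name to the text stored in that ref's file.
    """
    for _ in range(max_depth):
        value = refs[name].strip()
        if not value.startswith("ref: "):
            return value  # a direct reference: the value is an object ID
        name = value[len("ref: "):]  # a symbolic reference: follow it
    raise ValueError(f"symbolic refs nested more than {max_depth} deep")


refs = {
    "HEAD": "ref: refs/heads/main\n",
    "refs/heads/main": "4ccb6d7b8869a86aae2e84c56523f8705b50c647\n",
}
print(resolve_ref(refs, "HEAD"))  # 4ccb6d7b8869a86aae2e84c56523f8705b50c647
```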
branches (.git/refs/heads/<name>)
A branch is a name for a commit ID. That commit is the latest
commit on the branch. Branches are stored in the .git/refs/heads/ directory.
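Since each branch is a small file holding a commit ID, listing branches can be as simple as walking that directory. Here's a hedged Python sketch, assuming `git_dir` is the path to a .git directory. The function name is hypothetical, and branches that Git has moved into .git/packed-refs are not covered:

```python
import os


def list_branches(git_dir: str) -> dict[str, str]:
    """Map branch names to commit IDs by reading <git_dir>/refs/heads.

    Branch names may contain slashes ("feature/turtle"), which are
    stored as subdirectories, so the whole directory tree is walked.
    """
    heads = os.path.join(git_dir, "refs", "heads")
    branches = {}
    for dirpath, _dirnames, filenames in os.walk(heads):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            name = os.path.relpath(path, heads).replace(os.sep, "/")
            with open(path) as f:
                branches[name] = f.read().strip()
    return branches


# Usage (in a real repository): print(list_branches(".git"))
```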
tags (.git/refs/tags/<name>)
A tag is a name for a commit ID,
tag object ID, or other object ID.
Tags are stored in the .git/refs/tags/ directory.
Even though branches and tags are both "a name for a commit ID", Git treats them very differently. Branches are expected to be updated regularly as you work on the branch, but a tag is assumed never to change after you create it.
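A tag ref can hold a commit ID directly (a "lightweight" tag) or the ID of a tag object (an annotated tag), and a tag object can even point at another tag. So finding the commit behind a tag means following tag objects until you reach something else. Git calls this "peeling", and the <tag>^{} revision syntax asks for it (for example, git rev-parse v1.0^{}). Here's a minimal Python sketch, assuming a hypothetical in-memory object store with placeholder IDs:

```python
from typing import Optional


def peel(objects: dict[str, tuple[str, Optional[str]]], oid: str) -> str:
    """Follow tag objects until reaching an object that isn't a tag.

    `objects` is a hypothetical object store mapping an ID to
    (type, target): for a tag object, target is the ID it points to;
    for other objects, target is None.
    """
    obj_type, target = objects[oid]
    while obj_type == "tag":
        oid = target
        obj_type, target = objects[oid]
    return oid


# Placeholder IDs, not real hashes:
objects = {
    "commit1": ("commit", None),
    "tag1": ("tag", "commit1"),  # annotated tag pointing at a commit
    "tag2": ("tag", "tag1"),     # tag of a tag
}
print(peel(objects, "tag2"))  # commit1
```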
HEAD (.git/HEAD)
HEAD is where Git stores your current branch.
HEAD is normally a symbolic reference to your current branch, for
example ref: refs/heads/main if your current branch is main.
HEAD can also be a direct reference to a commit ID;
that's called "detached HEAD state".
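You can tell the two states apart by looking at what .git/HEAD contains. Here's a minimal Python sketch, assuming the file's text is passed in as a string. The function name and messages are made up for illustration and are not Git's exact output:

```python
def describe_head(head_text: str) -> str:
    """Report whether the contents of .git/HEAD name a branch or a commit."""
    value = head_text.strip()
    if value.startswith("ref: refs/heads/"):
        return "on branch " + value[len("ref: refs/heads/"):]
    # Not a symbolic ref: HEAD holds a commit ID directly (detached HEAD).
    return "detached HEAD at " + value[:7]


print(describe_head("ref: refs/heads/main\n"))  # on branch main
print(describe_head("4ccb6d7b8869a86aae2e84c56523f8705b50c647\n"))
# detached HEAD at 4ccb6d7
```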
remote-tracking branches (.git/refs/remotes/<remote>/<branch>)
A remote-tracking branch is a name for a commit ID.
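It's how Git stores the last-known state of a branch in a remote
repository. git fetch updates remote-tracking branches. When
git status says "Your branch is up to date with 'origin/main'", it's
comparing your current branch with this remote-tracking branch, so it
only knows about changes up to your last fetch.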
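Because origin/main is just another commit ID, "ahead" and "behind" can be worked out by comparing which commits are reachable from each one. This is roughly what git rev-list --count origin/main..main counts. Here's a hedged Python sketch, assuming a hypothetical `parents` dict mapping each commit ID to its parent IDs. Git's real implementation is much more efficient than collecting every reachable commit:

```python
def reachable(parents: dict[str, list[str]], start: str) -> set[str]:
    """All commits reachable from `start` by following parent links."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def ahead_behind(parents: dict[str, list[str]], local: str,
                 remote: str) -> tuple[int, int]:
    """Count commits only on `local` (ahead) and only on `remote` (behind)."""
    mine = reachable(parents, local)
    theirs = reachable(parents, remote)
    return len(mine - theirs), len(theirs - mine)


# Placeholder history: A <- B, then B has two children C and D.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
print(ahead_behind(parents, "C", "D"))  # (1, 1)
```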
stash (.git/refs/stash)
The stash is a single reference called stash. It's a reference to a
commit ID with the latest changes you stashed. The rest of the stash list
is stored in the stash reflog.
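Since the stash list lives in the reflog of refs/stash, stash@{0} is the newest entry in that reflog, stash@{1} the one before it, and so on. Here's a small Python sketch, assuming the text of .git/logs/refs/stash is in a string and each line has the usual reflog layout (old ID, new ID, who, timestamp, and timezone, then a tab and the message). The function name is hypothetical:

```python
def stash_list(reflog_text: str) -> list[tuple[str, str, str]]:
    """Return (label, commit ID, message) for each stash, newest first."""
    entries = []
    for line in reflog_text.splitlines():
        if not line.strip():
            continue
        fields, _, message = line.partition("\t")
        new_id = fields.split(" ")[1]  # second field: ID after the change
        entries.append((new_id, message))
    entries.reverse()  # the reflog is oldest-first; stash@{0} is newest
    return [(f"stash@{{{n}}}", oid, msg) for n, (oid, msg) in enumerate(entries)]


reflog = (
    "0" * 40 + " " + "a" * 40
    + " Maya <maya@example.com> 1759173425 -0400\tWIP on main: first\n"
    + "a" * 40 + " " + "b" * 40
    + " Maya <maya@example.com> 1759173500 -0400\tWIP on main: second\n"
)
for label, oid, msg in stash_list(reflog):
    print(label, oid[:7], msg)
```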
Other references
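Git also has other references, such as ORIG_HEAD (the previous position
of HEAD, saved by commands like git reset, git merge, and git rebase) and
FETCH_HEAD (what the most recent git fetch retrieved).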
The index
The index, also known as the "staging area", contains the current staged version of every file in your Git repository. When you commit, the files in the index are used as the files in the next commit.
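The index lists full paths in one flat list, while a commit points to a tree of trees. So when you commit, the flat paths have to be grouped by directory, with one tree object per directory (git write-tree performs this step). Here's a minimal Python sketch of just the grouping, assuming the index is given as a hypothetical dict from path to blob ID. It builds nested dicts rather than real tree objects:

```python
def index_to_tree(paths: dict[str, str]) -> dict:
    """Group flat index paths into nested directories.

    `paths` maps a path like "src/hello.py" to its blob ID. In the
    result, directories become nested dicts and files map to blob IDs.
    """
    root: dict = {}
    for path, blob_id in paths.items():
        *dirs, filename = path.split("/")
        node = root
        for d in dirs:
            node = node.setdefault(d, {})
        node[filename] = blob_id
    return root


index = {
    "README.md": "8728a858d9d21a8c78488c8b4e70e531b659141f",
    "src/hello.py": "665c637a360874ce43bf74018768a96d2d4d219a",
}
print(index_to_tree(index))
```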
Each index entry has 4 fields:
- The permissions
- The blob ID of the file
- The filename
- The stage number. This is normally 0, but if there's a merge conflict the same filename can appear up to 3 times in the index, with stage 1 (the common ancestor's version), 2 ("ours", your current branch), and 3 ("theirs", the branch being merged).
It's extremely uncommon to need to look at the index directly: normally you'd
run git status to see what has changed since the last commit. But you
can use git ls-files --stage to see the index. Here's the output of
git ls-files --stage for our example repository.
100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0 README.md
100644 665c637a360874ce43bf74018768a96d2d4d219a 0 src/hello.py
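In the real output, the path is separated from the other fields by a tab. Here's a hedged Python sketch that parses git ls-files --stage output and picks out conflicted files (any entry with a nonzero stage). The function names are hypothetical, and filenames containing unusual characters are quoted in the real output, which this sketch doesn't handle:

```python
from collections import defaultdict


def parse_ls_files_stage(output: str) -> list[tuple[str, str, int, str]]:
    """Parse lines of "<mode> <blob ID> <stage>\\t<path>"."""
    entries = []
    for line in output.splitlines():
        if not line:
            continue
        meta, _, path = line.partition("\t")
        mode, blob_id, stage = meta.split()
        entries.append((mode, blob_id, int(stage), path))
    return entries


def conflicted_paths(entries) -> dict[str, dict[int, str]]:
    """Map each path that has nonzero stages to {stage: blob ID}."""
    conflicts = defaultdict(dict)
    for _mode, blob_id, stage, path in entries:
        if stage != 0:
            conflicts[path][stage] = blob_id
    return dict(conflicts)


output = (
    "100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0\tREADME.md\n"
    "100644 665c637a360874ce43bf74018768a96d2d4d219a 0\tsrc/hello.py\n"
)
entries = parse_ls_files_stage(output)
print(conflicted_paths(entries))  # {}
```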
Reflogs (.git/logs/refs/<name>)
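A reflog stores the history of a reference: each time the ref is updated, Git appends a line describing the change. By default, in a repository with a working directory, Git keeps reflogs for HEAD, branches, and remote-tracking branches but not for tags; setting core.logAllRefUpdates to always makes Git log every ref.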
This mirrors the structure of .git/refs: for example, if you
have a branch in .git/refs/heads/<name>, its reflog will be in
.git/logs/refs/heads/<name>. HEAD's reflog is in .git/logs/HEAD.
Each line of the reflog has:
- Before/after commit IDs
- User who made the change
- Timestamp
- Log message, like "pull: Fast-forward"
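Each reflog line is plain text: the two IDs, the person, a Unix timestamp and timezone offset, then a tab and the message. Here's a hedged Python sketch of a parser for one line, assuming that layout. The regular expression and field names are this sketch's own:

```python
import re
from datetime import datetime, timedelta, timezone

REFLOG_LINE = re.compile(
    r"(?P<old>[0-9a-f]+) (?P<new>[0-9a-f]+) "
    r"(?P<who>.*?) (?P<ts>\d+) (?P<tz>[+-]\d{4})\t(?P<msg>.*)"
)


def parse_reflog_line(line: str) -> dict:
    """Split one reflog line into IDs, person, timestamp, and message."""
    m = REFLOG_LINE.fullmatch(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"not a reflog line: {line!r}")
    tz = m["tz"]
    sign = 1 if tz[0] == "+" else -1
    offset = timedelta(hours=int(tz[1:3]), minutes=int(tz[3:5])) * sign
    return {
        "old": m["old"],
        "new": m["new"],
        "who": m["who"],
        "when": datetime.fromtimestamp(int(m["ts"]), timezone(offset)),
        "message": m["msg"],
    }


line = ("0" * 40 + " " + "a" * 40
        + " Maya <maya@example.com> 1759173425 -0400\tpull: Fast-forward")
entry = parse_reflog_line(line)
print(entry["who"], entry["message"])  # Maya <maya@example.com> pull: Fast-forward
```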