There are plenty of guides to using git
all around the web. Why should I add another drop to a vast ocean of resources?
Well, as someone who’s professional experience with VCS has only involved Mercurial, I wanted to give myself a deeper understand of a tool I was going to be using daily. Furthermore, research shows that teaching others improves one’s understanding of the taught material.
When I was offered the chance to give a presentation on git
at work, I took it to build my own understanding, and, hopefully, my colleagues’ as well. While this won’t be the most comprehensive guide on git
, I wanted to highlight my learnings to myself and to you, dear reader.
This guide will assume you have basic understandings of why developers use
git
, though will still dive into "basic" topics that I find interesting from a technical or usability standpoint.
Common Workflows
Most developers use a single, centralized repository that all individuals push
and pull
from. But aside from that, there’s rarely a “one size fits all” solution to git
. Here are some basic approaches that I found in my research:
Gitflow
The Gitflow approach was pioneered in 2010 by Vincent Driessen at nvie and has wormed its way into enterprise software globally. The idea revolves around two core branches in your repo: main
(or master
) and some development
branch.
The development
branch is where all branches diverge from and are eventually re-merged. Then, whenever a new version of the software should be cut, the development
branch is merged into main
.
Effectively, main
is only ever touched by development
. This prevents any incomplete code from being checked into main
and makes the commit history (ideally) easier to parse, since each commit represents a new version of the product.
As mentioned in Driessen’s 2020 reflection about Gitflow, in the 10 years since, most software is continuously released, rather than released version-by-version. Subsequently, a simpler workflow can accomplish the same result.
Feature Branches (Preferred)
Feature branches are, for the most part, identical to the Gitflow approach, with one major difference: There isn’t a development
branch. By gating merge commits to main
with tests, developers can prevent broken code from ever making it into main
, which is a huge advantage for continuous integration and deployment environments.
Rather than merging from development
into main
whenever a release is necessary, commits in main
can be tagged as a release. This still provides the rigid versioning structure necessary for some releases without requiring the overhead of additional branches and special rules.
Forking
The forking approach breaks the single-remote mold. Each developer has their own remote that they are able to write to at their leisure (even pushing to main
!) This is often used for free and open-source (FOSS) projects, so that the maintainer doesn’t need to meddle around with permissions for contributers.
To understand forking, it’s first important to understand how remotes work. This was something I learned while researching this presentation:
Remotes are, true to their name, remote repositories to synchronize your changes with. While most developers use only one, typically hosted on something like GitHub or BitBucket, multiple remotes can split the management of a codebase more than just a branch can — by allowing completely different commit trees entirely!
With the forking workflow, developers create their own copy of the source repo to work in. They can always pull in changes from the source repo to their own copy, that way they’ll stay in sync with new features.
When they’re ready to commit a change upstream, they make a pull request (PR) where they can discuss their changes before merging in. In most cases, the maintainer of the source repo is the only one with write access and is therefore the “keeper” of features making it into the mainline.
Other Important Concepts
There were a few other topics that I learned along my journey from a “how does git
implement this” standpoint, even if I was already familiar with the terms and — in the case of branches, at least — how to use them.
Branches
These became a bit easier to visualize once I had a mappable concept in my head. Branches are nothing more than pointers to a specific hash in the commit tree. They move as you make commits, but they aren’t doing anything special, they just mark your progress.
In the above diagram, we can see a foo
branch based on commit B
and then with some work in W
added on top of it. When we create a new branch from main
, (also located at HEAD
, our currently checked-out branch) we are instantiating a new pointer to the same commit, and moving HEAD
to show we’ve checked it out.
Once we commit some more work to the bar
branch, we can see a new node in the tree, and both bar
and HEAD
have advanced to this new point in history. In this way, branches “grow”, much like those on a tree.
It can also be useful to run git
commands using HEAD
. Appending ~<number>
will run on a commit relative that number to HEAD
.
For example
git diff HEAD~1
shows you the differences between your staged files and those already committed.
Merging
While merging is a common practice, how does it actually work inside of git
? There are a few different types of merges, some simple, some more complex.
The first is a fast-forward
merge. This is the simplest type of merge, since it happens when there is been no divergence between the two branches. If you attempted to merge the bar
branch from the diagram above into main
, the main
pointer could just be shifted forward, since there hasn’t been any committed work to main
.
A three-way merge is more complex. We can see an example of one in action below:
In this example, there has been some work committed to main
while we have been working on feat
. Since we want to pull in those changes while we keep working on feat
, this is called “a merge of main
into feat
”.
Because the latest commit of the branch you’re on isn’t a direct ancestor of the branch you’re merging in, git
has to do some work. In this case, it examines the latest snapshot of each branch tip (C
and Z
) and their common ancestor — where they diverged in the tree — B
. git
then tries to determine what’s actually changed using a variety of merge strategies.
In some cases, git
can’t figure this out, and you have to resolve a merge conflict. But regardless, any changes to feat
are stored in new merge commit on the branch (M
). Future patches can still be applied onto feat
, as seen by patch N
.
It is also possible to apply work between branches using
git rebase
, though its main benefit is creating a cleaner, more lineargit
history. In my experience, being proficient withgit merge
will get you by and is a great place to start.
Merge Commits
Once you decide to perform a merge, now you have a choice of how you’d like git
to do it for you.
Merge commits provide granular, in-depth history, since they preserve the work from each commit in feat
when joined into main
. They are best used for bug-prone logic changes where bisecting commits can expedite detection.
Squash commits combine all of the work from feat
into a single commit onto main
. This means the history is much simpler — one big pile of changes — but can be harder to sift through. It’s best used for a series of like patches, such as style fixes or refactors.
Although there’s a big trade-off between each, I’ve heard Squashes favored more often than not. It may be that, like rebasing, a better git
history is rarely appreciated. Though, when it’s needed, it can be nice to have done the work and made your life easier.
Final Thoughts
There’s still a lot more I have to learn with
git
. Maybe another presentation and subsequent post are in order? All I know is that I look forward to deepening my knowledge ofgit
as I use it more both professionally and personally.