Welcome to part one of Erik’s Code Space’s first article series of 2021! In this series we’re going to learn all about version control using Git and GitHub. In this first part of the series, we’re going to talk a little about the idea of version control, then jump right in to hands-on usage. This is the kind of thing that’s going to help you right away, and it’s the sort of skill that employers are going to expect you to have. Without further ado, let’s jump right into it.
What is Version Control?
In software development, version control refers to the things we do to manage and organize different versions of the projects we’re building. This can mean things like maintaining release versions and old versions for your customers to use, but for most programmers, version control is about keeping a record of the changes you make to the code you’re writing. It serves 3 purposes:
- Keeping a record of changes and decisions made to your code
- Providing a safety net for experimentation
- Allowing collaboration on projects all over the world
We’re going to get into how we can use version control to do all these things in this series. For right now, let’s talk about the different kinds of version control software.
Version Control Software
There are many different products out there for you to use. Git is by far the most popular tool for version control and is the tool we’ll be using for this series because it’s free and because it’s the most likely thing you’ll use at work. There are other version control products out there that get plenty of use in the professional world such as Subversion and Mercurial. I’m not sure if these tools have niche industries or domains, but you can’t go wrong by knowing Git.
Along with Git, there are two major Git servers-as-a-service: GitHub and GitLab. Both have a lot of services for collaboration, continuous integration (CI), continuous deployment (CD), versioning, sharing, issue tracking, and more. For our purposes, we are going to stick with GitHub since this is the service the author knows best. However, we will not be using GitHub in part one of this series, so don’t worry about it for now. But, if you want to save some time, go ahead and make yourself a GitHub account so that you’re ready for part 2.
Installing Git
Alright, we’re ready to get started. First we have to install Git on our computers. It’s very possible you’ve already got Git on your computer, so let’s check and see if that’s the case real quick. Open up your command line of choice and run the following command:
> git --version
If you have Git on your computer already, you will get something like
git version 2.16.2.windows.1
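as the output. If not, how you install it depends on what operating system you’re on. On Fedora and other Linux distributions that use dnf, you can simply go to the terminal and type:
> sudo dnf install git-all
On Debian-based distributions such as Ubuntu, use sudo apt install git-all instead. On a Mac, running git --version in the terminal for the first time typically prompts you to install Apple’s command line developer tools, which include Git.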
If you’re on Windows, you’ll have to head over to gitforwindows and follow the download instructions there. Don’t worry about installing any GUI programs. In my experience, once you get used to the command line, Git is much more intuitive there than in any GUI program (but if you must use one, I recommend GitHub Desktop).
Once you’ve installed the appropriate version for your OS, reload your terminal and type in the git --version command again to make sure git is in your path. Once you’ve done that, you’re ready to start using it!
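If you’d rather run this check from a script, here is a minimal Python sketch of the same idea. The function name git_version is made up for this example; it only looks for git on your PATH and returns whatever git --version prints.

```python
import shutil
import subprocess


def git_version():
    """Return the output of `git --version`, or None if git can't be run."""
    # shutil.which looks through the PATH, the same way the terminal does.
    if shutil.which("git") is None:
        return None
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


version = git_version()
print(version or "git was not found on your PATH")
```

If this prints a line starting with “git version,” you’re good to go.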
Git Basic Concepts
We use git to keep a record of what we’ve done in our software projects so that we can see every change we’ve ever made and keep a running log of what we did. Starting out, it’s best to think of git as a tool for keeping a timeline. Remember those timeline graphs in your 6th grade history textbooks? They looked kind of like this:
See how there’s a line depicting time, and an arrow and blurb about every significant event that happened during this (presumably made up) company’s history? Git creates something like this, but for our software projects. The solid line is kind of like Git’s “branch” and the messages along the line are a lot like Git’s “commits.”
Basically, we start off by initializing a Git repository. In the company timeline metaphor, the repository is the actual company. It’s the body of code of which we are keeping a running record.
Then, you start working on the repository. You might kick things off by creating a getting_started.py Python script. You begin by writing the code for that file until you get to a reasonable stopping point. You look over the code you wrote and, if everything looks good, you decide to commit the file to the repository.
To commit the file, you also have to write a small message. Something like “Wrote proof of concept script.” Once you make the commit, that message becomes the first bit of text on the timeline of your software project. That timeline is called a branch.
At pretty much any given time, you are at the right-most part of the branch you’re working on. Every commit you make adds another event to the timeline. Luckily, we can spawn branches off of other branches. For example, on your current timeline, you may decide you want to experiment with creating a new file like file_two.py, but it’s not quite ready to be an official part of the project history. In this case you’ll create a new branch off of the official branch (often named something like main, master, or trunk). In this way, you can experiment with an alternate timeline. If you like the work you did on the new branch, you can then merge that branch into main. This will make all of the work you did on the alternate branch part of the official history of main.
These are the core concepts behind how Git works. There’s a lot more to it but don’t worry, we’re about to get hands-on with this tool and it’ll all make a lot more sense.
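If it helps to see the timeline idea as code, here is a toy Python model of it. This is only a sketch of the concept, not how Git stores things on disk: each commit remembers the commit before it, and a branch is just a name pointing at the newest commit on its timeline. The branch name “experiment” and the second commit message are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    message: str
    parent: Optional["Commit"] = None  # the previous event on the timeline


def log(tip):
    """Walk from the newest commit back to the very first one."""
    messages = []
    while tip is not None:
        messages.append(tip.message)
        tip = tip.parent
    return messages


# A branch is a name that points at the newest commit on its timeline.
branches = {"main": Commit("Wrote proof of concept script")}

# Making a new branch copies the pointer, so both share the same history.
branches["experiment"] = branches["main"]

# Committing on the new branch moves only that branch forward.
branches["experiment"] = Commit("Add file_two.py", parent=branches["experiment"])

print(log(branches["main"]))        # ['Wrote proof of concept script']
print(log(branches["experiment"]))  # ['Add file_two.py', 'Wrote proof of concept script']

# In the simplest kind of merge, main just moves up to the same commit.
branches["main"] = branches["experiment"]
print(log(branches["main"]))        # ['Add file_two.py', 'Wrote proof of concept script']
```

Notice that main’s history didn’t change at all until the merge at the end. That’s the safety net we talked about earlier.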
Initializing Your Repository
Go ahead and open up your terminal of choice, whether it’s bash, DOS, PowerShell, or something else. Find your way into a folder that you use for projects, make a new directory called git_project, and cd into it. I’m doing this on PowerShell, so it looks like this:
> mkdir git_project
> cd git_project
Now, we’re going to initialize our repository. This basically means we’re telling Git that everything in this directory is a project that we want to keep track of. We’re going to name our primary branch “main,” so the command to do that is as follows (the -b flag requires Git 2.28 or newer):
> git init -b main
You should get a message that starts with “Initialized empty Git repository in” followed by a path, and if you look, there’s a new directory here called .git. Don’t mess with it; I just want you to know that it’s supposed to be there. Also, the directory from which you ran this command is now the project’s root.
Adding Files
Ok, now that we’ve got our repository initialized, it’s time to start writing some code for Git to track. Let’s create a Python file called getting_started.py. Since we’re in the command line, I like to use commands to do this, but you can do it from Explorer, an IDE, or anything else you want. Either way, create the file and open it in your editor. In Linux I use
> touch getting_started.py
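Or in PowerShell on Windows (New-Item creates a truly empty file, whereas redirecting with > in Windows PowerShell writes UTF-16 text that Git can mistake for a binary file):
> New-Item getting_started.py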
Since this isn’t a Python tutorial, we’ll keep the code we write short and very simple. We’ll write a very short function called greet that looks like the following:
def greet(name):
    return f'Hello, {name}!'
Now, let’s head back over to our terminal. Type in the command git status and take note of the output. This is what mine looks like; yours should be fairly similar:
> git status
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
getting_started.py
nothing added to commit but untracked files present (use "git add" to track)
Notice that getting_started.py is listed under “untracked files.” This is because we haven’t yet told Git that this is a file we want to keep track of. In order to do this, we have to add the file to the repository. The command to do so is probably exactly what you think:
> git add getting_started.py
Type in the above command and hit enter. We’ve now officially added getting_started.py to the repository and Git will track its changes from now on. Go ahead and run git status again and note the new output:
> git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: getting_started.py
See now that getting_started.py is listed under “Changes to be committed.” This is a pretty nice segue into our next command, commit.
Committing Changes
We’ve now decided that we’re happy enough with our greet function in the getting_started script. It’s time to commit these changes, or make them an official part of this project’s history. Like I said earlier, we’ll have to come up with a short message to describe what we’ve done. All we’ve really done was create the greet function, so let’s go with that. The command to commit our changes with this message is as follows:
> git commit -m "Create greet function"
You will get some output about how many files were changed and some other stuff that you don’t really need to pay much attention to for now. Let’s see what happens when we run the status command again:
> git status
On branch main
nothing to commit, working tree clean
In committing our work, we’ve made this file and function an official part of the project’s history. As such, there are no “changes” to commit when we check the status of the repository. Let’s go ahead and make a few more changes to getting_started.py, though. I want to add a new function called double_this. The function will look like this:
def double_this(number):
    return number * 2
Make sure to save your changes and then head back over to the terminal. Let’s run the status command again and see what our repository looks like:
> git status
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: getting_started.py
no changes added to commit (use "git add" and/or "git commit -a")
Notice this time that getting_started.py is not listed under “Untracked files” but “Changes not staged for commit.” That’s because whenever we change a file, we have to run git add again to stage the changes before we can commit them. In other words, once we’re ready to commit our changes, we have to first run the add command and then the commit command again. Don’t run the following commands (I’m about to show you a better way), but this is what they would look like:
> git add getting_started.py
> git commit -m "Added double_this function"
Most of the time, however, you don’t stage changes until you’re ready to commit them. Because of this, there’s a shorthand for staging the changes and committing them at the same time: passing the -a flag to your commit command. The above example can be condensed into a single command that looks like this:
> git commit -am "Added double_this function"
With this command, we simultaneously staged our changes and committed them. (Note: This will not work for “untracked files,” which you have to manually add to the repository with git add.) Personally, this is the way I do commits 99% of the time. Now, “Added double_this function” is officially part of our project’s history. But what is this history I keep talking about? Well, it’s really more of a log. Let’s talk about it:
Seeing Your Project’s History
Remember how I said the Git repository is like a timeline for your project? Well, we can view that timeline with git log, which can show us our entire history. Take a look:
> git log
commit bfae576bca086e682eae6ca0a7b12d23621234a8 (HEAD -> main)
Author: erik-whiting <erik@erikwhiting.com>
Date: Thu Mar 25 20:01:47 2021 -0500
Added double_this function
commit 47308246e7a8dbfe6fd3b4361f28c4a1e6643f79
Author: erik-whiting <erik@erikwhiting.com>
Date: Thu Mar 25 19:50:01 2021 -0500
Create greet function
There are a few things that might look a little foreign in this log, so let’s go through them one at a time. First, we see “commit” with a hash. This is, unsurprisingly, the commit hash. It’s unique within your whole project, and there are commands we’ll learn later in which you will use this hash. Well, you’ll actually only use the first few characters of the hash, as Git is usually smart enough to figure out the rest.
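To get a feel for why a few characters are usually enough, here is a small Python sketch of the idea. It is not Git’s actual lookup code, and resolve_prefix is a name I made up; it simply shows that a short prefix works as long as exactly one commit starts with it.

```python
def resolve_prefix(prefix, known_hashes):
    """Return the one full hash that starts with prefix, or raise an error."""
    matches = [h for h in known_hashes if h.startswith(prefix)]
    if len(matches) != 1:
        # Zero matches means a typo; several matches means the prefix is too short.
        raise ValueError(f"{prefix!r} matches {len(matches)} commits, need exactly 1")
    return matches[0]


# The two commit hashes from the log output above.
hashes = [
    "bfae576bca086e682eae6ca0a7b12d23621234a8",
    "47308246e7a8dbfe6fd3b4361f28c4a1e6643f79",
]

print(resolve_prefix("bfae5", hashes))  # bfae576bca086e682eae6ca0a7b12d23621234a8
```

In a project with thousands of commits, you may need a few more characters before the prefix points at only one of them.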
Next, you see “Author.” This is your Git username, and you can see it by running git config user.name or set it by running git config --global user.name "new-name" if you need to. The same goes for the email address that follows (the setting is called user.email).
The “Date” field is obvious enough, but notice that the next line is the commit message we wrote earlier. This is what I mean when I say Git keeps a history of your project. The Git log is important for several reasons, and it’s imperative that you write good commit messages. Many times, the Git log is used by other members on your team to see what work was done most recently. Other times, people scrolling through your project might read through it to get a feel for what your project is about and how you work.
Most importantly though, it’s a way for you to look back on your project and see what changes you made at any given point. One day you might be staring at a weird piece of code and think to yourself “why in the world did I do that?” Using Git, you can actually go back and see when you wrote that code, what code was there before, and your own commit message. This is a very useful tool and you never know when it might come in handy.
Undoing Changes
Let’s talk about another thing Git helps us with: experimentation. Sometimes, you have a big idea and you want to see if you can make it work in your project. You don’t want to mess up the project though, so you make sure you start from a clean commit (i.e., no outstanding changes, staged or otherwise). At some point, you decide that the changes you wanted to make are either too complicated or aren’t going to work. With Git, you can blow everything away, no matter what you changed, and go back to exactly the way things were before you started. Let’s see this idea in action.
Open up the getting_started.py script again and let’s mess with the double_this function by making it use a float instead of an integer:
def double_this(number):
    return number * 2.0
Save your changes and go back to the terminal. Run git status to make sure getting_started.py is listed under “Changes not staged for commit.”
Say that now that we’ve thought about it, we like doubling with the integer instead of the float. We could go back and just press Ctrl+Z until things go back to the way they were, but as changes start to get bigger, approaches like that don’t work. Luckily, there’s a single-line command in Git to undo all the unstaged changes to a file. Run the following command:
> git checkout getting_started.py
Now go back to getting_started.py in your code editor. Notice anything? The double_this function should have returned to its original state. Isn’t that convenient? Even if you have multiple changes spanning several files, as long as those files are tracked, the checkout command will return them to the way they were in their most recent commit.
By the way, you can apply git commands like add and checkout to multiple files instead of running one command per file. Most of the time, I just run
> git add .
to stage all changes at once. You can also specify which files you want the operation to be run on in a single line like this:
> git checkout file1.txt file2.py file3.sql
Finally, you can also use wildcard patterns (often called globs). For example, say you want to stage changes made to any Ruby files. You’d write:
> git add *.rb
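If you’re curious what a pattern like *.rb actually matches, Python’s standard fnmatch module follows the same basic wildcard rules, so you can try it out there. The file names below are made up for the example, and depending on your shell, the expansion is done either by the shell or by Git itself.

```python
from fnmatch import fnmatch

# Hypothetical file names, just for illustration.
files = ["app.rb", "helper.rb", "notes.txt", "getting_started.py"]

# The * stands for "any run of characters," so *.rb means "anything ending in .rb".
ruby_files = [name for name in files if fnmatch(name, "*.rb")]
print(ruby_files)  # ['app.rb', 'helper.rb']
```

This is pattern matching on file names, which is much simpler than a regular expression.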
Reverting Changes
Sometimes you decide that a change you made was entirely wrong, but you’ve already committed it. The checkout command won’t work for undoing those kinds of changes, but there is an option: it’s called revert.
Reverting our changes means undoing an entire commit. Let’s say that now we don’t want the double_this function in our script at all. Sure, we could delete it manually and make a new commit with the message “deleted double_this.” But if we wanted to surgically remove all the work we did, we could use the revert command. The first thing you have to do is find the hash of the commit you want to undo. Remember that we get that from the log command. Check it out:
> git log
commit bfae576bca086e682eae6ca0a7b12d23621234a8 (HEAD -> main)
Author: erik-whiting <erik@erikwhiting.com>
Date: Thu Mar 25 20:01:47 2021 -0500
Added double_this function
commit 47308246e7a8dbfe6fd3b4361f28c4a1e6643f79
Author: erik-whiting <erik@erikwhiting.com>
Date: Thu Mar 25 19:50:01 2021 -0500
Create greet function
We want to revert the commit with the message “Added double_this function.” Find the commit hash, for me it’s “bfae576bca086e682eae6ca0a7b12d23621234a8” but it will be different for you. To revert, run the following command, but replace my hash with yours (note, you can use just the first 5 or 6 characters of your commit hash):
> git revert --no-edit bfae576bca086e682eae6ca0a7b12d23621234a8
And press enter. Go back to your code editor and note that double_this is now gone. Go back to the terminal and run git log to see what the history looks like:
> git log
commit 275c72742cf806d2517661d5ae8f2d9e516efed2
Author: erik-whiting <erik@erikwhiting.com>
Date: Thu Mar 25 20:51:56 2021 -0500
Revert "Added double_this function"
This reverts commit bfae576bca086e682eae6ca0a7b12d23621234a8.
commit bfae576bca086e682eae6ca0a7b12d23621234a8
Author: erik-whiting <erik@erikwhiting.com>
Date: Thu Mar 25 20:01:47 2021 -0500
Added double_this function
commit 47308246e7a8dbfe6fd3b4361f28c4a1e6643f79
Author: erik-whiting <erik@erikwhiting.com>
Date: Thu Mar 25 19:50:01 2021 -0500
Create greet function
Notice that we still keep the history. We have “Added double_this function” in the Git log, but we also have “Revert ‘Added double_this function.’” The “Revert …” commit message is actually the default commit message when running the revert command, which is what we get when we pass the --no-edit flag. If you want to write your own message, leave that flag out. Git will then open your default text editor with the default message filled in (I usually use Notepad++ on Windows and Nano on Linux). Once you’re done editing, save the file and close the program, and the commit will be made.
Those are the basics!
You now know how to initialize a repository, track files, commit changes, undo changes, and revert a previous commit. That concludes the basic ideas behind Git operations. Now let’s dive into branching.
Git Branches
One of the best things about Git is branching. It lets us essentially live multiple histories at a single time without disturbing the true history until we’re ready. We can experiment, test out new features without changing a production system, and collaborate with other developers.
To illustrate the most typical use of branching, I will describe how I typically handle changes to a website I maintain. It’s a pretty simple website with the company’s name on it, a few pages describing their team, a catalogue of their products, and a signup form for their mailing list. Let’s imagine someone finds a bug in the website, like the form isn’t saving the person’s name properly.
Instead of disrupting operations by poking around on the company’s public-facing website, I make a new branch off the main branch so that I can investigate the source of the bug and fix it. When I make a new branch from the main one, I essentially copy it: everything from the code to the commit history. But as I make changes to the new branch, the main one stays as it is. I can even switch back to it if a new, more important thing needs addressing, and then switch back to my other work when that’s done.
When I’m done investigating the source of the signup form bug and apply the changes, I then merge the changes onto the main branch. That’s when everything I’ve done on the new branch becomes one with the main branch. Let’s go ahead and see this in action.
Making a New Branch
Within our git_project repository, we decide we want to make a new function once again, but we don’t want it to be on the main branch until we’re done with it. First, we create a new branch with the branch command. Then we check out that branch so that the work we do gets tracked in the new branch’s history rather than the main one.
First, go to your terminal and see what branch we’re on by typing git branch. The output should have the word “main” in green with an asterisk next to it. The asterisk indicates which branch you’re on (being “on” a branch means that the changes you make are being tracked and recorded in that branch’s history). Let’s make a new branch with the branch command:
> git branch add-new-method
This command creates the branch; now we need to check it out. Once we run the next command, we’ll be on the add-new-method branch:
> git checkout add-new-method
And the terminal should output “Switched to branch 'add-new-method'.” In the future, if you would like to create a branch and check it out in one command, you can use the checkout command with a -b argument like so:
> git checkout -b add-new-method
Ok, so now that we’re on the add-new-method branch, I want you to see something. In the terminal, run git log and see the output. It should be exactly the same as it was the last time you looked at it on the main branch. In fact, until we commit something on either branch, their logs will be the same.
Let’s go ahead and add some code. Let’s make a new function called triple_this which will do pretty much what you’re probably expecting:
def triple_this(number):
    return 3 * number
Let’s go ahead and commit these changes as well. In the terminal type the following command:
> git commit -am "Add triple_this function"
Now, we’ve committed this work to this branch’s history. Check out the output from git log:
commit 65ed67a53aeb94b27865e401cc456e6b2b33bf2a (HEAD -> add-new-method)
Author: erik-whiting <erik@erikwhiting.com>
Date: Thu Mar 25 22:55:58 2021 -0500
Add triple_this function
commit 0ea7a943b604f5efebaecb7e2ede33cf00cf7de2 (main)
Author: erik-whiting <erik@erikwhiting.com>
Date: Thu Mar 25 20:54:16 2021 -0500
Revert "Added double this"
This reverts commit ae2a9fe016ab88a07dc6b06b9ca4417dd3f0933e.
commit ae2a9fe016ab88a07dc6b06b9ca4417dd3f0933e
Author: erik-whiting <erik@erikwhiting.com>
Date: Thu Mar 25 20:53:37 2021 -0500
Added double this
...
(truncated)
The commit about adding the triple_this function is in this branch’s history. Let’s switch back to the main branch; this is going to make it all make more sense. In the terminal type:
> git checkout main
First of all, go look at the getting_started.py script and notice that the triple_this function is not there. That’s because that work was on the add-new-method branch, not the main one. Additionally, run git log again and notice that the commit for triple_this isn’t in there.
Another Branch
To illustrate another point I want to make later, let’s make yet another branch. This time, we’re going to edit the greet function. So in the terminal, make sure you’re on the main branch by typing git branch and ensuring main is starred. Now, type in the following command:
> git checkout -b edit-greet-method
We’re now on the edit-greet-method branch, so head into the getting_started.py script and set the greet function to look like this:
def greet(name):
    return f'What\'s up, {name}?'
Now let’s go ahead and commit this change:
> git commit -am "Edit greet function"
Once again, check the log and notice that the only commit after our revert of the “Added double this” commit is the one we just made: “Edit greet function.” Switch back to the main branch by typing git checkout main in the terminal and once again note that the most recent commit message in the log is the one about reverting the double_this function. Also note that in the getting_started.py script, the function is how we left it, with the “Hello” greeting instead of “What’s up.”
Merging Branches
Ok, now that we’ve made changes on two different branches and we think we’re happy with them, it’s time to merge those changes into our main branch and unify the histories. First, make sure you’re on the main branch. We’re going to start by merging in the edit-greet-method branch. In the terminal type:
> git merge edit-greet-method
You’ll get some output about fast-forwarding and how many files were changed. Look at the script and notice that the greet function is now using the “What’s up” greeting instead of “Hello.” In the terminal, check out the log:
> git log
commit ad0ca3092e6e49a86d8f228d2b9e7f39b8a559ca (HEAD -> main, edit-greet-method)
Author: erik-whiting <erik@erikwhiting.com>
Date: Thu Mar 25 23:08:03 2021 -0500
Edit greet function
commit 0ea7a943b604f5efebaecb7e2ede33cf00cf7de2
Author: erik-whiting <erik@erikwhiting.com>
Date: Thu Mar 25 20:54:16 2021 -0500
Revert "Added double this"
This reverts commit ae2a9fe016ab88a07dc6b06b9ca4417dd3f0933e.
commit ae2a9fe016ab88a07dc6b06b9ca4417dd3f0933e
Author: erik-whiting <erik@erikwhiting.com>
Date: Thu Mar 25 20:53:37 2021 -0500
Added double this
...
(truncated)
Now we see that the commit history (equivalent to the notes from our timeline example from earlier) from the edit-greet-method branch is now unified, or merged, with the commit history of main. Essentially, we’ve made the work we did on the edit-greet-method branch part of the official history.
Let’s go ahead and do the same for the add-new-method branch. We created that branch before we created the edit-greet-method branch, though. Do you think that’s going to cause problems? Let’s find out. Run the following in the terminal:
> git merge add-new-method
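You will probably get output similar to the following (the first line, the warning about binary files, only shows up if Git thinks the file is binary, which can happen when the file was saved with UTF-16 encoding):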
warning: Cannot merge binary files: getting_started.py (HEAD vs. add-new-method)
Auto-merging getting_started.py
CONFLICT (content): Merge conflict in getting_started.py
Automatic merge failed; fix conflicts and then commit the result.
Uh oh, what happened? There are merge conflicts. This is because when we created the add-new-method branch, the getting_started.py file looked one way, but when we merged edit-greet-method into the main branch, the same part of the file was changed there too. Both branches now have different ideas about what that part of getting_started.py should look like, and Git won’t guess which one we want, so it reports a merge conflict. There’s an easy way to fix this though, don’t worry.
What you’ll have to do is go into your IDE, or whatever you were writing your code with, and find where the conflicts are. If you’re using an editor like VS Code or PyCharm or anything like that, the conflicts will be highlighted very obviously. Otherwise, you’ll have to look out for something that looks like this:
<<<<<<< HEAD
    return f'What\'s up, {name}?'
=======
    return f'Hello, {name}!'
def triple_this(number):
    return 3 * number
>>>>>>> add-new-method
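From here, you basically pick which versions you want to keep: incoming, current, or both. Everything between the <<<<<<< HEAD line and the ======= line is the current version (the branch you’re on), and everything between ======= and the >>>>>>> line is the incoming version. You can do this by deleting the lines you want to throw out and leaving the ones you want to keep. It can get kind of tedious, but with an IDE, there’s usually a button you can press to accept one or the other.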
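To make the marker layout concrete, here is a small Python sketch that pulls the two sides out of a conflicted file. It’s only an illustration and assumes a single conflict block in the simple two-sided style shown above; split_conflict is a name I made up, not something Git provides.

```python
def split_conflict(text):
    """Split text containing one conflict block into (ours, theirs) line lists."""
    ours, theirs = [], []
    section = None  # None means we're outside the conflict block
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            section = ours      # everything after this marker is the current branch
        elif line.startswith("=======") and section is ours:
            section = theirs    # everything after this marker is the incoming branch
        elif line.startswith(">>>>>>> "):
            section = None      # end of the conflict block
        elif section is not None:
            section.append(line)
    return ours, theirs


conflicted = r"""def greet(name):
<<<<<<< HEAD
    return f'What\'s up, {name}?'
=======
    return f'Hello, {name}!'

def triple_this(number):
    return 3 * number
>>>>>>> add-new-method
"""

ours, theirs = split_conflict(conflicted)
print("current: ", ours)
print("incoming:", theirs)
```

Resolving the conflict means deciding, by hand, which of those lines end up in the final file.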
In our case, we want to not only delete all the <<<< type of things we don’t need, but also delete the old return statement from the greet function. In other words, your final script should look like:
def greet(name):
    return f'What\'s up, {name}?'
def triple_this(number):
    return 3 * number
Once you’ve got that, stage the resolved file with git add getting_started.py and run git commit to finish the merge, then check the history. Voila!
I should note that these merge conflicts only happened because both branches changed the same part of the same file. In real-world systems when two people are working on the same repository, they’re often working on different files, or at least on different parts of a file, and Git merges those changes automatically. Merge conflicts occur only when two branches change the same lines (or lines right next to each other) in different ways; files that changed on only one branch will not throw conflicts like this.
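Here is a toy Python sketch of how a merge can decide between “fine” and “conflict.” It is a big simplification: real Git compares chunks of lines and handles files that grow or shrink, while this version assumes all three copies of the file have the same number of lines. The function name merge_lines and the sample lines are made up for the example.

```python
def merge_lines(base, ours, theirs):
    """Toy three-way merge for versions with the same number of lines."""
    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:        # both sides agree (or neither touched the line)
            merged.append(o)
        elif o == b:      # only the incoming side changed this line
            merged.append(t)
        elif t == b:      # only our side changed this line
            merged.append(o)
        else:             # both changed it differently: a human has to decide
            merged.append(None)
            conflicts.append(number)
    return merged, conflicts


# The version both branches started from.
base = ["def greet(name):", "    return 'Hello'", "", "LANGUAGE = 'en'"]

# One branch edits line 2, the other edits line 4: no conflict.
ours = ["def greet(name):", "    return 'Hi'", "", "LANGUAGE = 'en'"]
theirs = ["def greet(name):", "    return 'Hello'", "", "LANGUAGE = 'fr'"]
merged, conflicts = merge_lines(base, ours, theirs)
print(conflicts)  # []

# Both branches edit line 2 in different ways: conflict on line 2.
theirs_clash = ["def greet(name):", "    return 'Hey'", "", "LANGUAGE = 'en'"]
merged_clash, conflicts_clash = merge_lines(base, ours, theirs_clash)
print(conflicts_clash)  # [2]
```

The starting point both branches share is what lets a merge tell “only one side changed this” apart from “both sides changed this.”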
Conclusion
This concludes part one of our professional git series. In part 2 we’re going to learn about publishing, cloning, and making pull requests on GitHub. Part 3 will cover some more advanced topics such as rebase and bisect. This is really good information that is very important for leveling up your coding skills, so stay tuned!