As I was looking for easy assignments for the Open Source Development Course, I found something very troubling, which is also an opportunity for a lot of teaching and a lot of practice.
Some files don't need to be in git
Common sense dictates that we rarely need to include generated files in our git repository. There is no point in keeping them in version control, as they can be generated again. (The exception might be if the generation takes a lot of time or can be done only during certain phases of the moon.)
Neither is there a need to store 3rd party libraries in our git repository. Instead, we store a list of our dependencies with the required versions, and then we download and install them. (Well, the rightfully paranoid might download and save a copy of every 3rd party library they use to ensure it can never disappear, but as you'll see, that is not what we are talking about.)
.gitignore
The way to make sure that neither we nor anyone else adds these files to the git repository by mistake is to create a file called .gitignore, include patterns that match the files we would like to exclude from git, and add the .gitignore file to our repository. git will then ignore those files. They won't even show up when you run git status.
The format of the .gitignore file is described in the documentation of .gitignore.
In a nutshell:
/output.txt
Ignore the output.txt file in the root of the project.
output.txt
Ignore output.txt anywhere in the project (in the root or in any subdirectory).
*.txt
Ignore all the files with the .txt extension.
venv
Ignore the venv folder anywhere in the project.
There are more. Check the documentation of .gitignore!
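To make the four rules above concrete, here is a rough Python model of them. It is only a sketch: the function name is_ignored is made up for this post, it handles just the pattern shapes listed above (no negation with !, no trailing slashes, no **), and git's real matching has more rules than this.

```python
from fnmatch import fnmatchcase


def is_ignored(path, patterns):
    """Rough model of the four pattern shapes above; NOT git's full algorithm.

    `path` is relative to the project root and uses "/" as the separator.
    """
    parts = path.split("/")
    for pattern in patterns:
        if pattern.startswith("/"):
            # Leading slash: only the entry directly in the project root counts.
            if fnmatchcase(parts[0], pattern[1:]):
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            # No slash: the name may match at any depth. Matching a folder
            # name also covers everything inside that folder.
            return True
    return False


print(is_ignored("output.txt", ["/output.txt"]))       # True
print(is_ignored("docs/output.txt", ["/output.txt"]))  # False
print(is_ignored("docs/output.txt", ["output.txt"]))   # True
print(is_ignored("src/notes.txt", ["*.txt"]))          # True
print(is_ignored("app/venv/bin/python", ["venv"]))     # True
```

To find out what git itself thinks about a path in a real repository, git check-ignore -v followed by the path is the reliable tool. The model above is only meant to show the difference between the patterns.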
Not knowing about .gitignore
Apparently a lot of people using git and GitHub don't know about .gitignore.
The evidence:
Python developers use something called virtualenv to make it easy to use different dependencies in different projects. When they create a virtualenv, they usually configure it to install all the 3rd party libraries in a folder called venv. We should not include this folder in git. And yet:
There are 452M hits for this search: venv
In a similar way, NodeJS developers install their dependencies in a folder called node_modules. There are 2B results for this search: node_modules
Finally, if you use the Finder application on macOS and open a folder, it will create an empty(!) file called .DS_Store. This file is really not needed anywhere. And yet I saw many copies of it on GitHub. Unfortunately, so far I could not figure out how to search for them. The closest I found is this search.
Misunderstanding .gitignore
There are also many people who misunderstand the way .gitignore works. I can understand why, as the usual wording of the explanation is a bit ambiguous. What we usually say is this:
If you'd like to make sure that git will ignore the __pycache__ folder then you need to put it in .gitignore.
A better way would be to say this:
If you'd like to make sure that git will ignore the __pycache__ folder then you need to put its name in the .gitignore file.
Without that, people might end up creating a folder called .gitignore and moving all the __pycache__ folders into this .gitignore folder. You can see it in this search.
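In code, the second wording comes down to appending a line of text to a file. Here is a minimal sketch in Python. The name add_ignore_pattern is something I made up for illustration, and the function deliberately refuses to continue if .gitignore turns out to be a folder.

```python
from pathlib import Path


def add_ignore_pattern(project_root, pattern):
    """Append `pattern` to <project_root>/.gitignore unless it is already listed.

    Hypothetical helper for this post. Returns True if the file was changed.
    """
    gitignore = Path(project_root) / ".gitignore"
    if gitignore.is_dir():
        # The misunderstanding described above: .gitignore must be a text file.
        raise IsADirectoryError(f"{gitignore} is a folder, but git expects a text file")

    lines = []
    if gitignore.exists():
        lines = gitignore.read_text(encoding="utf-8").splitlines()
    if pattern in lines:
        return False  # already ignored, nothing to do

    lines.append(pattern)
    gitignore.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True
```

Calling add_ignore_pattern(".", "__pycache__") in the root of a project would leave the __pycache__ folders where they are and only add one line to the .gitignore file.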
Help
Can you suggest other common cases of unnecessary files in git that should be ignored?
Can you help me create a search for .DS_Store on GitHub?
Updates
More cases, based on the comments:
- .o files, the result of compiling C and C++ code: .o
- .class files, the result of compiling Java code: .class
- .pyc files, compiled Python code, usually stored in the __pycache__ folder mentioned earlier: .pyc
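Putting the cases from this post together, a small script can list the candidates in a working directory. This is a sketch under assumptions: the name find_unwanted and the three collections are mine, the lists only contain the examples mentioned in this post, and the script looks at the files on disk, not at what git actually tracks.

```python
import os
from pathlib import Path

# Only the examples mentioned in this post; extend as needed.
UNWANTED_DIRS = {"venv", "node_modules", "__pycache__"}
UNWANTED_FILES = {".DS_Store"}
UNWANTED_SUFFIXES = (".o", ".class", ".pyc")


def find_unwanted(root):
    """Return a sorted list of paths under `root` that usually don't belong in git."""
    root = Path(root)
    found = []
    for current, dirs, files in os.walk(root):
        rel = Path(current).relative_to(root)
        for name in list(dirs):
            if name == ".git":
                dirs.remove(name)  # git's own data, skip it
            elif name in UNWANTED_DIRS:
                found.append((rel / name).as_posix() + "/")
                dirs.remove(name)  # report the folder once, don't descend
        for name in files:
            if name in UNWANTED_FILES or name.endswith(UNWANTED_SUFFIXES):
                found.append((rel / name).as_posix())
    return sorted(found)
```

Calling find_unwanted(".") in the root of a project returns the list. Folders are reported with a trailing slash, and each reported name is a reasonable candidate for a line in .gitignore.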
How to create a .gitignore file?
A follow-up post: