UQ Library 2024-09-16
We will use Git inside a command-line shell called Bash.
Installation instructions are available on this page.
If you need to collaborate on a project, a script, some code or a document, there are a few ways to operate. Sending a file back and forth and taking turns is not efficient; a cloud-based office suite requires a connection to the Internet and doesn’t usually keep a clean record of contributions.
Version control allows users to:
- record a clean history of changes;
- keep track of who did what;
- go back to previous versions;
- work offline; and
- resolve potential conflicts.
Programmers use version control systems to collaborativelly write code all the time, but it isn’t just for software: books, papers, small data sets, and anything that changes over time or needs to be shared can be stored in a version control system.
A version control system is a tool that keeps track of changes for us, effectively creating different versions of our files. It allows us to decide which changes will be made to the next version (each record of these changes is called a commit), and keeps useful metadata about them. The complete history of commits for a particular project and their metadata make up a repository. Repositories can be kept in sync across different computers, facilitating collaboration among different people.
On a command line, Git commands are written as git verb
, where verb
is what we actually want to do.
Before we use Git, we need to configure it with some defaults, like our credentials and our favourite text editor. For example:
git config --global user.name "Vlad Dracula"
git config --global user.email "[email protected]"
This user name and email will be associated with your subsequent Git activity, which means that any changes pushed to GitLab, GitHub, BitBucket or another Git host server in the future will include this information. This has to match your GitLab credentials.
git config --global core.editor "nano -w"
git config --list
You can always find help about git with the --help
flag:
git --help
git config --help
First, let’s make sure we’re in the right directory. We can check the
directory using the pwd
command, and then change directory using the
cd
command. On windows we can change to our default home directory
like so:
cd /c/Users/<yourusername>
We can using ls
to get a list of everything that is in our current
directory.
Now, let’s create a directory for our project and move into it:
mkdir planets
cd planets
This is the same as creating a new folder.
Then we tell Git to make planets
a repository — a place where Git can
store versions of our files:
git init
Using the ls
command won’t show anything new, but adding the -a
flag
will show the hidden files and directories too:
ls -a
Git created a hidden .git
directory to store information about the
project (i.e. everything inside the directory where the repository was
initiated).
Now that we’ve initialised the git repositry, we can start using commands to manage versions. We can now check the status of our project with:
git status
How do we record changes and make notes about them?
You should still be in the planets
directory, which you can check with
the pwd
command.
Let’s create a new text file that contains some notes about the Red
Planet’s suitability as a base. We’ll use the nano
text editor:
nano mars.txt
Type the following text into it:
Cold and dry, but everything is my favorite colour
Write out with Ctrl+O and exit nano with Ctrl+X.
We can now use ls
to check that the file has been created.
You can also check the contents of your new file with the cat
command:
cat mars.txt
Now, check the status of our project:
git status
Git noticed there is a new file. The “Untracked files” message means
that there’s a file in the directory that Git isn’t keeping track of. We
can tell Git to track a file using git add
:
git add mars.txt
You may get a note saying
warning: LF will be replaced by CRLF in mars.txt.
This is highlighting
the difference in the way that Linux systems and Windows systems handle
carriage returns. And this can be recorded as a change when you go
between operating systems, but only if you change those lines, and there
is now a lot more cross compatibility, so you can actually just safely
ignore this.
Now we can use git status
again to see what happenned:
git status
Git now knows that it’s supposed to keep track of mars.txt
, but it
hasn’t recorded these changes as a commit yet. To get it to do that, we
need to run one more command:
git commit -m "Start notes on Mars as a base"
When we run git commit
, Git takes everything we have told it to save
by using git add
and stores a copy permanently inside the special .git
directory. This permanent copy is called a commit (or revision)
and it is given a short identifier.
We use the -m
flag (for “message”) to record a short descriptive
comment that will help us remember what was done and why.
If we run git status
now:
git status
… we can see that the working tree is clean.
To see the recent history, we can use git log
:
git log
git log
lists all commits made to a repository in reverse
chronological order. The listing for each commit includes the commit’s
full identifier (which starts with the same characters as the short
identifier printed by the git commit
command earlier), the commit’s
author, when it was created, and the log message Git was given when the
commit was created.
Now, let’s add a line to our text file:
nano mars.txt
After writing out and saving, let’s check the status:
git status
We have changed this file, but we haven’t told Git we will want to save
those changes (which we do with git add
) nor have we saved them (which
we do with git commit
). So let’s do that now. It is good practice to
always review our changes before saving them. We do this using
git diff
. This shows us the differences between the current state of
the file and the most recently saved version:
git diff
There is a quite a bit of cryptic-looking information in there: it
contains the command used to compare the files, the names and
identifiers of the files, and finally the actual differences. The +
sign indicates which line was added.
It is now time to commit it:
git commit -m "<your comment>"
That didn’t work, because we forgot to use git add
first. Let’s fix
that:
git add mars.txt
git commit -m "<your comment>"
Using git add
allows us to select which changes are going to make it
into a commit, and which ones won’t. It sends them to what is called the
staging area. In a way, git add
specifies what will go in a
snapshot (putting things in the staging area), and git commit then
actually takes the snapshot.
Challenge 1
The staging area can hold changes from any number of files that you want to commit as a single snapshot.
- Add some text to mars.txt noting your decision to consider Venus as a base
- Create a new file venus.txt with your initial thoughts about Venus as a base for you and your friends
- Add changes from both files to the staging area, and commit those changes as one single commit.
Adding and committing multiple files:
How can we identify old versions of files, review changes and recover old versions?
As we saw in the previous lesson, we can refer to commits by their
identifiers. You can refer to the most recent commit of the working
directory by using the identifier HEAD
.
Let’s add a line to our file:
nano mars.txt
We can now check the difference with the head:
git diff HEAD mars.txt
Which is the same as using git diff mars.txt
. What is useful is that
we can refer to previous commits, for example for the commit before
HEAD
:
git diff HEAD~1 mars.txt
Similarly, git show
can help us find out what was changed in a
specific commit:
git show HEAD~2 mars.txt
We can also use the unique 7-character identifiers that were attributed to each commit:
git diff XXXXXXX mars.txt
How do we restore older versions of our file?
Overwrite your whole text with one single new line:
nano mars.txt
git diff
We can put things back the way they were by using git checkout
:
git checkout HEAD mars.txt
cat mars.txt
git checkout
checks out (i.e., restores) an old version of a file. In
this case, we’re telling Git that we want to recover the version of the
file recorded in HEAD
, which is the last saved commit. If we want to
go back even further, we can use a commit identifier instead:
git log -3
git checkout XXXXXXX mars.txt
cat mars.txt
git status
Notice that the changes are on the staged area. Again, we can put things back the way they were by using git checkout:
git checkout HEAD mars.txt
cat mars.txt
Challenge 2
Jennifer has made changes to the Python script that she has been working on for weeks, and the modifications she made this morning “broke” the script and it no longer runs. She has spent more than an hour trying to fix it, with no luck…
Luckily, she has been keeping track of her project’s versions using Git! Which commands below will let her recover the last committed version of her Python script called data_cruncher.py?
git checkout HEAD
git checkout HEAD data_cruncher.py
git checkout HEAD~1 data_cruncher.py
git checkout <unique ID of last commit> data_cruncher.py
- Both 2 and 4
git config
: configure gitgit init
: initialise a git repository heregit status
: see information about current state of the repositorygit add
: add a change from a file (or several) to the staging areagit commit -m "..."
: commit a change (or several) to our historygit log
: see historygit show
: show changes in one commit for one filegit checkout
: roll back to previous versiongit diff
: difference between file on disk and commit in repository
How can I tell git to ignore things?
Sometimes, we don’t want git to track files like automatic backup files or intermediate files created during an analysis.
Say you create a bunch of .dat
files like so:
touch a.dat b.dat c.dat
git status
If you don’t want to track them, create a .gitignore
file:
nano .gitignore
… and add the following line to it:
*.dat
That will make sure no file finishing with .dat
will be tracked by
git.
git status
git add .gitignore
git commit -m "Ignore data files"
git status
If you haven’t already, now is the time to create a GitHub account. In our class, I’d ask you to share your username, so that we can collaborate later.
Git is the software. GitHub is a platform to allow you to host the repository and share it with others. There are others such as GitLab, BitBucket, GitTea, and GitBucket.
How do I share my changes with others on the web?
Version control really becomes extra useful when we begin to collaborate with other people. We already have most of the machinery we need to do this; the only thing missing is to copy changes from one repository to another.
It is easiest to use one copy as a central hub, stored online.
Let’s look at GitHub: https://github.com
Let’s share our repository with the world. Log into GitHub and
create a new repository called planets
(“+ > New repository” in the
top toolbar). Make sure you select “Public” for the visibility level.
Our local repository (on our computer) contains our recent work, but the remote repository on GitHub’s servers doesn’t.
We now need to connect the two: we do this by making the GitHub repository a remote for the local repository. The home page of the repository on GitHub includes the URL we need to identify it, under “…or push an existing repository from the command line”. Copy it to your clipboard, and in your local repository, run the following command (note that you will need to right click when using the Shell):
git remote add origin https://github.com/<your_username>/planets.git
The name origin
is a local nickname for your remote repository. We
could use something else if we wanted to, but origin
is by far the
most common choice.
GitHub wants to make sure that we’re using the same name for our main branch as they do on GitHub. You will note that our branch is currently called master, we can change the branch name to main with the next line of code:
git branch -M main
Now, we can push our changes from our local repository to the remote on GitHub. Try this:
git push
Git does not know where it should push by default. See the suggested command in the error message? We can set the default remote with a shorter version of that:
git push -u origin main
We only need to do that once: from now one, Git will know that the
default is the origin
remote and the main
branch.
It may request your credentials, which used to simply be your username
and password, however, GitHub now requires you to create a Personal
Access Token. 1. Click on your avatar in the top right of GitHub.com 1.
Click settings 1. Scroll to the bottom and on the left, and click
Developer settings. 1. Click Personal access tokens (either type is
fine) 1. Click Generate new token
(either is fine, classic is
simpler) 1. You may need to authenticate yourself using TFA (you can
also choose to use your password) 1. You can select what you need to be
able to edit. If you’ve chosen classic, you can skip this and scroll to
the bottom to Generate token
. 1. Make sure you copy this token
immediately and save it somewhere so you can reuse it.
You can now see on GitHub that your changes were pushed to the remote repository.
You can edit files directly on GitHub if you want. Try editing your READ.me by clicking on “Edit”.
If you do that, you will then need to pull changes from the remote repository to your local one before further editing:
git log
git pull
git log
ls
In summary: git push
sends commited changes to a remote repository,
whereas git pull
gets commited changes from the remote to your local
repository.
How do we use version control to collaborate?
Now, let’s get into pairs: one person is the “Owner”, the other is the “Collaborator”.
First, the Owner needs to give the collaborator editing access to the
repository. Go to the Settings tab in your GitHub repository. On the
left panel, you can click Collaborators and then Add people
. Here you
can enter usernames and email addresses.
The Collaborator can then accept the invitation.
Next, the Collaborator needs to download a copy of the Owner’s repository to their machine, which is called “cloning a repository”. To do that, first make sure you move out of your personal repository:
cd ..
Now, you can clone the Owner’s repository (you can do this by clicking
the green Code
button on a repository), and you can give it a
recognisable local name:
git clone https://github.com/<owner_username>/planets.git partner-planets
The Collaborator can now make changes in their clone of the Owner’s repository:
cd partner-planets
nano README.md
Add a section for your collaborators.
## Contributors
Name
Name2
Then add, commit, and push the change to the Owner’s repository on GitLab:
git add README.md
git commit -m "added collaborators"
git push
We didn’t have to create a remote called origin
, or set the default
upstream: that was done by default by Git when cloning the repository.
You can see that the changes are now live on GitHubb.
The Owner can now download the Collaborator’s changes from GitHub:
git pull origin main
If you collaborate on a remote repository, remember to pull
before
working!
Challenge 3
Switch roles and repeat the process!
Challenge 4
Use the GitLab interface to add a comment to your partner’s commit and suggest something. See your notifications in “Activity” afterwards.
What do I do when changes conflict with someone else’s?
As soon as people can work in parallel, they’ll likely step on each other’s toes. This will even happen with a single person: if we are working on a piece of software on both our laptop and a server in the lab, we could make different changes to each copy. Version control helps us manage these conflicts by giving us tools to resolve overlapping changes.
To see how we can resolve conflicts, we must first create one. The
file mars.txt is currently in the same state in both copies of the
planets
repository.
The Collaborator can add a line to their partner’s copy, and push to GitLab:
nano mars.txt
git add mars.txt
git commit -m "Add a line in my friend's file"
git push
Now let’s have the Owner make a different change to their own copy without pulling from GitHub beforehand:
nano mars.txt
The Owner can commit the change locally:
git add mars.txt
git commit -m "Add a line in my own copy"
But Git won’t let us push to GitHub:
git push
Git rejects the push because it detects that the remote repository has new updates that have not been incorporated into the local branch. What we have to do is (1) pull the changes from GitHub, (2) merge them into the copy we’re currently working in, and then (3) push that. Let’s start by pulling:
git pull
Git detects that changes made to the local copy overlap with those made to the remote repository, and therefore refuses to merge the two versions to stop us from trampling on our previous work. The conflict is marked in the affected file:
cat mars.txt
Our change is preceded by <<<<<<< HEAD
. Git has then inserted
=======
as a separator between the conflicting changes and marked the
end of the content downloaded from GitLab with >>>>>>>
. (The string of
letters and digits after that marker identifies the commit we’ve just
downloaded.)
It is now up to the Owner to fix this conflict:
nano mars.txt
They can now add and commit to their local repo, and then push the changes to GitHub:
git add mars.txt
git commit -m "Merge changes from GitHub"
git push
Git keeps track of merged files. The Collaborator can now pull the changes from GitHub:
git pull
git log -3
GitHub? GitLab? BitBucket?
External company, purchased domain and host, or local server at the lab?
This short course is based on the longer course Version Control with Git developped by the non-profit organisation The Carpentries. The original material is licensed under a Creative Commons Attribution license (CC-BY 4.0), and this modified version uses the same license. You are therefore free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material
… as long as you give attribution, i.e. you give appropriate credit to the original author, and link to the license.