Introduction to Version Control

Motivation

  • Nothing that is committed to version control is ever lost, unless you work really, really hard at losing it. Since all old versions of files are saved, it’s always possible to go back in time to see exactly who wrote what on a particular day, or what version of a program was used to generate a particular set of results.

  • Because we have this record of who made what changes and when, we know who to ask if we have questions later on and, if needed, we can revert to a previous version, much like the “undo” feature in an editor. Keeping a record of what was changed, when, and why is extremely useful, even if you are a lone researcher who needs to come back to your own project, say, a year later, when memory has faded.

Version control is the lab notebook of the digital world: it’s what professionals use to keep track of what they’ve done and to collaborate with other people. Every large software development project relies on it, and most programmers use it for their small jobs as well. And it isn’t just for software: books, papers, small data sets, and anything that changes over time or needs to be shared can and should be stored in a version control system.

Principles of Automated Version Control

Objectives

  • Understand the benefits of an automated version control system.
  • Understand the basics of how automated version control systems work.

Questions

  • What is version control and why should I use it?

Comic: a PhD student sends "FINAL.doc" to their supervisor, but after several increasingly intense and frustrating rounds of comments and revisions they end up with a file named "FINAL_rev.22.comments49.corrections.10.#@$%WHYDIDICOMETOGRADSCHOOL????.doc".

“notFinal.doc” by Jorge Cham, https://www.phdcomics.com

We've all been in this situation before: it seems unnecessary to have multiple nearly-identical versions of the same document. Some word processors let us deal with this a little better, such as Microsoft Word's Track Changes, Google Docs' version history, or LibreOffice's Recording and Displaying Changes.

Version control systems start with a base version of the document and then record changes you make each step of the way. You can think of it as a recording of your progress: you can rewind to start at the base document and play back each change you made, eventually arriving at your more recent version.

A diagram demonstrating how a single document grows as the result of sequential changes
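
The idea of recording a change separately and playing it back can be tried by hand with the standard Unix tools diff and patch. This is only a rough sketch for illustration, and not necessarily how a version control system stores history internally; the file names (base.txt, edited.txt, change.patch, replay.txt) are made up for this example.

```shell
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir"

# The base version of the document.
printf 'Introduction\nMethods\nResults\n' > base.txt

# A later version with one line changed.
printf 'Introduction\nMethods (revised)\nResults\n' > edited.txt

# Record only the change. diff exits with status 1 when the files differ,
# which is expected here, so "|| true" keeps the script going.
diff -u base.txt edited.txt > change.patch || true

# Play the change back onto a fresh copy of the base document.
cp base.txt replay.txt
patch replay.txt < change.patch

# cmp prints nothing and exits with status 0 when the two files are identical.
cmp replay.txt edited.txt
```

A version control system does this bookkeeping for us, so we never have to manage patch files by hand.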

Once you think of changes as separate from the document itself, you can then think about "playing back" different sets of changes on the base document, ultimately resulting in different versions of that document. For example, two users can make independent sets of changes on the same document.

A diagram with one source document that has been modified in two different ways to produce two different versions of the document

Unless multiple users make changes to the same section of the document (a conflict), you can incorporate both sets of changes into the same base document.

A diagram that shows the merging of two different document versions into one document that contains all of the changes from both versions

A version control system is a tool that keeps track of these changes for us, effectively creating different versions of our files. It allows us to decide which changes will be made to the next version (each record of these changes is called a commit), and keeps useful metadata about them. The complete history of commits for a particular project and their metadata make up a repository. Repositories can be kept in sync across different computers, facilitating collaboration among different people.

Your experience with editing code

Have you experienced a situation where you modified some code (e.g. a data analysis script) but later wanted to revert those changes and go back to an older version? How did you handle that?

Key points

  • Version control is like an unlimited ‘undo’.
  • Version control also allows many people to work in parallel.

Setup

Configuring Git

If you are not already logged in to CREATE HPC, log in now and change to the project directory you created earlier in your scratch space.

$ ssh k1234567@hpc.create.kcl.ac.uk
...
$ cd /scratch/users/k1234567

The version control tool we'll use is called Git. Git is already installed on the CREATE HPC, but requires some configuration before using it for the first time. Below are a few examples of configurations we will set as we get started with Git:

  • our name and email address,
  • what our preferred text editor is,
  • and that we want to use these settings globally (i.e. for every project).

On a command line, Git commands are written as git verb options, where verb is what we actually want to do and options is additional optional information which may be needed for the verb.

So here is how you might configure Git on the HPC (replace the example name and email with your name and King's email address):

$ git config --global user.name "Your FullName"
$ git config --global user.email "k1234567@kcl.ac.uk"

This user name and email will be associated with your subsequent Git activity.
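
If you want to check that the settings were stored, one way is to ask Git to print them back. The sketch below uses the placeholder name and email from above, so substitute your own values.

```shell
# Set the identity (same commands as above, with the placeholder values).
git config --global user.name "Your FullName"
git config --global user.email "k1234567@kcl.ac.uk"

# Giving only the key prints the value currently stored for it.
git config --global user.name
git config --global user.email

# List every setting stored in your global configuration.
git config --global --list
```

The listing may contain more entries than the two set here if Git has been configured on this account before.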

Tip

Any changes pushed to GitHub, BitBucket, GitLab or another Git host server after this lesson will include the user information you have provided when configuring Git.

If you want to use GitHub.com during this workshop, the email address used should be the same as the one used when setting up your GitHub.com account. If you are concerned about privacy, please review GitHub’s instructions for keeping your email address private.

You can modify your Git configuration at any time, if you want to change something. Additionally, you will need to repeat these configuration steps on each new machine you use.
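
Changing a setting is a matter of running the same command again with a new value. A small sketch, where "Your Preferred Name" is a made-up replacement value used only for illustration:

```shell
# The value set earlier in this lesson (placeholder).
git config --global user.name "Your FullName"

# Running the command again overwrites the stored value.
git config --global user.name "Your Preferred Name"   # hypothetical new value

# Read the value back to confirm the change.
git config --global user.name
```

Only the global setting is replaced; commits that already exist keep the name they were recorded with.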

You should also set your favourite text editor. If you're not sure what to choose, we recommend nano.

$ git config --global core.editor "nano -w"

Git Help and Manual

Always remember that if you forget the subcommands or options of a git command, you can access the relevant list of options by typing git <command> -h or access the corresponding Git manual by typing git <command> --help, e.g.:

$ git config -h
$ git config --help

While viewing the manual, remember the : is a prompt waiting for commands and you can press Q to exit the manual.

More generally, you can get the list of available git commands and further resources in the Git manual by typing:

$ git help