Tracking Changes

Objectives

Go through the modify-add-commit cycle for one or more files.
Explain where information is stored at each stage of that cycle.
Distinguish between descriptive and non-descriptive commit messages.

Questions

How do I record changes in Git?
How do I check the status of my version control repository?
How do I record notes about what changes I made and why?

Adding a File

First let's make sure we're still in the right directory. You should be in the Git repository we just cloned.

$ pwd

Let's copy over the README file from our example project:

$ cp /rds/prj/rds_hpc_training/<k-number>/example-project/README .

Let’s verify that the file was properly copied by running the list command (ls):

$ ls

README

We can use the editor nano to view the contents of README:

$ nano README

Other text editors

You can use whatever editor you like. In particular, this does not have to be the core.editor you set globally earlier. But remember, the steps to create or edit a file depend on the editor you choose (it might not be nano). For a refresher on text editors, check out "Which Editor?" in the Software Carpentry lesson on the Unix Shell.

If we check the status of our project again, Git tells us that it's noticed the new file:

$ git status

On branch main
Your branch is up-to-date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        README

nothing added to commit but untracked files present (use "git add" to track)

The "untracked files" message means that there's a file in the directory that Git isn't keeping track of. We can tell Git to track a file using git add:

$ git add README

and then check that the right thing happened:

$ git status

On branch main
Your branch is up-to-date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   README

Git now knows that it's supposed to keep track of README, but it hasn't recorded these changes as a commit yet. To get it to do that, we need to run one more command:

$ git commit -m "Add initial version of README"

[main 18898b0] Add initial version of README
 1 file changed, 4 insertions(+)
 create mode 100644 README

When we run git commit, Git takes everything we have told it to save by using git add and stores a copy permanently inside the special .git directory. This permanent copy is called a commit (or revision) and its short identifier is 18898b0. Your commit may have another identifier.

We use the -m flag (for "message") to record a short, descriptive, and specific comment that will help us remember later on what we did and why. If we just run git commit without the -m option, Git will launch nano (or whatever other editor we configured as core.editor) so that we can write a longer message.

Good commit messages start with a brief (<50 characters) statement about the changes made in the commit. Generally, the message should complete the sentence "If applied, this commit will …". If you want to go into more detail, add a blank line between the summary line and your additional notes. Use this additional space to explain why you made changes and/or what their impact will be.
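As a minimal sketch of such a two-part message, we can pass -m twice: Git joins the values as separate paragraphs, so the first becomes the summary line and the second the body. The throwaway repository, the placeholder identity and the message text are assumptions made purely for illustration.

```shell
# Throwaway repository, so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"

echo "Run process_images first." > README
git add README

# First -m: summary line. Second -m: body, separated by a blank line.
git commit -q -m "Add usage notes to README" \
              -m "New users kept running the scripts in the wrong order."

summary=$(git log -1 --format=%s)   # summary line only
body=$(git log -1 --format=%b)      # body only
echo "Summary: $summary"
echo "Body:    $body"
```

For longer explanations it is usually more comfortable to run git commit without -m and write the message in the editor.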

If we run git status now:

$ git status

On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean

it tells us we have no more changes remaining.

However, it also tells us that we have not yet published the latest commit to origin/main. And indeed, if we look at the repository on GitHub, the README file is not visible yet. To push our local commits back to the server, we need to run

$ git push origin

Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 32 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 450 bytes | 450.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
To github.kcl.ac.uk:k1234567/git-training.git
   c9ef82f..18898b0  main -> main

and we can reload the web page to verify that the latest commit has appeared.
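If you want to experiment with the "ahead of origin" message without touching a real server, the following sketch may help. It assumes a local bare repository as a stand-in for GitHub; all paths, file contents and the identity are made up for the example.

```shell
# A local bare repository stands in for the GitHub server (an assumption
# made so that the example runs without network access).
work=$(mktemp -d)
git init -q --bare "$work/server.git"
git clone -q "$work/server.git" "$work/local" 2>/dev/null
cd "$work/local" || exit 1
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"

echo "first" > README
git add README
git commit -q -m "Add initial version of README"
git push -q -u origin HEAD          # publish and set the upstream branch

echo "second" >> README
git add README
git commit -q -m "Add details to README"

# Count commits that exist locally but not on the remote-tracking branch.
ahead_before=$(git rev-list --count '@{u}..HEAD')
git push -q origin
ahead_after=$(git rev-list --count '@{u}..HEAD')
echo "ahead before push: $ahead_before, after push: $ahead_after"
```

The count drops from 1 to 0 once the push has published the second commit, which mirrors what git status reports in words.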

Making changes in the web interface

The GitHub web interface allows you to create new commits by uploading files to the repository or making simple edits to existing files. If you make changes there, you can run

$ git pull origin

to pull those commits into your local copy.

However, if you are editing the same repository on both CREATE and the GitHub web interface, you might accidentally make different changes to each copy. If this happens, Git will require you to resolve the resulting conflicts.

If we want to know what we've done recently, we can ask Git to show us the project's history using git log:

$ git log

commit 18898b0ee998e04b1ff89519033b525fe1dd9eb4 (HEAD -> main, origin/main, origin/HEAD)
Author: Jost Migenda
Date:   Mon May 19 16:13:45 2025 +0100

    Add initial version of README

commit c9ef82f3ccb4f6c08000fa743570e870185e24c8
Author: Jost Migenda
Date:   Mon May 19 15:56:10 2025 +0100

    Initial commit

git log lists all commits made to a repository in reverse chronological order. The listing for each commit includes the commit's full identifier (which starts with the same characters as the short identifier printed by the git commit command earlier), the commit's author, when it was created, and the log message Git was given when the commit was created.
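The relationship between the short and the full identifier can be checked directly. This is a small sketch in a throwaway repository; the file content and identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"

echo "hello" > README
git add README
git commit -q -m "Add initial version of README"

full_id=$(git rev-parse HEAD)            # full identifier, as shown by git log
short_id=$(git rev-parse --short HEAD)   # abbreviated form, as shown by git commit
echo "full:  $full_id"
echo "short: $short_id"

# The short identifier is simply a prefix of the full one.
case "$full_id" in
  "$short_id"*) is_prefix=yes ;;
  *)            is_prefix=no ;;
esac
echo "short is a prefix of full: $is_prefix"
```

Your identifiers will differ from the ones in this lesson, because they depend on the content, author and time of each commit.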

Where Are My Changes?

If we run ls at this point, we will still see just one file called README. That's because Git saves information about files' history in the special .git directory mentioned earlier so that our filesystem doesn't become cluttered (and so that we can't accidentally edit or delete an old version).
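To convince ourselves that older versions really are kept inside .git, here is a hedged sketch that commits two versions of a file and then reads both back. The repository and file contents are invented for the example.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"

echo "version 1" > README
git add README
git commit -q -m "Add README"

echo "version 2" > README
git add README
git commit -q -m "Update README"

files=$(ls)                        # still only one visible file
old=$(git show HEAD~1:README)      # previous version, read from .git
new=$(git show HEAD:README)        # version in the latest commit
echo "files:    $files"
echo "previous: $old"
echo "current:  $new"
```

The working directory contains a single README, yet both versions can be retrieved.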

Changing a File

Now suppose we want to make changes to the README file. We again use our favourite text editor:

$ nano README

When we run git status afterwards, it tells us that a file it already knows about has been modified:

$ git status

On branch main
Your branch is up-to-date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   README

no changes added to commit (use "git add" and/or "git commit -a")

The last line is the key phrase: "no changes added to commit". We have changed this file, but we haven't told Git we will want to save those changes (which we do with git add) nor have we saved them (which we do with git commit). So let's do that now.

It is good practice to always review our changes before saving them. We do this using git diff. This shows us the differences between the current state of the file and the most recently saved version:

$ git diff

diff --git a/README b/README
index 7b1d7d6..3c196e8 100644
--- a/README
+++ b/README
@@ -2,3 +2,5 @@ This is an image analysis project. The aim is to identify blobs in the images st
 First run the "process_images" script and then the "analyse_images" script. 

 N.B. you will need to install the "imagecodecs" package.
+
+This is an added line!

The output is cryptic because it is actually a series of commands for tools like editors and patch telling them how to reconstruct one file given the other. If we break it down into pieces:

The first line tells us that Git is producing output similar to the Unix diff command comparing the old and new versions of the file.
The second line tells exactly which versions of the file Git is comparing; 7b1d7d6 and 3c196e8 are unique computer-generated labels for those versions.
The third and fourth lines once again show the name of the file being changed.
The remaining lines are the most interesting: they show us the actual differences and the lines on which they occur. In particular, the + marker in the first column shows where we added a line.
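The labels on the "index" line can be reproduced by hand, which makes them a little less mysterious. The sketch below assumes a throwaway repository with an invented README; it compares the two labels with the identifier of the committed file version and of the file currently on disk.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"

echo "This is an image analysis project." > README
git add README
git commit -q -m "Add initial version of README"
echo "This is an added line!" >> README

# --full-index prints the complete labels instead of abbreviations.
index_line=$(git diff --full-index | grep '^index ')
old_label=$(echo "$index_line" | sed 's/^index \([0-9a-f]*\)\.\..*/\1/')
new_label=$(echo "$index_line" | sed 's/^index [0-9a-f]*\.\.\([0-9a-f]*\).*/\1/')

committed_id=$(git rev-parse HEAD:README)   # version stored in the last commit
current_id=$(git hash-object README)        # version currently on disk

echo "old label: $old_label"
echo "committed: $committed_id"
echo "new label: $new_label"
echo "on disk:   $current_id"
```

The first label matches the committed version and the second matches the edited file, so the line really does name the two versions being compared.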

After reviewing our change, it's time to commit it:

$ git commit -m "add details to README"

On branch main
Your branch is up-to-date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   README

no changes added to commit (use "git add" and/or "git commit -a")

Whoops: Git won't commit because we didn't use git add first. Let's fix that:

$ git add README
$ git commit -m "add details to README"

[main 5921689] add details to README
 1 file changed, 2 insertions(+)

Git insists that we add files to the set we want to commit before actually committing anything. This allows us to commit our changes in stages and capture changes in logical portions rather than only large batches. For example, suppose we're adding a few citations to relevant research to our thesis. We might want to commit those additions, and the corresponding bibliography entries, but not commit some of our work drafting the conclusion (which we haven't finished yet).

To allow for this, Git has a special staging area where it keeps track of things that have been added to the current changeset but not yet committed.
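The thesis scenario can be sketched as follows. The file names references.txt and conclusion.txt and their contents are hypothetical; the point is only that staging one file leaves the other out of the commit.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"

echo "draft" > conclusion.txt
echo "ref 1" > references.txt
git add conclusion.txt references.txt
git commit -q -m "Add thesis skeleton"

# Edit both files, but stage only the one that is finished.
echo "ref 2" >> references.txt
echo "unfinished thoughts" >> conclusion.txt
git add references.txt
git commit -q -m "Add citation for second reference"

committed=$(git diff --name-only HEAD~1 HEAD)   # files changed by the last commit
pending=$(git status --porcelain)               # what is still uncommitted
echo "committed: $committed"
echo "pending:  $pending"
```

Only references.txt ends up in the new commit, while the unfinished conclusion.txt is still listed as modified.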

Think Before You Commit!

If you think of Git as taking snapshots of changes over the life of a project, git add specifies what will go in a snapshot (putting things in the staging area), and git commit then actually takes the snapshot, and makes a permanent record of it (as a commit). If you don't have anything staged when you type git commit, Git will prompt you to use git commit -a or git commit --all. However, it's almost always better to explicitly add things to the staging area, since otherwise you might commit changes you forgot you made.

For example, while you’re developing a script you might have it produce output plots in the current directory. And while those are useful for debugging, you probably don’t want to commit them to the repository. Strictly speaking, git commit --all only picks up changes to files Git already tracks, but the same habit of committing everything at once (for example with git add --all followed by git commit) makes it easy to accidentally commit files like these.

If you frequently have files like these, an easy way to avoid accidents is to tell Git to ignore them by listing them in a special file called .gitignore. In fact, when we created the repository, we asked GitHub to add a .gitignore file from the start; so let’s take a look at that now.

$ nano .gitignore
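As a rough illustration of how an ignore rule behaves, the sketch below writes a single hypothetical pattern (*.png, for debugging plots) into .gitignore in a throwaway repository. The .gitignore that GitHub generated for you will contain different rules.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Hypothetical rule: ignore every PNG file (e.g. debugging plots).
echo "*.png" > .gitignore
echo "print('plot')" > make_plot.py   # stand-in for an analysis script
: > debug_plot.png                    # empty stand-in for a generated plot

git add --all                         # stages everything that is not ignored
staged=$(git diff --staged --name-only)
echo "$staged"
```

The plot never reaches the staging area, even though we asked Git to add everything. Running git check-ignore -v debug_plot.png would report which rule is responsible.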

A diagram showing how "git add" registers changes in the staging area, while "git commit" moves changes from the staging area to the repository

Let's watch as our changes to a file move from our editor to the staging area and into long-term storage. First, we'll make another small change to our README:

$ nano README
$ git diff

diff --git a/README b/README
index 3c196e8..423cdd4 100644
--- a/README
+++ b/README
@@ -3,4 +3,4 @@ First run the "process_images" script and then the "analyse_images" script.

 N.B. you will need to install the "imagecodecs" package.

-This is an added line!
+This is a modified line!

So far, so good: we've replaced one line (shown with a - in the first column) with a new line (shown with a + in the first column). Now let's put that change in the staging area and see what git diff reports:

$ git add README
$ git diff

There is no output: as far as Git can tell, there's no difference between what it's been asked to save permanently and what's currently in the directory. However, if we do this:

$ git diff --staged

diff --git a/README b/README
index 3c196e8..423cdd4 100644
--- a/README
+++ b/README
@@ -3,4 +3,4 @@ First run the "process_images" script and then the "analyse_images" script.

 N.B. you will need to install the "imagecodecs" package.

-This is an added line!
+This is a modified line!

it shows us the difference between the last committed change and what's in the staging area.
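The two comparisons can be summarised in a small experiment. This sketch relies on git diff --quiet, which prints nothing and only signals through its exit status whether there are differences; has_diff is a hypothetical helper name, and the repository is a throwaway one.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"

echo "This is an added line!" > README
git add README
git commit -q -m "Add README"
echo "This is a modified line!" > README

# Hypothetical helper: prints "yes" if the given diff is non-empty.
has_diff() { if git diff --quiet "$@"; then echo no; else echo yes; fi; }

unstaged_before=$(has_diff)            # working directory vs staging area
staged_before=$(has_diff --staged)     # staging area vs last commit
git add README
unstaged_after=$(has_diff)
staged_after=$(has_diff --staged)

echo "before git add: unstaged=$unstaged_before staged=$staged_before"
echo "after git add:  unstaged=$unstaged_after staged=$staged_after"
```

Before git add the change shows up only in plain git diff; afterwards it shows up only in git diff --staged.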

Word-based diffing

Sometimes, for example with prose documents, a line-wise diff is too coarse. In that case the --color-words option of git diff is very useful, because it highlights the individual words that changed using colours.
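Colours do not show up in plain text, so this sketch uses the closely related --word-diff option instead, which marks changed words with brackets: [-removed-] and {+added+}. (--color-words is shorthand for --word-diff=color.) The repository and file content are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"

echo "This is an added line!" > README
git add README
git commit -q -m "Add README"
echo "This is a modified line!" > README

# Keep only the last line of the output, which holds the changed text.
word_diff=$(git diff --word-diff | tail -n 1)
echo "$word_diff"
```

Only the words that differ are marked, while the unchanged words around them appear once.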

Let's save our changes:

$ git commit -m "Modify README"

[main 5cc6641] Modify README
 1 file changed, 1 insertion(+), 1 deletion(-)

check our status:

$ git status

On branch main
Your branch is ahead of 'origin/main' by 2 commits.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean

and look at the history of what we've done so far:

$ git log

commit 5cc664186d6ba95a2476c1e5c30c17e01d73c751 (HEAD -> main)
Author: Jost Migenda
Date:   Mon May 19 16:56:20 2025 +0100

    Modify README

commit 5921689e37c8ff667c7a84ab6e8feb652237ad14
Author: Jost Migenda
Date:   Mon May 19 16:47:14 2025 +0100

    add details to README

commit 18898b0ee998e04b1ff89519033b525fe1dd9eb4 (origin/main, origin/HEAD)
Author: Jost Migenda
Date:   Mon May 19 16:13:45 2025 +0100

    Add initial version of README

commit c9ef82f3ccb4f6c08000fa743570e870185e24c8
Author: Jost Migenda
Date:   Mon May 19 15:56:10 2025 +0100

    Initial commit

To recap, when we want to add changes to our repository, we first need to add the changed files to the staging area (git add) and then commit the staged changes to the repository (git commit):

A diagram showing two documents being separately staged using git add, before being combined into one commit using git commit

Tip

We’re at the end of a subsection, so this is a good opportunity to push our changes to the remote repository:

$ git push origin

Choosing a Commit Message

Which of the following commit messages would be most appropriate?

1. "Changes"
2. "change README to say that running import antigravity to import this package will probably only work on your local machine, not on CREATE"
3. "improve usage instructions"

Solution

Answer 1 is not descriptive enough, and the purpose of the commit is unclear. Answer 2 merely duplicates what "git diff" would show for this commit. Answer 3 is good: short, descriptive, and imperative.

Committing Changes to Git

Which command(s) below would save the changes of myfile.txt to my local Git repository?

# Option 1
$ git commit -m "my recent changes"

# Option 2
$ git add myfile.txt
$ git commit -m "my recent changes"

# Option 3
$ git commit -m myfile.txt "my recent changes"

Solution

1. Would only create a commit if files have already been staged.
2. Is correct: first add the file to the staging area, then commit.
3. Would try to commit a file "my recent changes" with the message myfile.txt.

Dealing with Multiple Files in One Commit

So far, we’ve only dealt with changes to a single file. As a result, adding changes to the staging area before every commit probably felt like unnecessary overhead to you.

However, the staging area can hold changes from any number of files that you want to commit as a single snapshot. This lets us structure our work better, with one commit corresponding to one logical chunk of the work. Let’s add the scripts from the example project to our repository and see how the staging area helps us with that.

First, we’ll create a subdirectory for the scripts:

$ mkdir scripts

Git does not track directories on their own, only files within them:

$ git status
$ git add scripts
$ git status

Note that our newly created empty directory scripts does not appear in the list of untracked files even if we explicitly add it (via git add) to our repository. This is the reason why you will sometimes see .gitkeep files in otherwise empty directories. The sole purpose of .gitkeep files is to populate a directory so that Git adds it to the repository. The name .gitkeep is just a convention, and in fact, you can name these files anything you like.
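A short sketch of this behaviour, run in a throwaway repository: the empty directory is invisible to Git until a placeholder file (here named .gitkeep, following the convention) is put inside it.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

mkdir scripts
git add scripts                       # no error, but nothing to stage
empty_dir=$(git status --porcelain)   # empty: Git does not see the directory

: > scripts/.gitkeep                  # empty placeholder file
git add scripts
with_keep=$(git status --porcelain)   # now the placeholder is staged

echo "status with empty directory: '$empty_dir'"
echo "status with placeholder:     '$with_keep'"
```

The first status is empty, while the second lists scripts/.gitkeep as a newly added file.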

However, once we add files to the directory, we can add all the files in the directory at once by referring to the directory in our git add command:

$ cp /scratch/users/<k-number>/example-project/scripts/* scripts/
$ git status
$ git add scripts
$ git status

On branch main
Your branch is up-to-date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   scripts/analyse_images Jan 2025.py
        new file:   scripts/analyse_images.sh
        new file:   scripts/analyse_images_NEW.py
        new file:   scripts/counting_objects.ipynb
        new file:   scripts/process_images.py
        new file:   scripts/process_images.sh

Notice that we have two versions of analyse_images.py here. Do we really want to add both of them to the repository? No; the whole point of using version control is that we don’t have to manually deal with multiple copies of a file!

Now, how can we figure out what the difference between them is?

We’ve previously seen that git diff can show us changes we’ve made to a file. There is also a standalone command called diff, which we can use to figure out the differences between two files.

$ diff --color scripts/analyse_images\ Jan\ 2025.py scripts/analyse_images_NEW.py

16,17c16,17
<     # Gaussian blur (sigma=3)
<     im_gauss = filters.gaussian(processed_image, sigma=3)
---
>     # Gaussian blur (sigma=5)
>     im_gauss = filters.gaussian(processed_image, sigma=5)
19,21c19,21
<     # segment image using mean
<     thresh = filters.threshold_mean(processed_image)
<     im_thresh = processed_image >= thresh
---
>     # segment image using Otsu method
>     thresh = filters.threshold_otsu(im_gauss)
>     im_thresh = im_gauss >= thresh

diff tells us what changes we would need to make to turn the first file (analyse_images Jan 2025.py) into the second file (analyse_images_NEW.py).
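To get used to reading this notation, it can help to try diff on two tiny files. The file names old.py and new.py and their one-line contents below are made up for the example and are not part of the example project.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
printf 'blur(sigma=3)\nthreshold_mean()\n' > old.py   # invented content
printf 'blur(sigma=5)\nthreshold_mean()\n' > new.py

# diff exits with status 1 when the files differ, so "|| true" keeps
# scripts that use "set -e" from stopping here.
changes=$(diff old.py new.py || true)
echo "$changes"
```

This prints 1c1, followed by the line from the first file marked with < and the line from the second file marked with >. The header reads as "line 1 of the first file must be changed (c) to become line 1 of the second file", in the same way that 16,17c16,17 above refers to lines 16 to 17 of both files.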

In this case, let’s say we remember from a previous group meeting that the “Otsu method” is the latest one. So we can delete the first file:

$ rm scripts/analyse_images\ Jan\ 2025.py
$ git status

On branch main
Your branch is up-to-date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   scripts/analyse_images Jan 2025.py
        new file:   scripts/analyse_images.sh
        new file:   scripts/analyse_images_NEW.py
        new file:   scripts/counting_objects.ipynb
        new file:   scripts/process_images.py
        new file:   scripts/process_images.sh

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        deleted:    scripts/analyse_images Jan 2025.py

Hm … we still have that file in the staging area, but now Git tells us that there’s a new change (the deletion of the file) that has not been added to the staging area. Note that Git helpfully tells us what we need to do to remove the file from Git as well:

$ git rm scripts/analyse_images\ Jan\ 2025.py
$ git status

On branch main
Your branch is up-to-date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   scripts/analyse_images.sh
        new file:   scripts/analyse_images_NEW.py
        new file:   scripts/counting_objects.ipynb
        new file:   scripts/process_images.py
        new file:   scripts/process_images.sh

Now this looks much better! However, we should still fix that filename. Since we use version control, that _NEW suffix really shouldn’t be there any more. Of course, we could use the mv shell command to move the file to the new name and then git add this change to the staging area. But just as with git diff and git rm, Git has its own counterpart to the shell command for moving files, so we can do this more easily:

$ git mv scripts/analyse_images_NEW.py scripts/analyse_images.py
$ git status

On branch main
Your branch is up-to-date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   scripts/analyse_images.py
        new file:   scripts/analyse_images.sh
        new file:   scripts/counting_objects.ipynb
        new file:   scripts/process_images.py
        new file:   scripts/process_images.sh

This looks just like we want it!
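To see that git mv is a shortcut rather than something fundamentally different, the sketch below renames one file with git mv and another with plain mv followed by git add. Unlike in our repository, the two files here have already been committed, and their names and contents are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"

echo "analysis code" > analyse_NEW.py
echo "processing code" > process_NEW.py
git add .
git commit -q -m "Add scripts"

git mv analyse_NEW.py analyse.py      # rename and stage in one step

mv process_NEW.py process.py          # the same thing in separate steps
git add --all                         # stages the removal and the new name

tracked=$(git ls-files | tr '\n' ' ') # names Git will record in the next commit
git status --short
echo "tracked: $tracked"
```

Both approaches leave the staging area in the same state: only the new names are tracked. With default settings, git status should report both files as renamed.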

But before we commit, let’s mention these analysis scripts in the README as well:

$ nano README
$ git add README
$ git status

Finally, we will commit these changes and push everything we’ve done recently to the remote repository:

$ git commit -m "add analysis scripts"
$ git push origin

What’s a commit?

Which of these should be a single commit?

1. Fixing a typo
2. Adding a new plot to an analysis script
3. Modifying a script to turn some repeated code into a function
4. Rewriting an analysis to use a different approach, over multiple days

Answers

1. Usually no, fixing a typo should not be a commit by itself but should be combined with other related work, if possible.
2. and 3. Usually yes, this type of change should be a commit.
4. Usually no, a complete rewrite should be split up into smaller chunks.

A good guideline is that you should commit every time you make a change you don't want to lose, or a change that you might want to reverse in the future. If you realise there's an error in your code and want to go back and see when it was introduced, smaller commits (i.e. with fewer changes) can make it easier to track down the source of the problem. In the next section, we'll look at how to examine the commit history.

Key points

git status shows the status of a repository.
Files can be stored in a project's working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded).
git add puts files in the staging area.
git commit saves the staged content as a new commit in the local repository.
Write a commit message that accurately describes your changes.