Exercises¶
This section contains a collection of exercises based on the material covered in the previous sections.
Sample answers are provided in-line in the expandable boxes, but please try to do the exercises yourself before looking at the sample answers. There are also some hints to help you if you get stuck.
Supporting files can be found on CREATE HPC in the `/datasets/hpc_training` directory. The directory also contains example scripts from the exercises for convenience.
Info
There might be multiple ways to achieve a specific goal. If your method does not match the sample answer, it does not necessarily mean you have done things incorrectly, just differently.
Using modules¶
Basic software environment module usage¶
Goal: Load and use the Python environment module with the most recent version of Python
- Investigate the module environment before and after loading the module. What has changed?
- Print the Python version
- Check the Python interpreter's location on the filesystem
- Unload the module and check the Python version and location again. What is different?
Hint
Use the `which` command to find the location and the `-V` option to print the Python version.
Sample answer
First print the list of loaded modules, before the Python module has been loaded:

```
k1234567@erc-hpc-login2:~$ module list
```
Next load the Python module - N.B. the most recent version you have available may be different to this example answer.

```
k1234567@erc-hpc-login2:~$ module load python/3.8.12-gcc-9.4.0
```
List the currently loaded modules:

```
k1234567@erc-hpc-login2:~$ module list

Currently Loaded Modules:
  1) bzip2/1.0.8-gcc-9.4.0      6) libiconv/1.16-gcc-9.4.0    11) ncurses/6.2-gcc-9.4.0             16) xz/5.2.5-gcc-9.4.0
  2) libmd/1.0.3-gcc-9.4.0      7) libxml2/2.9.12-gcc-9.4.0   12) openssl/1.1.1l-gcc-9.4.0          17) zlib/1.2.11-gcc-9.4.0
  3) libbsd/0.11.3-gcc-9.4.0    8) tar/1.34-gcc-9.4.0         13) readline/8.1-gcc-9.4.0            18) python/3.8.12-gcc-9.4.0
  4) expat/2.4.1-gcc-9.4.0      9) gettext/0.21-gcc-9.4.0     14) sqlite/3.36.0-gcc-9.4.0
  5) gdbm/1.19-gcc-9.4.0       10) libffi/3.3-gcc-9.4.0       15) util-linux-uuid/2.36.2-gcc-9.4.0
```
You can see that the python module and its dependencies have been added.
Test by printing the Python version:

```
k1234567@erc-hpc-login2:~$ python -V
Python 3.8.12
```
You can also see where the python interpreter resides:

```
k1234567@erc-hpc-login2:~$ which python
/software/spackages_prod/apps/linux-ubuntu20.04-zen2/gcc-9.4.0/python-3.8.12-mdneme5mnx2ihlvy6ihbvjatdvnn45l6/bin/python
```
Unload the python module using:

```
k1234567@erc-hpc-login2:~$ module rm python/3.8.12-gcc-9.4.0
```
Check the version and location again:

```
k1234567@erc-hpc-login2:~$ python -V
Python 3.8.10
k1234567@erc-hpc-login2:~$ which python
/usr/bin/python
```
Now we are using the system-provided python. Although we said that no software is automatically pre-loaded, there will be applications that come with the operating system and cannot easily be removed or hidden.
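If you are not sure which versions of a package are installed on the cluster, the module system can list them before you load anything. A minimal sketch using standard module commands (the version string shown is just the one from this example answer):

```
# List all available python modules and versions
module avail python

# Inspect what loading a specific module would change in your environment
module show python/3.8.12-gcc-9.4.0
```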
Further environment module usage¶
Goal: Explore loading different versions of the same environment
- Load the most recent version of Python and then attempt to load a different version. What is the outcome?
Sample answer
Start by loading python 3.8.12 (N.B. the versions you have available may be different to this example answer):

```
k1234567@erc-hpc-login2:~$ module load python/3.8.12-gcc-9.4.0
```
Next load python 2.7.18:

```
k1234567@erc-hpc-login2:~$ module load python/2.7.18-gcc-9.4.0
```
You should see the following message informing you about the module swap:

```
The following have been reloaded with a version change:
  1) python/3.8.12-gcc-9.4.0 => python/2.7.18-gcc-9.4.0
```
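Only one version of a given module can be loaded at a time, which is why the module system performs the swap automatically. If you would rather start again from a clean environment, `module purge` removes everything in one go. A minimal sketch using standard module commands:

```
# Unload every currently loaded module
module purge

# Confirm that nothing is loaded any more
module list
```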
Python virtual environments¶
Python virtualenv creation¶
Goal: Create and use a Python virtual environment
- Create a python virtualenv using the most recent available version of python and activate it.
- Find out where the python interpreter and pip utility are located
- Deactivate the python virtual environment
- Check the location of the python interpreter and pip utility again
Sample answer
Start by loading the python interpreter using:

```
k1234567@erc-hpc-login2:~$ module load python/3.11.6-gcc-13.2.0
```
Once the module has been loaded create the python virtual environment:

```
k1234567@erc-hpc-login2:~$ python -m venv myvenv
```
Next activate the environment:

```
k1234567@erc-hpc-login2:~$ source myvenv/bin/activate
(myvenv) k1234567@erc-hpc-login2:~$
```
Use the `which` command to print the locations of the python interpreter and pip utility:

```
(myvenv) k1234567@erc-hpc-login2:~$ which python
/users/k1234567/myvenv/bin/python
(myvenv) k1234567@erc-hpc-login2:~$ which pip
/users/k1234567/myvenv/bin/pip
```
Note: The above paths might vary depending on where on the filesystem you have created your virtual environment.
Next deactivate the virtual environment:

```
(myvenv) k1234567@erc-hpc-login2:~$ deactivate
```
Check the location of the python interpreter and pip using:

```
k1234567@erc-hpc-login2:~$ which python
/software/spackages_v0_21_prod/apps/linux-ubuntu22.04-zen2/gcc-13.2.0/python-3.11.6-oe7bpykqsieymznu5rprjla46ti6uagh/bin/python
k1234567@erc-hpc-login2:~$ which pip
/usr/bin/pip
```
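The changing `which` output is explained by the activate script simply prepending the virtualenv's `bin` directory to your `PATH`, and `deactivate` restoring it. A minimal sketch to observe this yourself, assuming the `myvenv` environment created above:

```
# Inspect the front of PATH before and after activation
echo "${PATH%%:*}"              # system or module python directory first
source myvenv/bin/activate
echo "${PATH%%:*}"              # now .../myvenv/bin comes first
deactivate
```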
Python virtualenv package installation¶
Goal: Install packages into a python virtual environment
- Using the virtualenv created in the previous exercise, install the `pandas` package to it. Check if it has been installed.
- What other packages were installed as dependencies?
Hint
- Use `pip install package_name` to install a package in your virtual environment
- Use `pip list` to list installed packages
Sample answer
Activate the previously created environment:

```
k1234567@erc-hpc-login2:~$ source myvenv/bin/activate
(myvenv) k1234567@erc-hpc-login2:~$
```
Next use the pip command to install the relevant package:

```
(myvenv) k1234567@erc-hpc-login2:~$ pip install pandas
Collecting pandas
  Downloading pandas-1.4.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.7/11.7 MB 14.4 MB/s eta 0:00:00
Collecting python-dateutil>=2.8.1
  Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 KB 1.8 MB/s eta 0:00:00
Collecting pytz>=2020.1
  Downloading pytz-2022.1-py2.py3-none-any.whl (503 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 503.5/503.5 KB 1.7 MB/s eta 0:00:00
Collecting numpy>=1.18.5
  Downloading numpy-1.22.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.8/16.8 MB 13.6 MB/s eta 0:00:00
Collecting six>=1.5
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: pytz, six, numpy, python-dateutil, pandas
Successfully installed numpy-1.22.3 pandas-1.4.2 python-dateutil-2.8.2 pytz-2022.1 six-1.16.0
```
Additional dependencies that the pandas package requires will also be installed.
You can check if the package has been installed using:

```
(myvenv) k1234567@erc-hpc-login2:~$ pip list
Package         Version
--------------- -------
numpy           1.22.3
pandas          1.4.2
pip             22.0.4
python-dateutil 2.8.2
pytz            2022.1
setuptools      62.1.0
six             1.16.0
wheel           0.37.1
```
Note: The version numbers you see might not match those above.
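To distinguish the package you asked for from the packages pulled in as dependencies, pip can print a package's metadata, including its declared requirements. A minimal sketch using a standard pip command:

```
# The "Requires:" line lists the direct dependencies of pandas
pip show pandas
```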
Job submission (part 1)¶
SLURM batch job submission¶
Goal: Submit a batch job requesting 1 core and 2G of memory
- Have the job print the hostname of the node that it runs on
- Check if the job is running
- Analyse the output file
- Check information about the completed job
Hint
- Use the `hostname` command to print the hostname of the node
- Add a `sleep` to the jobscript so that it does not terminate too quickly
- Use `squeue` to check the status of the job
- Use `sacct` to get information about the job when it has finished
Sample answer
Create a sample script `test-job1.sh` and add the following contents to it:

```
#!/bin/bash -l

#SBATCH --job-name=test-job1
#SBATCH --partition=cpu
#SBATCH --reservation=cpu_introduction
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G

echo "My hostname is "`hostname`
sleep 60
```
Submit the job using:

```
k1234567@erc-hpc-login2:~$ sbatch test-job1.sh
Submitted batch job 56739
```
You can check the status of your job(s):

```
k1234567@erc-hpc-login2:~$ squeue --me
  JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
  56739       cpu test-job k1234567  R   0:10      1 erc-hpc-comp001
```
Analyse the output, which should be located in the `slurm-jobid.out` file (replace `jobid` with the id of your job):

```
k1234567@erc-hpc-login2:~$ cat slurm-56739.out
My hostname is erc-hpc-comp001
```
You can check information about the completed job using:

```
k1234567@erc-hpc-login2:~$ sacct -j 56739
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
56739         test-job1        cpu        kcl          1  COMPLETED      0:0
56739.batch       batch                   kcl          1  COMPLETED      0:0
```
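`sacct` can report far more than its default columns, which is useful when checking whether a job actually needed the resources it requested. A minimal sketch using standard SLURM accounting fields (the job id matches the example above):

```
# Request specific accounting fields for the completed job
sacct -j 56739 --format=JobID,JobName,Elapsed,MaxRSS,ReqMem,State
```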
SLURM interactive job submission¶
Goal: Request an interactive session with one core and 4G of memory
- Print the hostname of the interactive node that you have been allocated
- Print the number of cores available
- What happens immediately after requesting the interactive job? How is it different from submitting a batch job? Why might this be a bad thing?
Hint
- Use the `hostname` command to print the hostname of the node
- Use the `nproc` utility to print the number of allocated cpus for the job
Sample answer
Use the `srun` command to request an interactive session:

```
k1234567@erc-hpc-login2:~$ srun -p cpu --reservation cpu_introduction --mem 4G --pty /bin/bash -l
srun: job 56741 queued and waiting for resources
srun: job 56741 has been allocated resources
k1234567@erc-hpc-comp001:~$
```
Print the hostname of the allocated node and the number of cores that we have been allocated:

```
k1234567@erc-hpc-comp001:~$ hostname
erc-hpc-comp001
k1234567@erc-hpc-comp001:~$ nproc
1
```

Note that, unlike sbatch, srun blocks your terminal until the resources are allocated - on a busy cluster you may wait a long time before the prompt appears, which is why interactive sessions can be a poor fit for work that queues for long.
GPU SLURM jobs¶
Goal: Submit a job requesting gpu(s) on a single node.
- Write a job script that requests a single gpu and runs a command that reports details of the allocated gpu in its output
- Submit a job that requests two gpus and check the output
Hint
- Use the `nvidia-smi --id=$CUDA_VISIBLE_DEVICES` utility to print the details of the allocated gpu(s) for the job
Sample answer
Create a sample script `test-gpu.sh` and add the following contents to it:

```
#!/bin/bash -l

#SBATCH --job-name=test-gpu
#SBATCH --partition=interruptible_gpu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --gres gpu:1

nvidia-smi --id=$CUDA_VISIBLE_DEVICES
```
Submit the job using:

```
k1234567@erc-hpc-login2:~$ sbatch test-gpu.sh
Submitted batch job 57425
```
Analyse the output, which should be located in the `slurm-jobid.out` file (replace `jobid` with the id of your job):

```
k1234567@erc-hpc-login2:~$ cat slurm-57425.out
Tue May 10 09:23:12 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.54       Driver Version: 510.54       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   1  NVIDIA A100-SXM...  On   | 00000000:31:00.0 Off |                    0 |
| N/A   42C    P0    51W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
Next modify the `--gres` option to be `--gres gpu:2` and re-submit the job. Analyse the output, which should be located in the `slurm-jobid.out` file (replace `jobid` with the id of your job):

```
k1234567@erc-hpc-login2:~$ cat slurm-57460.out
Tue May 10 09:32:21 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   1  NVIDIA A100-SXM...  Off  | 00000000:31:00.0 Off |                    0 |
| N/A   35C    P0    55W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM...  Off  | 00000000:B1:00.0 Off |                    0 |
| N/A   34C    P0    55W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
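The `--id=$CUDA_VISIBLE_DEVICES` option works because SLURM exports the indices of the allocated gpu(s) in that environment variable. A minimal line you could add to a job script to see the raw value (no assumptions beyond a SLURM gpu allocation):

```
# Prints e.g. "0" for one allocated gpu, or "0,1" for two
echo "Allocated GPU ids: ${CUDA_VISIBLE_DEVICES}"
```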
Debugging SLURM job scripts¶
Goal: Analyse and fix a SLURM batch job script
- For this exercise, use the `bad-script1.sh` script located in the `sample-scripts` folder in the supporting files directory
- Examine the job script - what is it intended to do?
- Submit it as a batch job. What is the problem? Try to fix the issue.
- How else could you have fixed the issue?
Hint
You will have to copy the script to your local workspace in order to modify it.
Sample answer
Submit the `/datasets/hpc_training/sample-scripts/bad-script1.sh` job:

```
k1234567@erc-hpc-login2:~$ sbatch /datasets/hpc_training/sample-scripts/bad-script1.sh
```
Once the job has finished analyse its output file:

```
k1234567@erc-hpc-login2:~$ cat slurm-56749.out
slurmstepd-erc-hpc-comp001: error: *** JOB 56749 ON erc-hpc-comp001 CANCELLED AT 2022-05-09T14:41:14 DUE TO TIME LIMIT ***
```
and the accounting information:

```
k1234567@erc-hpc-login2:~$ sacct -j 56749
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
56749        bad-scrip+        cpu        kcl          1    TIMEOUT      0:0
56749.batch       batch                   kcl          1  CANCELLED     0:15
```
You can see that the job was killed because insufficient runtime was requested. Fix it by changing the runtime to 3 minutes, which should cover the 2 minute sleep (you can alternatively decrease the sleep time):

```
#SBATCH -t 0-0:03 # time (D-HH:MM)
```
Once the amended script is submitted the output file should contain the following:

```
k1234567@erc-hpc-login2:~$ cat slurm-56751.out
Hello World!
erc-hpc-comp001
```
and the accounting information should contain:

```
k1234567@erc-hpc-login2:~$ sacct -j 56751
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
56751        bad-scrip+        cpu        kcl          1  COMPLETED      0:0
56751.batch       batch                   kcl          1  COMPLETED      0:0
```
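As for how else the issue could have been fixed: `sbatch` command-line options take precedence over the `#SBATCH` directives inside a script, so the time limit can be raised at submission time without editing (or copying) the script at all. A minimal sketch using the same 3 minute limit as above:

```
# Override the script's time limit from the command line
sbatch --time=0-0:03 /datasets/hpc_training/sample-scripts/bad-script1.sh
```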
Job submission (part 2)¶
Multi-core SLURM jobs¶
Goal: Submit a job requesting multiple cores on a single node.
- Write a job script that requests two cores and runs a command that reports the number of allocated cpus in its output
- Submit the job and check whether the correct resources have been allocated to the job by SLURM. Does it match the output from your job?
- How else could you have achieved the same resource request?
Hint
- Use the `nproc` utility to print the number of allocated cpus for the job
- Use `sacct` to check the resource allocation after the job has finished
Sample answer
Create a sample script `test-multicore.sh` and add the following contents to it:

```
#!/bin/bash -l

#SBATCH --job-name=test-multicore
#SBATCH --partition=cpu
#SBATCH --reservation=cpu_introduction
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2

echo "I have "`nproc`" cpus."
```
Submit the job using:

```
k1234567@erc-hpc-login2:~$ sbatch test-multicore.sh
Submitted batch job 56758
```
Analyse the output, which should be located in the `slurm-jobid.out` file (replace `jobid` with the id of your job):

```
k1234567@erc-hpc-login2:~$ cat slurm-56758.out
I have 2 cpus.
```
You can check the information about the completed job using:

```
k1234567@erc-hpc-login2:~$ sacct -j 56758
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
56758        test-mult+        cpu        kcl          2  COMPLETED      0:0
56758.batch       batch                   kcl          2  COMPLETED      0:0
```

You should see `AllocCPUS` with a value of 2.
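On the question of achieving the same resource request another way: two cpus can also be requested as two tasks of one cpu each, although the semantics differ (multiple tasks are intended for multiple processes, e.g. MPI ranks, rather than one multi-threaded process). A hedged sketch of the alternative directives, pinned to a single node:

```
#SBATCH --nodes=1          # keep both tasks on one node
#SBATCH --ntasks=2         # two tasks...
#SBATCH --cpus-per-task=1  # ...with one cpu each, two cpus in total
```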
SLURM array jobs¶
Goal: Submit an array job consisting of 3 tasks
- Write a job script that defines an array of 3 tasks
- Have your job print out the SLURM array task ID number in its output
- Submit the script and then immediately check the status of the running job. How many individual jobs are running?
- How do you identify array jobs and tasks in the queue displayed by SLURM?
- Investigate the output files - how does an array job differ from the other jobs that you have been submitting?
Hint
- Use the `sleep` command to delay the execution of the script so that you can check its status
- Use the `squeue` command to check the status of the running job
- Use the `SLURM_ARRAY_TASK_ID` environment variable inside the job to get the task id associated with the task
Sample answer
Create a sample script `test-array.sh` and add the following contents to it:

```
#!/bin/bash -l

#SBATCH --job-name=test-array
#SBATCH --partition=cpu
#SBATCH --reservation=cpu_introduction
#SBATCH --ntasks=1
#SBATCH --array=1-3

sleep 60
echo "My task id is ${SLURM_ARRAY_TASK_ID}"
```
Submit the job using:

```
k1234567@erc-hpc-login2:~$ sbatch test-array.sh
Submitted batch job 56759
```
Check the status of the job using:

```
k1234567@erc-hpc-login2:~$ squeue --me
    JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
  56759_1       cpu test-arr k1234567  R   0:12      1 erc-hpc-comp001
  56759_2       cpu test-arr k1234567  R   0:12      1 erc-hpc-comp001
  56759_3       cpu test-arr k1234567  R   0:12      1 erc-hpc-comp001
```
You can see separate entries corresponding to each of the tasks in your array job; in the JOBID column they are identified in the jobid_taskid format.
Analyse the output, which should be located in the `slurm-jobid_X.out` files (replace `jobid` with the id of your job). `X` will be a number corresponding to your tasks range, so you should have 3 separate log files, one for each array task:

```
k1234567@erc-hpc-login2:~$ cat slurm-56759_1.out
My task id is 1
k1234567@erc-hpc-login2:~$ cat slurm-56759_2.out
My task id is 2
k1234567@erc-hpc-login2:~$ cat slurm-56759_3.out
My task id is 3
```
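For larger arrays it is often wise to limit how many tasks run simultaneously; SLURM supports a throttle suffix on the `--array` range. A minimal sketch using standard sbatch syntax:

```
# Run tasks 1-100, but never more than 10 at the same time
#SBATCH --array=1-100%10
```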
Organising input for SLURM array jobs¶
Goal: Define an array job where each task takes its input from a different file
- Write an array job script where each task prints the contents of one of the files located in the `sample-files/arrayjob` folder in the supporting files directory
- Each task should only print the contents of one file, and no two tasks should print the same file
- Submit the job, then confirm that the contents of the array task output files match the input file contents as intended
Hint
- Try to find the association between the task id and the filename
- Try to find the association between the task id and the filename
- Use `cat` to print out the contents of the file
Sample answer
Create a sample script `test-array-files.sh` and add the following contents to it:

```
#!/bin/bash -l

#SBATCH --job-name=test-array
#SBATCH --partition=cpu
#SBATCH --reservation=cpu_introduction
#SBATCH --ntasks=1
#SBATCH --array=1-3

cat /datasets/hpc_training/sample-files/arrayjob/file${SLURM_ARRAY_TASK_ID}.txt
```
Submit the job using:

```
k1234567@erc-hpc-login2:~$ sbatch test-array-files.sh
Submitted batch job 56769
```
Analyse the output, which should be located in the `slurm-jobid_X.out` files (replace `jobid` with the id of your job). `X` will be a number corresponding to your tasks range, so you should have 3 separate log files, one for each array task:

```
k1234567@erc-hpc-login2:~$ cat slurm-56769_1.out
Hello, I'm file 1
k1234567@erc-hpc-login2:~$ cat slurm-56769_2.out
Hello, I'm file 2
k1234567@erc-hpc-login2:~$ cat slurm-56769_3.out
Hello, I'm file 3
```
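This mapping works because the input files happen to be numbered to match the task ids. When real input files do not follow such a pattern, a common trick is to list them in a text file and have each task pick the line matching its id. A hedged sketch, where `filelist.txt` is a hypothetical file containing one input path per line:

```
# Pick the Nth line of filelist.txt, where N is this task's id
INPUT=$(sed -n "${SLURM_ARRAY_TASK_ID}p" filelist.txt)
cat "${INPUT}"
```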
Singularity¶
Using containers in a batch job¶
Goal: Submit a batch job that executes a command in a singularity container
- Submit the batch job using default scheduler options
- Use the `lolcow` container
- Execute the `cowsay` command within the container to print the `Hello there` string
Hint
- Use the `cowsay string` command to print the `string` text
- Ensure that you use the correct path to the container file
Sample answer
Create a sample script `test-singularity.sh` and add the following contents to it:

```
#!/bin/bash -l

#SBATCH --job-name=test-singularity
#SBATCH --partition=cpu
#SBATCH --reservation=cpu_introduction

singularity exec ~/lolcow_latest.sif cowsay "Hello there"
```
Submit the job using:

```
k1234567@erc-hpc-login2:~$ sbatch test-singularity.sh
Submitted batch job 56748
```
Analyse the output, which should be located in the `slurm-jobid.out` file (replace `jobid` with the id of your job):

```
k1234567@erc-hpc-login2:~$ cat slurm-56748.out
 _____________
< Hello there >
 -------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
```
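The script above assumes the `lolcow_latest.sif` image already exists in your home directory. If you need to fetch it yourself, it can be pulled from DockerHub on a compute node; `godlovedc/lolcow` is a widely used public copy, though your course materials may point to a different source:

```
# Pull the lolcow image from DockerHub, producing lolcow_latest.sif
singularity pull docker://godlovedc/lolcow
```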
Finding containers that are useful for your research¶
Goal: Use DockerHub to find a container that could be useful for your research and run it with Singularity
DockerHub is an online repository of container images. Many commonly used pieces of software have been containerized into images that are officially endorsed, providing a convenient way to use this software without needing to install it.
- On DockerHub, search for some software you use in your research and see if you can find a container for that software
- Can you find documentation about how to use the container?
- Download the image to the HPC cluster
- Run it with Singularity
Hint
- If you can't think of a software package to search for, try searching for `R` or `python`
- Start an interactive session before running the commands
- Use the `singularity pull docker://{container name}` command to download the container (replacing the `{}` and their contents with the correct container name)
- Use the `singularity run` command to run the container
Sample answer
Let's search for a container that provides the `tidyverse` R packages. There is a `tidyverse` container provided by the `rocker` organisation.

To pull the container, first start an interactive session, as singularity is not available on the login nodes:

```
srun --partition cpu --reservation cpu_introduction --pty /bin/bash -l
```
We can then pull the container with:

```
singularity pull tidyverse.sif docker://rocker/tidyverse
```
We can run R using the container with:

```
singularity exec tidyverse.sif R
```
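To use the container non-interactively, for example from a batch job, you can run a script instead of starting the interactive R prompt. A brief sketch, where `analysis.R` stands in for a hypothetical script of your own:

```
# Run a (hypothetical) R script against the containerised tidyverse install
singularity exec tidyverse.sif Rscript analysis.R
```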