Centaurus / GPU User Notes
The Centaurus Slurm partition is an HPC resource dedicated to supporting student work on class assignments. Centaurus is available to students in a designated set of courses.
ACCESS TO THE EDUCATIONAL CLUSTER
Before logging into the educational cluster, please ensure that you have set up Duo (Setup Duo). In addition, if you are on campus, please connect to the eduroam wireless network. Otherwise, please connect to the campus VPN (Setup VPN).
Centaurus and GPU can be accessed via SSH to "hpc-student.uncc.edu". Your credentials are your NinerNET username and NinerNET password. Please do not include "@uncc.edu" in your NinerNET username. Once logged in, you will be on the Educational Cluster interactive/submit host, which should be used for tasks such as transferring data using SCP or SFTP and for code development.
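For example, from your own machine you would log in and copy files roughly like this (the username shown is a placeholder for your own NinerNET username, and the file name is illustrative):
$ ssh yournineruser@hpc-student.uncc.edu                   # log in to the interactive/submit host
$ scp myprogram.c yournineruser@hpc-student.uncc.edu:~/    # copy a file into your home directory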
From this node, a user can submit jobs requesting the following resources:
- General Compute Nodes (10 nodes with 36 cores/node = 360 procs total)
- GPU Compute Nodes:
  - 2 nodes, each with 16 cores and 4 Titan RTX GPUs
  - 1 node with 16 cores and 8 Titan V GPUs
  - 1 node with 8 cores and 8 GTX-1080ti GPUs
Jobs should always be submitted to the "Centaurus" partition for CPU jobs and to the "GPU" partition for GPU jobs, unless directed otherwise by your instructor.
NFS STORAGE
Each student is given a default storage quota of 150 GB for their home directory located at /users/. This volume is backed up nightly. Users can check their current quota usage with the command "urcquota".
Each class also has a shared folder located at /projects/class/ which instructors may use to share information or data with class members.
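For example, to inspect and copy class materials into your own home directory (the class folder and file names below are placeholders, not real paths on the cluster):
$ ls /projects/class/                                   # list the class folders visible to you
$ cp /projects/class/my_class/example_data.csv $HOME/   # hypothetical folder and file names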
ACCESSING SOFTWARE
Centaurus uses environment modules to set up the user environment to use specific software packages. Additional details on modules can be found here.
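Typical module commands look like the following; the exact module names and versions available on Centaurus may differ, so check "module avail" first:
$ module avail                 # list the software modules installed on the cluster
$ module load openmpi/5.0.2    # load a specific module (version taken from the MPI example below)
$ module list                  # show the modules currently loaded in your session
$ module purge                 # unload all modules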
SUBMITTING COMPUTE JOBS
Centaurus uses the Slurm scheduler to manage access to the computational resources. To submit a job to the scheduler, users must prepare a “submit script”. At its simplest, a submit script (my_script.sh) would look like this:
#!/bin/bash
$HOME/myprogram
And would be submitted to the cluster as follows:
sbatch --job-name=myjob --partition=Centaurus --time=00:01:00 my_script.sh
Or, instead of specifying the Slurm directives on the command line, you can put them in the script like this:
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --partition=Centaurus
#SBATCH --time=00:01:00
$HOME/myprogram
This simplifies your sbatch command to:
sbatch my_script.sh
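After submitting, you can monitor the job with standard Slurm commands; by default Slurm writes the job's output to a file named slurm-<jobid>.out in the directory you submitted from (the job ID below is a placeholder):
$ squeue -u $USER          # list your pending and running jobs
$ scancel <jobid>          # cancel a job if necessary
$ cat slurm-<jobid>.out    # view the job's output once it has run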
Submit scripts may also load any needed environment modules and set additional parameters specifying details of the desired execution environment (e.g., number of required processes, memory size, GPU access, etc.).
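As a sketch, a submit script that loads a module and requests specific resources might look like the following; the module name and resource amounts are illustrative, not requirements of the cluster:
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --partition=Centaurus
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4      # four cores for a multithreaded program
#SBATCH --mem=8G               # 8 GB of memory
#SBATCH --time=00:10:00
module load openmpi/5.0.2      # illustrative; load whatever modules your program needs
$HOME/myprogram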
PARALLEL PROCESSING WITH OPENMPI
Slurm supports parallel processing via message passing (MPI). To access OpenMPI, load the desired module, e.g.:
$ module load openmpi
$ mpicc -o myprogram myprogram.c
And include a request for multiple processes in the submit script:
#!/bin/bash
#SBATCH --job-name="MyMPIJob"
#SBATCH --partition=Centaurus
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:01:00
module load openmpi/5.0.2
mpirun $HOME/myprogram
and submit with sbatch:
$ sbatch my_script.sh
The Slurm options may also be set on the sbatch command line, as follows:
$ sbatch --job-name=MyMPIJob --partition=Centaurus --nodes=4 --ntasks-per-node=4 my_script.sh
In this example, the resource request is for 4 cores (or processes) on each of 4 compute nodes for a total of 16 processes.
SUBMITTING GPU JOBS
The educational cluster has a GPU partition that can be used for GPU computing jobs. Here is a simple example of a submit script that will queue up and run a GPU compute job:
#!/bin/bash
#SBATCH --job-name=MyGPUJob
#SBATCH --partition=GPU
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --time=00:01:00
nvidia-smi
The above job requests one core and one GPU on a single GPU compute node.
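In practice, a GPU job would usually load the toolkit your code was built against and run your own executable rather than nvidia-smi; the module and program names below are illustrative, so check "module avail" for what is actually installed:
#!/bin/bash
#SBATCH --job-name=MyGPUJob
#SBATCH --partition=GPU
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00
module load cuda               # illustrative module name; use the version your code requires
$HOME/my_gpu_program           # your compiled GPU executable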
Request a particular type of GPU
You can specify the GPU type by modifying the “gres” directive, like so:
#SBATCH --gres=gpu:TitanV:4 # (will reserve 4 Titan V GPUs)
#SBATCH --gres=gpu:TitanRTX:2 # (will reserve 2 Titan RTX GPUs)
#SBATCH --gres=gpu:V100:1 # (will reserve 1 Tesla V100s GPU)
Request a single- or double-precision GPU
You can request a single-precision (FP32) or a double-precision (FP64) GPU by specifying a constraint for your job. For example:
#SBATCH --gres=gpu:1
#SBATCH --constraint=FP32 # (will reserve 1 single-precision GPU)
-or-
#SBATCH --gres=gpu:1
#SBATCH --constraint=FP64 # (will reserve 1 double-precision GPU)
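For example, a complete submit script requesting one double-precision GPU of any model (per the sinfo listing below, the Titan V nodes) could combine the two directives like this; the job name is just a placeholder:
#!/bin/bash
#SBATCH --job-name=MyFP64Job
#SBATCH --partition=GPU
#SBATCH --gres=gpu:1
#SBATCH --constraint=FP64      # any double-precision GPU
#SBATCH --time=00:01:00
nvidia-smi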
Request a GPU with a specific compute capability
If your code is optimized for a particular compute capability, you can request it by specifying a constraint for your job. For example:
#SBATCH --gres=gpu:1
#SBATCH --constraint=compute_75 # (will reserve 1 TitanRTX GPU)
-or-
#SBATCH --gres=gpu:1
#SBATCH --constraint=compute_61 # (will reserve 1 GTX1080ti GPU)
To find out the type and count of the GPUs on each node (the GRES column), as well as their FP32/FP64 precision and compute capability (the AVAIL_FEATURES column), you can use the following "sinfo" command:
$ sinfo -p GPU -o "%12N %6c %8m %28f %20G"
NODELIST CPUS MEMORY AVAIL_FEATURES GRES
gal-gpu[1-2] 16 191560 gpu,FP32,compute_75,stdmem gpu:TitanRTX:4
gal-gpu3 16 191560 gpu,FP64,compute_70,stdmem gpu:TitanV:8
gal-gpu[4-6] 8 191563 gpu,FP32,compute_61,stdmem gpu:GTX1080ti:8
New GPU nodes added to the partition may contain new GPU models, so the list above may change as we add new (and retire old) GPU compute nodes in the cluster.