For the fall 2021 semester, the educational cluster's batch scheduler will be the Slurm Workload Manager.
This guide explains how to migrate your scripts and jobs from PBS to Slurm, if need be. There are two main aspects to the migration: learning the new job submission commands and converting your job scripts. The concepts are the same in both schedulers, but the syntax of the commands, directives, and environment variables differs.
Equivalent Slurm commands exist for those commonly used in PBS, with the command names and options detailed in the following table.
Command Comparison
Command | PBS (Torque/Moab) | Slurm |
---|---|---|
Submit a Job | qsub [job-submit-script] | sbatch [job-submit-script] |
Delete a Job | qdel [job-id] | scancel [job-id] |
Queue List | qstat | squeue |
Queue Info | qstat -q [queue] | scontrol show partition [partition] |
Node List | pbsnodes -a [:queue] | scontrol show nodes |
Node details | pbsnodes [node] | scontrol show node [node] |
Job status (by job) | qstat [job-id] | squeue -j [job-id] |
Job status (by user) | qstat -u [user] | squeue -u [user] |
Job status (detailed) | qstat -f [job-id] | scontrol show job -d [job-id] |
Show expected start time | showstart [job-id] | squeue -j [job-id] --start |
For a comprehensive list of Slurm commands, please download this Command Reference PDF on SchedMD's Website.
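As a quick illustration of the table above, here is a typical PBS workflow next to its Slurm equivalent. The script names and the job ID 12345 are placeholders, so substitute your own:
# PBS (old)
qsub submit-script.pbs       # returns a job ID, e.g. 12345
qstat -u $USER               # list your jobs
qdel 12345                   # cancel the job
# Slurm (new)
sbatch submit-script.slurm   # prints "Submitted batch job 12345"
squeue -u $USER              # list your jobs
scancel 12345                # cancel the job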
Existing PBS batch scripts can be readily migrated for use with the Slurm resource manager, with some minor changes to the directives and referenced environment variables. The most commonly used Slurm equivalents for PBS directives and environment variables are outlined below.
Directive Comparison
Directive | PBS (Torque/Moab) | Slurm |
---|---|---|
Script directive | #PBS | #SBATCH |
Job name | -N [name] | --job-name=[name] |
Queue / Partition | -q [queue] | --partition=[queue] |
Wall time limit | -l walltime=[hh:mm:ss] | --time=[hh:mm:ss] |
Node count | -l nodes=[count] | --nodes=[count] |
CPU count per node | -l ppn=[count] | --ntasks-per-node=[count] |
Memory size | -l mem=[limit] (*per job) | --mem=[limit] (*per node) |
Memory per CPU | -l pmem=[limit] | --mem-per-cpu=[limit] |
Standard output file | -o [filename] | --output=[filename] |
Standard error file | -e [filename] | --error=[filename] |
Combine stdout/stderr | -j oe (to stdout) | (default) |
Copy environment | -V | --export=ALL (default) |
Copy env variable | -v [var] | --export=[var] |
Job dependency | -W depend=[state:jobid] | --dependency=[state:jobid] |
Event notification | -m abe | --mail-type=[events] |
Email address | -M [address] | --mail-user=[address] |
For a full list of directives, please consult SchedMD's sbatch Webpage.
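To show how the directives map in practice, here is a minimal sketch of a PBS header and its Slurm equivalent. The job name, queue/partition name, and resource values below are placeholders, so adjust them to match your own jobs and our actual partitions:
# PBS (old)
#PBS -N my-job
#PBS -q batch
#PBS -l walltime=01:00:00
#PBS -l nodes=2:ppn=8
#PBS -l mem=16gb
# Slurm (new)
#SBATCH --job-name=my-job
#SBATCH --partition=batch
#SBATCH --time=01:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --mem=16G
Keep in mind the difference noted in the table: -l mem in PBS is a per-job limit, while --mem in Slurm is per node, so a multi-node job may need its memory value adjusted during conversion.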
Environment Variable Comparison
Description | PBS (Torque/Moab) | Slurm |
---|---|---|
Job Name | $PBS_JOBNAME | $SLURM_JOB_NAME |
Job ID | $PBS_JOBID | $SLURM_JOB_ID |
Submit Directory | $PBS_O_WORKDIR | $SLURM_SUBMIT_DIR |
Submit Host | $PBS_O_HOST | $SLURM_SUBMIT_HOST |
Node List | cat $PBS_NODEFILE | $SLURM_JOB_NODELIST |
Job Array Index | $PBS_ARRAYID | $SLURM_ARRAY_TASK_ID |
Queue Name | $PBS_QUEUE | $SLURM_JOB_PARTITION |
Number of Nodes | $PBS_NUM_NODES | $SLURM_NNODES |
Number of Procs | $PBS_NP | $SLURM_NTASKS |
Procs per Node | $PBS_NUM_PPN | $SLURM_CPUS_ON_NODE |
For a full list of environment variables, please consult the Environment Variables Section on SchedMD's sbatch Webpage.
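As a small sketch of how these show up inside a script, here are a few PBS lines and their Slurm equivalents (the echo lines are just for illustration):
# PBS (old)
cd $PBS_O_WORKDIR
echo "Job $PBS_JOBID running on:"
cat $PBS_NODEFILE
# Slurm (new)
cd $SLURM_SUBMIT_DIR
echo "Job $SLURM_JOB_ID running on:"
echo $SLURM_JOB_NODELIST
Note that Slurm starts batch jobs in the directory they were submitted from, so the cd line is usually unnecessary, and $SLURM_JOB_NODELIST is a compact host list string (e.g. node[01-02]) rather than a file, which is why it is echoed instead of cat'd.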
Tips on Converting Submit Scripts
We have created a utility called “p2s” (for PBS-to-Slurm), which is available on our Interactive/Submit hosts in your standard $PATH. Simply pass it the name of the script you would like to convert (give the full path if you are not in the same directory as the script), and it will convert it from a PBS to a Slurm submit script. For example, to convert a PBS submit script named "submit-script.pbs", issue the following command on the Interactive/Submit host:
p2s submit-script.pbs
This will output the changes directly to STDOUT, so that you can view them right in your SSH session. If everything looks good and you would like to save the converted script to a new file, simply redirect the output to a file name of your choice (we recommend using a NEW file name, not the existing file name):
p2s submit-script.pbs > submit-script.slurm
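If you saved the converted script as shown above and want to see exactly what p2s changed, you can compare the two files with standard tools, for example:
diff submit-script.pbs submit-script.slurm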
There are many other conversion scripts available online, and you are welcome to download and try them out in our environment, or you can convert your scripts manually using the directives and environment variables listed above. Once converted, you may still have to make some small tweaks or edits to get a script fully ready for submission to Slurm.
We also have a collection of example Slurm submit scripts that you can copy and use as a template: /apps/slurm/examples
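For example, you can list the examples and copy one into your home directory as a starting point (the file name below is a placeholder; pick whichever example best matches your workload):
ls /apps/slurm/examples
cp /apps/slurm/examples/<example-name>.slurm ~/my-job.slurm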
Environment Modules: Check your "module load" lines
While you're converting your scripts from PBS to Slurm, you should also check your "module load" commands. Many of the scientific applications have been upgraded since previous semesters, so the version a script loads may no longer be available. Please verify that your application(s) *and* version(s) exist on the cluster prior to job submission. TensorFlow is a great example: if you have a TensorFlow v1.14 submit script from last semester, it will not work this semester because we have retired that version and only offer v2.2.0 and v2.4.1:
# Spring 2021
$ module avail tensorflow
------------------------ /apps/usr/slurm/modules/apps ------------------------------
tensorflow/1.14-anaconda3-cuda10.0   tensorflow/2.2-anaconda3-cuda10.2
tensorflow/2.0-anaconda3-cuda10.0(default)

# Fall 2021
$ module avail tensorflow
-------------------------- /apps/usr/modules/apps ----------------------------------
tensorflow/2.2.0-cuda10.2   tensorflow/2.4.1-cuda11.2(default)
So if you submit a TensorFlow job whose script contains the command "module load tensorflow/1.14-anaconda3-cuda10.0", you will get the following error in your job's output file:
ERROR: Unable to locate a modulefile for 'tensorflow/1.14-anaconda3-cuda10.0'
To submit the job and have it run successfully, you must update the "module load" line to use one of the current TensorFlow versions, for example:
module load tensorflow/2.4.1-cuda11.2
The best way to verify that modules will load prior to job submission is to run each "module load ..." command from your submit script interactively at the command line on the submit host. If a "module load" line produces an error interactively, it will produce the same error in a compute job.
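For example, before submitting a job that loads the current TensorFlow module, you could run the following on the submit host (module avail and module list are standard Environment Modules commands):
module avail tensorflow                  # confirm the version you want exists
module load tensorflow/2.4.1-cuda11.2
module list                              # confirm it loaded without errors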
For more information on Environment Modules, please check out our FAQ.
For more information about the Slurm Workload Manager, please check out the Slurm Documentation on SchedMD's Website.