
How to Run Sbatch in Slurm

Posted on March 21, 2023 • Tags: slurm sbatch compute remote server

Note: This post is a work in progress

Quickstart

The script below requests 1 node with 1 task, 8 CPU cores, 2 GB of memory per core, and 1 GPU for 1 minute of run time, and runs as a job array with 5 tasks.

#!/bin/bash
#SBATCH --job-name=myjob         # name of your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=8        # CPU cores per task (here, 8 cores on one node)
#SBATCH --mem-per-cpu=2G         # memory per CPU core
#SBATCH --gres=gpu:1             # request 1 GPU
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu
#SBATCH --output=slurm_%A_%a.out    # print output to slurm_<job_id>_<array_id>.out
#SBATCH --array=1-5              # run as a job array with task IDs 1 through 5

# Load Python
module load python

# Echo environment vars set by slurm
echo $SLURM_ARRAY_JOB_ID  # job ID shared by all tasks in the array (integer)
echo $SLURM_ARRAY_TASK_ID # this task's index within the array (integer) - only set if --array is specified

# Print
echo "hi"

Launch the job by running:

sbatch my_file.sbatch
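
Because the script runs as a job array, each task can use $SLURM_ARRAY_TASK_ID to select its own input. Here's a minimal sketch of that pattern inside the sbatch script, assuming a hypothetical params.txt (one parameter per line) and a hypothetical process.py:

# Pick the line of params.txt that matches this task's array index,
# then pass it to the (hypothetical) script
PARAM=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)
python process.py --param "$PARAM"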

Gotchas

In “99.9% of cases”, the correct way to request N CPU cores is:

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=N

If you don't use srun inside your sbatch script, then always set --ntasks=1.

You can also likely ignore all references to MPI (the Message Passing Interface), which is only needed for coordinating jobs that span multiple nodes. Usually, you'll parallelize your scripts on a single node.
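
For example, here's a minimal sketch of single-node parallelism using plain bash background jobs. The inputs/*.csv files and process.py are hypothetical placeholders, the 4 cores are just illustrative, and it assumes the number of input files is no larger than --cpus-per-task:

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# Launch one background process per input file, then wait for all of them
for f in inputs/*.csv; do
    python process.py "$f" &
done
wait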

Sbatch Guide

What does each #SBATCH directive mean?

| #SBATCH | Alias | Format | Example | Definition |
| --- | --- | --- | --- | --- |
| --job-name | -J | String | my_job | Name of the job. Will appear in the output of commands like squeue |
| --nodes | -N | Integer | 1 | Node count. Defaults to allocating enough nodes to fulfill the other resource demands |
| --ntasks | -n | Integer | 1 | Total number of tasks across all nodes. Default is one CPU per task. Only needed for MPI workloads, and to take advantage of it you must use the srun command to launch each individual task |
| --cpus-per-task | -c | Integer | 1 | CPU cores per task. Must be <= the max number of cores available on a single compute node |
| --gpus-per-node | | Integer | 1 | Number of GPUs per node |
| --gres | | String | gpu:1 | Generic resources per node, e.g. gpu:1 requests 1 GPU on each node |
| --mem | | Integer | 2G | Memory per node. Defaults to MB if no unit suffix is given |
| --mem-per-cpu | | Integer | 2G | Memory per CPU core |
| --mem-per-gpu | | Integer | 2G | Memory per GPU |
| --partition | -p | String | nigam-a100 | Name of the partition to run the job on |
| --time | -t | DD-HH:MM:SS | 01-00:00:00 (1 day) | Total run time limit |
| --mail-type | | String | begin | When to send job status emails. One of: begin / end / fail / requeue / all |
| --mail-user | | Email | email@email.com | Email address to send job alerts to |
| --input | -i | String | stdin_%A.in | Where stdin is read from |
| --output | -o | String | stdout_%A.out | Where stdout is written |
| --error | -e | String | stderr_%A.out | Where stderr is written |
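
Putting several of these together, here's a sketch of a header for a single-GPU job. The partition name nigam-a100 is just the example value from the table, and train.py is a hypothetical script; substitute your own cluster's partition and entry point:

#!/bin/bash
#SBATCH --job-name=train_model      # shows up in squeue
#SBATCH --partition=nigam-a100      # example partition name from the table above
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G                   # memory per node
#SBATCH --gres=gpu:1                # request 1 GPU
#SBATCH --time=01-00:00:00          # 1 day
#SBATCH --output=stdout_%A.out
#SBATCH --error=stderr_%A.out

python train.py                     # hypothetical training script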

Useful Slurm Commands

List status of current jobs

Command

squeue -u mwornow    # only show info for user `mwornow`
squeue -j <JOB_ID>   # only show info for the job with ID <JOB_ID>
squeue -l            # show additional job info

Output

XXXX
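
You can also choose which columns squeue prints with its -o/--format option, for example:

squeue -u $USER -o "%.18i %.9P %.20j %.8T %.10M %.6D %R"   # job ID, partition, name, state, elapsed time, node count, reason/nodelist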

View currently running job status

Command

scontrol show job <JOB_ID>

Output

XXXX
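
scontrol can also modify a job that hasn't started yet. A couple of common examples (note that raising a time limit may require admin privileges on some clusters):

scontrol hold <JOB_ID>                              # keep a pending job from starting
scontrol release <JOB_ID>                           # let it start again
scontrol update JobId=<JOB_ID> TimeLimit=02:00:00   # change the time limit of a pending job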

Cancel job

Command

scancel <JOB_ID>

Output

XXXX
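
scancel can also target jobs by user or by name instead of by job ID:

scancel -u mwornow        # cancel all jobs belonging to user `mwornow`
scancel --name=myjob      # cancel all jobs named `myjob`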

Launch interactive job

salloc                  # request an allocation for an interactive session
srun --pty /bin/bash    # open a shell on the allocated compute node
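
salloc accepts the same resource flags as sbatch, so you can request specific resources for the interactive session; the values below are just an illustration:

salloc --nodes=1 --ntasks=1 --cpus-per-task=4 --mem=8G --time=01:00:00
srun --pty /bin/bash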

References

  • https://researchcomputing.princeton.edu/support/knowledge-base/slurm
  • https://sites.google.com/nyu.edu/nyu-hpc/training-support/tutorials/slurm-tutorial
  • https://hpc.llnl.gov/banks-jobs/running-jobs/slurm
  • https://support.ceci-hpc.be/doc/_contents/QuickStart/SubmittingJobs/SlurmTutorial.html
  • https://login.scg.stanford.edu/faqs/cores/ – great, easy-to-read website
  • https://slurm.schedmd.com/tutorials.html – extremely detailed API