- Cluster Login
- Partitions (Queues) and Compute Nodes
- How to use Environment Module
- How to use SLURM
- Slurm Examples
- Useful References and Cheatsheets
The login node for the CompBio cluster is cluster.csb.pitt.edu. To submit jobs to the cluster, you must first connect to the login node via SSH. Windows users can use a program such as PuTTY or the free version of MobaXterm; Linux and macOS users can use a terminal. (Note that Windows 10 Command Prompt and PowerShell also include an SSH client as of the April 2018 update.)
- The head node uses Slurm as its workload manager. To connect to the head node, type:

ssh <USERNAME>@cluster.csb.pitt.edu

where <USERNAME> is your cluster userid. For example, a user with the userid ‘abc123’ would use:

ssh abc123@cluster.csb.pitt.edu
Partitions (Queues) and Compute Nodes
As mentioned above, the cluster uses Slurm as its workload manager. Here is how to get information about partitions and compute nodes:
- On the Slurm head node, you can use the “sinfo” or “snodes” commands to see all the partitions and the available compute nodes.
How to use Environment Module
The module package provides a dynamic environment for the user: it adds and removes the relevant environment (variable) settings on demand. The following examples show how to use module:
| Command | Description |
| --- | --- |
| module avail | shows the available modules |
| module load anaconda/3 | loads anaconda version 3 |
| module unload anaconda/3 | unloads anaconda version 3 |
| module list | lists the loaded modules |
| module purge | unloads all the loaded modules |
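A typical session combines these commands to set up a clean software environment. The sketch below assumes the anaconda/3 module from the table above; it only runs on the cluster itself, where the module command is available:

```shell
module avail            # see which modules can be loaded
module load anaconda/3  # make anaconda version 3 active in this shell
module list             # confirm anaconda/3 now appears among loaded modules
module purge            # unload everything and return to a clean environment
```

Module settings only affect the current shell session, so batch jobs must load their modules again inside the submit script.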
How to use SLURM
In order to use Slurm, you need to login to the head node first as explained above in the “cluster login” section.
- To use the Slurm workload manager, you need Slurm commands together with a submit shell script written in Slurm syntax. These are a few examples of Slurm commands:
| Command | Description |
| --- | --- |
| sbatch submit.sh | submits submit.sh to the queue |
| squeue -u user | shows the user’s job status |
| squeue -l | shows job status with more info |
| scancel job_id | deletes a job |
| scontrol show job job_id | shows detailed info about a job |
| scontrol hold job_id | holds a job |
| scontrol release job_id | releases a held job |
| salloc -p dept_24 --mem=24000MB --ntasks-per-node=10 srun --pty /bin/bash -i | requests an interactive job on the dept_24 partition with a memory requirement of 24 GB and 10 cores |
| salloc -p dept_gpu --gres=gpu:1 --ntasks-per-node=4 srun --pty /bin/bash -i | requests an interactive job on the dept_gpu partition with one GPU card and 4 cores |
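The sbatch command above needs a submit shell script. A minimal sketch of such a submit.sh is shown below; the job name, module, and program are assumptions to adapt to your own work, while the partition and resource values mirror the interactive example above:

```shell
#!/bin/bash
#SBATCH --job-name=myjob        # job name shown in squeue (assumed)
#SBATCH --partition=dept_24     # partition (queue) to submit to
#SBATCH --ntasks-per-node=10    # number of cores requested
#SBATCH --mem=24000MB           # memory requirement (24 GB)
#SBATCH --time=00:10:00         # wall-time limit (hh:mm:ss)

module load anaconda/3          # load software inside the job (assumed module)
python my_script.py             # hypothetical program to run
```

Submit it with “sbatch submit.sh” and monitor it with “squeue -u user”.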
Slurm has an option called “Feature”, which is used to assign one or more flags to a compute node. You can request a feature in your submit shell with the “--constraint” option. For example, if one or a series of nodes has a feature called “24C”, adding “--constraint=24C” to your script makes the job run on one of those nodes. Note that you can use boolean expressions with features: to run your job on a node having either the 8C or the 24C feature, use “--constraint=8C|24C”; to run it on a node having both features, use “--constraint=8C&24C”. To find the available feature(s), use the “snodes” command (10th column).
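The constraint is just another #SBATCH directive in the submit script. A hedged sketch, reusing the 8C/24C feature names from the example above (everything else is an assumption):

```shell
#!/bin/bash
#SBATCH --job-name=feature_demo       # job name (assumed)
#SBATCH --partition=dept_24           # partition to submit to
#SBATCH --constraint="8C|24C"         # run on a node with either the 8C or 24C feature
#SBATCH --ntasks-per-node=8           # number of cores requested

srun hostname                         # report which node the job landed on
```

Check the 10th column of snodes first to see which feature names actually exist on your partition.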
SSH to Node
When you submit a job and Slurm assigns a node to run it, you can ssh to that node to monitor your job. For instance, “ssh <USERNAME>@n001” connects to node n001.
On this page, we provide two Slurm scripts: the first shows how to run a stress test on a dept_24 node using 24 cores for 120 seconds, and the second demonstrates how to run an array job of dimension four on dept_24 nodes using two cores per task for 120 seconds. Each line of code has a line of comment.
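As a hedged sketch of the first kind of script described above (assuming the stress utility is installed on the compute nodes):

```shell
#!/bin/bash
#SBATCH --job-name=stress_test   # job name shown in squeue
#SBATCH --partition=dept_24      # run on a dept_24 node
#SBATCH --ntasks-per-node=24     # use all 24 cores of the node
#SBATCH --time=00:05:00          # wall-time limit, comfortably above 120 s

stress --cpu 24 --timeout 120    # spin up 24 CPU workers for 120 seconds
```

An array version along the lines of the second script would add “#SBATCH --array=1-4” and “#SBATCH --ntasks-per-node=2”, and could use the $SLURM_ARRAY_TASK_ID variable to distinguish the four tasks.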
Useful References and Cheatsheets