Using the Cluster

Contents

    1. Cluster Login
    2. Partitions (Queues) and Compute Nodes
    3. How to use Environment Module
    4. How to use SLURM
      1. Click here for Slurm Examples
    5. Useful References and Cheatsheets

    Cluster Login

    The login node for the CompBio cluster is cluster.csb.pitt.edu. To submit jobs to the cluster, connect to the login node over SSH. Microsoft Windows users can use a client such as PuTTY or the free version of MobaXterm; Linux and macOS users can use a terminal. On Windows 10 with the April 2018 update or later, the Command Prompt or PowerShell also works.

  1. The head node uses Slurm as its workload manager. To connect to the head node, type: ssh <USERNAME>@cluster.csb.pitt.edu, where <USERNAME> is your cluster user ID. For example, a user with the user ID ‘abc123’ would type: ssh abc123@cluster.csb.pitt.edu.

    Partitions (Queues) and Compute Nodes

    As mentioned above, the cluster uses Slurm as its workload manager. To get information about partitions and compute nodes:

    1. On the Slurm head node, use the “sinfo” or “snodes” command to list all partitions and the available compute nodes.
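
As a small illustration of the step above, the sketch below asks sinfo for a compact summary. The format string and the dept_24 partition name (used later in this guide) are choices made for this example, and the guard only exists so the snippet does nothing harmful on a machine without Slurm.

```shell
# Columns: partition, availability, time limit, node count, state, features
sinfo_format='%P %a %l %D %t %f'

if command -v sinfo >/dev/null 2>&1; then
    sinfo -o "$sinfo_format"
    # One line per node for a single partition, in long format
    sinfo -p dept_24 -N -l
else
    echo "sinfo not found; run this on the cluster head node"
fi
```

The “snodes” command is specific to this cluster, so it is not shown here.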

    How to use Environment Module

    The module package lets you change your shell environment on demand: loading a module sets the environment variables (such as PATH) that a software package needs, and unloading the module removes them again. The following examples show how to use module:

      1. module avail
        shows the available modules
      2. module load anaconda/3
        loads anaconda version 3
      3. module unload anaconda/3
        unloads anaconda
      4. module list
        lists the loaded modules
      5. module purge
        unloads all the loaded modules
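
The commands above are often combined at the top of a job script so that every run starts from the same environment. The following is a minimal sketch of that pattern, assuming the anaconda/3 module from the list; the variable name modules_to_load is made up for this example.

```shell
# Modules this job needs; anaconda/3 is the module named in the list above
modules_to_load="anaconda/3"

if command -v module >/dev/null 2>&1; then
    # Start from a clean environment, then load only what is listed
    module purge
    for m in $modules_to_load; do
        module load "$m"
    done
    # Record what ended up loaded
    module list
else
    echo "module command not available in this shell"
fi
```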

How to use SLURM

To use Slurm, first log in to the head node as explained in the “Cluster Login” section above.

  • To use the Slurm workload manager, you write a submit script in Slurm syntax and manage your jobs with Slurm commands. Here are a few examples of Slurm commands:
    1. sbatch submit.sh
      submits submit.sh to the queue
    2. squeue -u user
      shows the status of the user’s jobs
    3. sjobs
      shows job status with more detail
    4. scancel job_id
      cancels a job
    5. scontrol show job job_id
      shows detailed info about a job
    6. scontrol hold job_id
      holds a job
    7. scontrol release job_id
      releases a held job
    8. salloc -p dept_24 --mem=24000MB --ntasks-per-node=10 srun --pty /bin/bash -i
      requests an interactive job on the dept_24 partition with 24 GB of memory and 10 cores
    9. salloc -p dept_gpu --gres=gpu:1 --ntasks-per-node=4 srun --pty /bin/bash -i
      requests an interactive job on the dept_gpu partition with one GPU card and 4 cores
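
For reference, a submit script such as the submit.sh used in the first command might look roughly like the sketch below. The job name, resource values, and output file name are placeholders chosen for this example, not cluster defaults. The #SBATCH lines are read by sbatch and are ordinary comments to the shell, so the script also runs outside Slurm.

```shell
#!/bin/bash
# Name shown by squeue
#SBATCH --job-name=demo
# Partition used in the interactive examples above
#SBATCH --partition=dept_24
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=1G
# Wall-clock limit (hh:mm:ss)
#SBATCH --time=00:05:00
# %j expands to the job ID
#SBATCH --output=demo_%j.out

# SLURM_JOB_ID is set by Slurm inside a job; fall back to a placeholder elsewhere
job_id="${SLURM_JOB_ID:-none}"
node="$(uname -n)"
echo "job ${job_id} running on ${node}"
```

Saved as submit.sh, it would be submitted with “sbatch submit.sh” and followed with “squeue -u <USERNAME>”.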
  • Slurm Feature

    Slurm has an option called “Feature”, which assigns one or more flags to a compute node. You can request a feature in your submit script with the “--constraint” option. For example, if one or more nodes have a feature called “24C”, adding “--constraint=24C” to your script makes the job run on one of those nodes. Features can be combined with boolean operators: to run on a node that has either the 8C or the 24C feature, use “--constraint=8C|24C”; to run on a node that has both features, use “--constraint=8C&24C”. To see the available features, use the “snodes” command (10th column).
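
When a constraint is passed on the command line instead of inside the script, the expression should be quoted, because | and & are special characters to the shell. A minimal sketch, assuming the 8C and 24C feature names from the text and a submit.sh in the current directory:

```shell
# OR expression; use '8C&24C' instead to require both features
constraint='8C|24C'

if command -v sbatch >/dev/null 2>&1 && [ -f submit.sh ]; then
    # Quoting keeps the shell from treating | as a pipe
    sbatch --constraint="$constraint" submit.sh
else
    echo "would run: sbatch --constraint='$constraint' submit.sh"
fi
```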

    SSH to Node

    When you submit a job and Slurm assigns a node to run it, you can ssh to that node and monitor your job. For instance, “ssh <USERNAME>@n001” connects to node n001.
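
To find out which node to connect to, one option is to ask squeue for the node list of your running jobs. The sketch below assumes standard squeue format fields; the variable name me is made up for this example, and n001 is the node name from the text.

```shell
# Current user name, falling back to id if USER is unset
me="${USER:-$(id -un)}"

if command -v squeue >/dev/null 2>&1; then
    # Job ID, job name, and node list for this user's running jobs
    squeue -u "$me" -t RUNNING -o "%i %j %N"
else
    echo "squeue not found; run this on the cluster head node"
fi

# Then connect to a node from the last column, for example:
#   ssh "$me"@n001
#   top -u "$me"      # watch this user's processes on that node
```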

    On this page, we provide two Slurm scripts. The first shows how to run a stress test on a dept_24 node using 24 cores for 120 seconds; the second shows how to run an array job of size four on dept_24 nodes using two cores for 120 seconds. Every line of code is accompanied by a comment.
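
As a rough illustration of the array-job case described above, here is a sketch written under assumptions; it is not the cluster's own script. The job name, time limit, and input file naming are hypothetical, and the “stress” tool is assumed to be installed on the compute nodes.

```shell
#!/bin/bash
#SBATCH --job-name=array_demo
#SBATCH --partition=dept_24
# Four array tasks, indexed 0 to 3
#SBATCH --array=0-3
# One task with two cores per array element
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=00:05:00
# %A is the array job ID, %a the task index
#SBATCH --output=array_%A_%a.out

# Slurm sets SLURM_ARRAY_TASK_ID for each task; default to 0 outside Slurm
task_id="${SLURM_ARRAY_TASK_ID:-0}"
# Hypothetical per-task input file name
input="sample_${task_id}.dat"
echo "task ${task_id} would process ${input}"

# Stand-in workload: runs only inside a Slurm job and only if stress is installed
if [ -n "${SLURM_JOB_ID:-}" ] && command -v stress >/dev/null 2>&1; then
    stress --cpu 2 --timeout 120
fi
```

Each of the four tasks gets its own index, so the same script can select a different input per task without any changes.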

    Useful References & Cheatsheets