Using the Cluster


Contents

  1. Cluster Login
  2. Common Commands & Cheat Sheets
    1. TORQUE Commands
    2. Custom Scripts
  3. Submitting Jobs
    1. Hello World
    2. NAMD
    3. AMBER
    4. Python & Tensorflow
  4. Queue Information

Login to the Cluster

On any version of Windows, we recommend the free version of MobaXterm.

On macOS or Linux, open a terminal and type ssh <USERNAME>@dvorak.csb.pitt.edu for the CPU cluster or ssh <USERNAME>@gpu.csb.pitt.edu for the GPU cluster. (Since the April 2018 update, you can also do this from the Windows 10 Command Prompt or PowerShell.)

Common Commands & Cheat Sheets

TORQUE Commands

  • Submitting a job: qsub filename
  • Deleting a job: qdel <JOBID>
  • Start an interactive shell on a compute node (1 GPU for 1 hour): qsub -I -q dept_gpu -l nodes=1:ppn=1:gpus=1,walltime=1:00:00
  • Show all of your jobs: qstat -u$USER (or, on dvorak only, showq -u$USER)
  • Show detailed information for all of your jobs: qstat -fu$USER
  • List all queues: qstat -q
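qsub prints the new job's full ID (for example 3488.n000.dcb.private.net) when you submit, and that ID is what the other commands work with. A minimal sketch, assuming you want to keep the ID in a shell variable; short_jobid is a hypothetical helper, not a TORQUE command, and the cluster commands are commented out so the snippet runs anywhere.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a TORQUE command): strip the server suffix from a
# full job ID, e.g. 3488.n000.dcb.private.net -> 3488
short_jobid() {
  printf '%s\n' "${1%%.*}"
}

# Typical use on the login node (commented out so this sketch runs anywhere):
# jobid=$(qsub hello.sh)            # qsub prints the new job's ID on stdout
# qstat "$jobid"                    # check on the job
# qdel "$(short_jobid "$jobid")"    # or cancel it using the short form
```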

Custom Scripts

  • Show available GPUs: gpus.py
  • Show available CPU cores: cpus.py

Some Basic Commands

  • Show a list of files in the working directory: ls
  • Change directory: cd path/to/directory
  • See the contents of a file: view filename, less filename, or cat filename
  • Edit a file: vim filename or nano filename
  • See available software modules: module avail
  • Load software module: module load name_of_module
  • Remove (unload) software module: module unload name_of_module
  • Switch software modules: module swap software1 software2
  • Check out the references below for more!

Useful References & Cheatsheets

Submitting Jobs

Note: sample scripts that you can read and run are available on the cluster in /opt/sample_scripts.

Hello World

#!/usr/bin/env bash
# Set job name to "hello"
#PBS -N hello

# Submit to dept_24_core queue (use "qstat -q" for a list)
#PBS -q dept_24_core

# Request 1 node and 1 CPU core per node for up to 1 minute
#PBS -l nodes=1:ppn=1,walltime=00:01:00

# Join standard output (stdout) and standard error (stderr) into one output file
#PBS -j oe

# email yourname@pitt.edu if the job begins, aborts, or ends
#PBS -m abe
#PBS -M yourname@pitt.edu

echo Starting $PBS_JOBNAME on $HOSTNAME

# export is used here so that we can use the same directory in other scripts
# NOTE: access /scr/ from anywhere using /net/<node name>/path/to/your/output
# e.g. "cd /net/n123/scr/12345.n000.dcb.private.net"

export SCRATCH="/scr/$PBS_JOBID"
# export SCRATCH="/scr/$USER/$PBS_JOBNAME/$PBS_JOBID" #alternative output dir

copy_before_job_source="/path/to/files/"
# copy_before_job_source="$PBS_O_WORKDIR" #copy all files from current directory

# dump output into "<current directory>/hello/12345.n000.dcb.private.net" at end
copy_after_job_target="$PBS_O_WORKDIR/$PBS_JOBNAME/$PBS_JOBID"
# or just dump scratch into the working directory where the job was submitted
# copy_after_job_target="$PBS_O_WORKDIR"

# creates directory in the same folder where you ran qsub for copying files back
mkdir -p "$copy_after_job_target" && echo made directory "$copy_after_job_target"

# creates directory on scratch drive
mkdir -p "$SCRATCH" && echo made directory "$SCRATCH"

# copy files to the scratch drive
rsync -avz "$copy_before_job_source"/* "$SCRATCH"

# copy files back on exit or interrupt
trap 'echo "copying files"; rsync -avz "$SCRATCH/" "$copy_after_job_target/"' EXIT

cd "$SCRATCH"

# load the "null" module, which does nothing (a placeholder for real software)
module load null

cat hello.txt > output.log 2> error.log

Save this in a file named hello.sh. Use the command qsub hello.sh to submit the job to the queue.

Some important things to notice:

  • bash treats everything after an unquoted # on a line as a comment and ignores it.
  • Lines that have #PBS at the very start of the line and appear before the first bash command are read by qsub and interpreted as command-line arguments. Use man qsub or read the online documentation for more information about additional arguments.
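To check which lines of a script qsub will treat as arguments, here is a simplified sketch that mimics the rule above. pbs_directives is a hypothetical helper, not part of TORQUE: it prints the arguments of every line that starts with #PBS and stops at the first line that is neither a comment nor blank. Real qsub parsing may differ in edge cases.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a job script on stdin and print the arguments of
# the #PBS directives that appear before the first command.
pbs_directives() {
  local line
  while IFS= read -r line; do
    case "$line" in
      '#PBS'*)                 # a directive: print its arguments
        line="${line#'#PBS'}"
        printf '%s\n' "${line# }"
        ;;
      '#'* | '')               # shebang, ordinary comment, or blank: keep reading
        ;;
      *)                       # first command: later #PBS lines are ignored
        break
        ;;
    esac
  done
}
```

For example, pbs_directives < hello.sh should list -N hello, -q dept_24_core, and the other directives from the Hello World script.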

Check the Status of the Running Job

You should have seen your job ID printed after you ran qsub. Run qstat <JOBID> (e.g. qstat 3488) on the server where you submitted the job (dvorak / n000 or gpu), and you will see something like this:

Job ID                      Name    User        Time Use  S  Queue
--------------------------  ------  ----------  --------  -  ------
3488.n000.dcb.private.net   hello   <USERNAME>  00:00:00  C  serial

If you do not know the job ID, use qstat -u$USER.

Here the hello job, with job ID 3488, completed in under one second. The S column shows the job state: C for completed, Q for waiting in the queue, or R for running.

If the job runs too long or you made a mistake, you can kill it with qdel 3488.
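A rough sketch for scripting these status checks, assuming qstat's default table layout (two header lines, state letter in the second-to-last column). job_state and describe_state are hypothetical helpers, and only the C, Q, and R states discussed here are translated into words.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read qstat's default table on stdin and print the
# state letter (S column) for each job, skipping the two header lines.
# Counting from the end of the line keeps this working even if an earlier
# column happens to contain spaces.
job_state() {
  awk 'NR > 2 { print $(NF - 1) }'
}

# Translate the state letters described above into words.
describe_state() {
  case "$1" in
    C) echo "completed" ;;
    Q) echo "waiting in the queue" ;;
    R) echo "running" ;;
    *) echo "other state: $1" ;;
  esac
}

# On the cluster: describe_state "$(qstat 3488 | job_state)"
```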

Handling Program Input / Output

When our program runs (in this case, cat), it writes the contents of the files it is given to stdout. The > operator redirects stdout to a file, and the 2> operator redirects stderr to a file. stderr carries diagnostic and error messages, so depending on how much of it your program produces, it is usually OK to omit the 2> part. TORQUE collects everything your script writes to stdout and stderr into an output file that is copied back, by default, to the directory where qsub was run; if that file becomes too large, the copy can fail. Redirecting output to a file on the scratch drive is therefore the most reliable way to get your results.
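A self-contained sketch of the redirections described above, using a hypothetical both_streams function in place of a real program. It also shows 2>&1, which sends stderr to the same place as stdout, and >>, which appends instead of overwriting.

```bash
#!/usr/bin/env bash
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Hypothetical stand-in for a program: one line to stdout, one to stderr
both_streams() {
  echo "regular output"
  echo "diagnostic message" >&2
}

# Separate files, as in hello.sh: stdout -> output.log, stderr -> error.log
both_streams > "$workdir/output.log" 2> "$workdir/error.log"

# One file: send stdout to combined.log, then point stderr (2) at stdout (1).
# Order matters: "2>&1 > file" would leave stderr going to the original stdout.
both_streams > "$workdir/combined.log" 2>&1

# >> appends to the file instead of overwriting it
both_streams >> "$workdir/combined.log" 2>&1
```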

In the hello.sh script, three variables control where files are copied:

  • $copy_before_job_source – Directory with files to be copied to the scratch working directory for the job. Set this to $PBS_O_WORKDIR if your files are in the directory where you ran qsub.
  • $copy_after_job_target – Permanent location for files after the job finishes.
  • $SCRATCH – Working directory on the compute node’s hard drive.
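The job scripts rely on a trap ... EXIT line to copy $SCRATCH back to $copy_after_job_target. A small, self-contained sketch of two details that matter there: the EXIT trap still runs when the script stops early with an error, and the quoting of the trap command decides when variables are expanded (double quotes: when the trap is set; single quotes: when it runs). Each hypothetical function uses a subshell so its trap fires when the function finishes.

```bash
#!/usr/bin/env bash
# Double quotes: $target is expanded when the trap is set.
trap_double_quoted() {
  (
    target="before"
    trap "echo double: $target" EXIT
    target="after"
  )
}

# Single quotes: $target is expanded when the trap runs (at exit).
trap_single_quoted() {
  (
    target="before"
    trap 'echo single: $target' EXIT
    target="after"
  )
}

# The EXIT trap also runs when the (sub)shell exits early with an error.
trap_on_failure() {
  (
    trap 'echo "copying files anyway"' EXIT
    exit 1
  )
}
```

In the job scripts both forms work, because $SCRATCH and $copy_after_job_target are already set when the trap line runs; single quotes simply keep working if you reorder those lines.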

NAMD

#!/usr/bin/env bash
#PBS -N my_namd_gpu_job
#PBS -q dept_gpu
# request 1 node, 1 core, and 1 GPU for 10 minutes
#PBS -l nodes=1:ppn=1:gpus=1,walltime=00:10:00
#PBS -j oe
#PBS -m abe
#PBS -M example@example.com

echo Starting $PBS_JOBNAME on $HOSTNAME
export SCRATCH="/scr/$PBS_JOBID"

# copy all files from current directory
copy_before_job_source="$PBS_O_WORKDIR"

# just dump scratch into the working directory where the job was submitted
copy_after_job_target="$PBS_O_WORKDIR"

# creates directory under where you ran qsub for copying files back
mkdir -p "$copy_after_job_target" && echo made directory "$copy_after_job_target"

# creates directory on scratch drive
mkdir -p "$SCRATCH" && echo made directory "$SCRATCH"
# copy files to SCRATCH
rsync -avz "$copy_before_job_source"/* "$SCRATCH"

# copy files back on exit or interrupt
trap 'echo "copying files"; rsync -avz "$SCRATCH/" "$copy_after_job_target/"' EXIT

cd "$SCRATCH"

module load namd/2.12/cuda-8.0
config_file=apoa1.namd
namd2 +idlepoll +p"$PBS_NUM_PPN" "$config_file" > "$SCRATCH/namd.log"

This job uses an additional PBS variable, $PBS_NUM_PPN, which holds the number of processors per node requested for the job.
Otherwise, most of this script is unchanged from the hello job.
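$PBS_NUM_PPN is only set inside a TORQUE job, so if you try the same NAMD command outside the queue, +p$PBS_NUM_PPN collapses to a bare +p. A minimal sketch using bash's ${var:-default} expansion to fall back to one core; namd_threads is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build NAMD's +p argument from TORQUE's PPN value,
# defaulting to 1 core when the variable is unset or empty.
namd_threads() {
  printf '+p%s\n' "${PBS_NUM_PPN:-1}"
}

# In the job script: namd2 +idlepoll "$(namd_threads)" "$config_file"
```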

Amber

#!/usr/bin/env bash
#PBS -N my_amber_gpu_job
#PBS -q dept_gpu
#PBS -l nodes=1:ppn=1:gpus=1,walltime=00:10:00
#PBS -j oe
#PBS -m abe
#PBS -M example@example.com

echo Starting $PBS_JOBNAME on $HOSTNAME

export SCRATCH="/scr/$PBS_JOBID"

# copy all files from current directory
copy_before_job_source="$PBS_O_WORKDIR"

# just dump scratch into the working directory where the job was submitted
copy_after_job_target="$PBS_O_WORKDIR"

# creates directory under where you ran qsub for copying files back
mkdir -p "$copy_after_job_target" && echo made directory "$copy_after_job_target"

# creates directory on scratch drive
mkdir -p "$SCRATCH" && echo made directory "$SCRATCH"

rsync -avz "$copy_before_job_source"/* "$SCRATCH"

# copy files back on exit or interrupt
trap 'echo "copying files"; rsync -avz "$SCRATCH/" "$copy_after_job_target/"' EXIT

cd "$SCRATCH"

module load amber/18

# -O is used to overwrite existing output files
pmemd.cuda -O

If you want to use GPUs, make sure you run pmemd.cuda instead of pmemd. Also notice that we do not supply the names of the input or output files to pmemd.cuda; this works as long as your files use the default names. If you want more descriptive file names, use pmemd --help to get a list of which flag sets each file.
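A sketch of the same run with every file named explicitly. The flags below (-i, -o, -p, -c, -r, -x, -inf) are the standard Amber file options, with the default name each one replaces noted in a comment; the file names themselves are hypothetical placeholders. Keeping the arguments in a bash array lets you print or check the command before it runs, and pmemd.cuda is only called if module load amber/18 has put it on your PATH.

```bash
#!/usr/bin/env bash
# Hypothetical file names: replace them with your own system's files.
amber_args=(
  -O                   # overwrite existing output files
  -i md.in             # control input      (default name: mdin)
  -o md.out            # text output        (default name: mdout)
  -p system.prmtop     # topology           (default name: prmtop)
  -c system.inpcrd     # starting coords    (default name: inpcrd)
  -r md.rst            # restart output     (default name: restrt)
  -x md.crd            # trajectory output  (default name: mdcrd)
  -inf md.info         # progress info      (default name: mdinfo)
)

if command -v pmemd.cuda > /dev/null 2>&1; then
  pmemd.cuda "${amber_args[@]}"
else
  # Not on PATH (e.g. the amber module is not loaded): show the command instead
  echo "pmemd.cuda not found; would run: pmemd.cuda ${amber_args[*]}" >&2
fi
```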

Python & Tensorflow

Getting Started

To use Python with TensorFlow, we provide a few basic anaconda environments. To activate one of our Python 3 environments, first run module load anaconda/3 to add anaconda to your path, then run conda info --envs to see which environments are available. In this example we use source activate tf-py3-gpu.

#!/usr/bin/env bash
#PBS -N hello
#PBS -q dept_gpu
#PBS -l nodes=1:ppn=1:gpus=1,walltime=00:05:00

module load anaconda/3
# The tensorflow gpu anaconda environment will automatically
# load the CUDA/9.0 module
source activate tf-py3-gpu
# Use the tensorflow library to say 'Hello'
python -c \
'import tensorflow as tf
print(tf.Session().run(tf.constant("Hello")))'

Note: python -c runs its argument as Python code instead of reading a Python file. The backslash (\) at the end of the first line tells bash to ignore the newline, so the command is interpreted as if it were one line. The second newline is fine because it is inside the quotes and is part of the Python code.
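For longer snippets, a quoted here-document avoids the backslash and quoting juggling of python -c. A minimal sketch, assuming a PYTHON variable that falls back to python3 so it runs off the cluster; inside the tf-py3-gpu environment you would call python and put your TensorFlow lines in the body. say_hello is a hypothetical function name.

```bash
#!/usr/bin/env bash
# Assumption: fall back to python3 when PYTHON is not set; on the cluster,
# after "source activate tf-py3-gpu", plain python is what you want.
PYTHON="${PYTHON:-python3}"

say_hello() {
  # "python -" reads the program from stdin. Quoting the 'EOF' delimiter
  # stops bash from expanding $, backticks, or backslashes in the body,
  # so these lines reach Python exactly as written.
  "$PYTHON" - <<'EOF'
message = "Hello"
print(message)
EOF
}
```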

Installing Packages

If you need to install your own packages, for example a specific version of numpy, you should create or clone a conda environment:

module load anaconda/3
conda create -n my_env python=3.6 anaconda
# You could replace the above line with the following
# if you wanted a copy of an environment that you already use
# conda create -n my_env --clone tf-py3-gpu
source activate my_env
conda install numpy=1.13

You can list any number of packages, with or without version numbers, after conda install. If conda install cannot find the package you need, try searching for it on Anaconda Cloud. Usually the package exists, but you need to specify a channel; for example, rdkit can be installed with conda install -c rdkit rdkit.

You can also use pip install --user numpy==1.13 to install the same version of numpy, but this installs it for every environment that uses the same Python version, and packages installed with --user usually take precedence over the environment's own copies. If you are inside an anaconda environment, you can omit the --user argument to install into that environment. The problem with relying only on pip install --user is that there is no simple way to run Python code with conflicting dependencies.

Read more here.

Queue Information

The queue information is here.

Note: the most up-to-date information is always available through commands such as gpus.py, cpus.py, qstat -q, or pbsnodes.

gpu.csb.pitt.edu

To request a specific GPU, use the following format in your job: #PBS -l nodes=1:ppn=1:gpus=1:gtx1080Ti. Replace gtx1080Ti with the name of any GPU, spelled and capitalized exactly as it appears in the properties column of the queue information.
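The resource string has the same shape as the interactive example in the TORQUE command list, with the GPU property appended. A small sketch of a hypothetical helper (gpu_resources is not one of the cluster's scripts) that assembles it so the GPU name is typed in one place; the default of one GPU for one hour is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the -l resource list for a specific GPU model.
# Arguments: GPU property name, number of GPUs (default 1), walltime (default 1 hour)
gpu_resources() {
  local gpu_model="$1" ngpus="${2:-1}" walltime="${3:-01:00:00}"
  printf 'nodes=1:ppn=1:gpus=%s:%s,walltime=%s\n' "$ngpus" "$gpu_model" "$walltime"
}

# e.g. an interactive hour on one GTX 1080 Ti:
# qsub -I -q dept_gpu -l "$(gpu_resources gtx1080Ti)"
```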

dvorak.csb.pitt.edu

To run only on Intel CPUs, use the following in your job: #PBS -l nodes=1:ppn=1:intel