Using the Cluster

Contents:

  1. Login to the Cluster
  2. Common Commands & Cheatsheets
  3. Submitting a Job
    1. Hello World
    2. NAMD
    3. Python & TensorFlow
    4. AMBER
  4. Queue & Node Information

 

Login to the Cluster

On versions of Windows older than Windows 10 version 1803 (April 2018 update), download PuTTY or MobaXterm to use ssh.

On Windows 10, macOS, or Linux, open a terminal or PowerShell window and type ssh <USERNAME>@dvorak.csb.pitt.edu, or, if you are using the GPU cluster, ssh <USERNAME>@gpu.csb.pitt.edu.
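
For example, with a hypothetical username jsmith:

# log in to the main (CPU) cluster
ssh jsmith@dvorak.csb.pitt.edu

# or log in to the GPU cluster
ssh jsmith@gpu.csb.pitt.edu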

Common Commands & Cheatsheets

TORQUE Commands

Submit a job: qsub filename
Delete a job: qdel <JOBID>
Start an interactive (ssh) job (1 GPU for 1 hour): qsub -I -q dept_gpu -l nodes=1:ppn=1:gpus=1,walltime=1:00:00
Show all of your jobs: qstat -u$USER
Show detailed information for all of your jobs: qstat -fu$USER
List all queues: qstat -q
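
For example, a typical session might look like this (the job id shown is hypothetical):

qsub myjob.pbs       # submit a job script; prints an id such as 3500.n000.dcb.private.net
qstat -u$USER        # check the status of your jobs
qdel 3500            # delete the job if it is no longer needed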

Custom scripts

Show available GPUs: gpus.py
Show available CPU cores: cpus.py


Submitting a Job

Note: Many sample scripts are located on the cluster in the directory /opt/sample_scripts

Hello World

Example Script:

#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -N hello
#PBS -M email@example.com
#PBS -m abe
#PBS -q serial
echo Hello World
  • The first line is the shebang, which determines the interpreter for the script. PBS works with any interpreter; for example, you could use #!/bin/csh or #!/usr/bin/env python instead.
  • The next lines begin with #PBS; qsub interprets these as arguments (they must be at the top of the script).
  • -l is used to request resources. nodes is the number of nodes requested, and ppn is the number of processors per node. walltime=hours:minutes:secs is the maximum amount of time the job will run before it is terminated. The full documentation for the resource list is in the TORQUE documentation.
  • -N sets the name of the job (the default is the name of the script).
  • -M sets the email address to send information about the job’s status, and -m sets which emails to send. -m abe means email me when the job aborts (a), begins (b), or ends (e) successfully.
  • -q determines which queue the job is submitted to.
  • After the #PBS lines you may write any number of bash commands.
  • For a complete list of arguments use man qsub or the online documentation.

To run this job, save it as hello.pbs and type qsub hello.pbs. The job will be submitted, and you should see something like 3488.n000.dcb.private.net. The leading number is the job id, which can be used to delete the queued or running job with qdel 3488. Run qstat to see all jobs, or qstat 3488 (with the correct job id) to see information about your job. It will look something like this:

Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----    
3488.n000.dcb.private.net  hello           YOURNAME        00:00:00 C serial         

We can see that the S column shows our status as C, which means the job is complete. It could also show R for running, but our job finished in less than a second. You’ll notice that the job did not print anything to our terminal, but if you enter ls, you will see two new files: hello.o<jobid> and hello.e<jobid>. These files contain the stdout and stderr of the job. cat hello.o* will print the output of the job: “Hello World”. The other file should be empty, as there were no errors.
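
For example, using the job id from above:

ls
# hello.pbs  hello.e3488  hello.o3488
cat hello.o3488
# Hello World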

The same job can also be submitted as a single line:

echo echo Hello World | qsub -l nodes=1:ppn=1,walltime=00:01:00 -N hello -M email@example.com -m abe -q serial

The first part of this line redirects echo Hello World to stdin; qsub reads the job script from stdin when no filename is given as the last argument. The rest of the line sets all of the arguments that were previously given in the script after #PBS.

This can be very useful because command-line arguments take priority over the #PBS directives in the script. For example, we could use this to quickly change some arguments of a script we already wrote without editing the file: qsub -q dept_24_core -m n hello.pbs will run our hello world script with the same arguments, except that the queue is changed and no email is sent (-m n means no mail). This can be used to change any or all of the arguments in the script.

NAMD

The script again starts with similar boilerplate arguments:

#!/bin/bash
#PBS -q dept_gpu
#PBS -l nodes=1:ppn=2:gpus=1
#PBS -N namd_gpu
#PBS -M email@example.com
#PBS -m abe
#PBS -j oe

The next step is to create a directory for our output. All compute nodes have access to a scratch drive at /scr for writing data quickly. The following lines create a directory similar to /scr/USERNAME/myjob/12345.n198.dcb.private.net:

export SCRATCH=/scr/$USER/$PBS_JOBNAME/$PBS_JOBID
mkdir -p "$SCRATCH"

Note: export makes SCRATCH an environment variable, so it is visible to programs the script launches (we will use it in the NAMD configuration file below).

The next step is to actually run NAMD:

/opt/NAMD/NAMD_2.12_multicore_CUDA_8.0/namd2 +idlepoll +p$PBS_NUM_PPN namd.conf > $SCRATCH/namd.log

This will also redirect NAMD’s stdout to a file called namd.log in the scratch directory we created.

After the job is done, we copy the files back to the directory that we ran qsub from.

cd $PBS_O_WORKDIR 
mkdir -p ./$PBS_JOBID
cp $SCRATCH/* ./$PBS_JOBID

 

The configuration file should also be set to write to the scratch directory. NAMD uses a dialect of Tcl for its configuration file. Change the outputname line in the configuration file to the following so output is written to our directory:

outputname           $env(SCRATCH)/outputName

An example script and NAMD simulation files are available on the cluster in the directory /opt/sample_scripts/NAMD.
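
Putting the pieces above together, a complete submission script might look like the following sketch (it assumes your NAMD configuration file is named namd.conf and sits in the directory you run qsub from):

#!/bin/bash
#PBS -q dept_gpu
#PBS -l nodes=1:ppn=2:gpus=1
#PBS -N namd_gpu
#PBS -M email@example.com
#PBS -m abe
#PBS -j oe

# create a per-job scratch directory on the local scratch drive
export SCRATCH=/scr/$USER/$PBS_JOBNAME/$PBS_JOBID
mkdir -p "$SCRATCH"

# run from the directory qsub was called from, where namd.conf lives
cd $PBS_O_WORKDIR

# run NAMD, logging its stdout to the scratch directory
/opt/NAMD/NAMD_2.12_multicore_CUDA_8.0/namd2 +idlepoll +p$PBS_NUM_PPN namd.conf > $SCRATCH/namd.log

# copy the results back next to the input files
mkdir -p ./$PBS_JOBID
cp $SCRATCH/* ./$PBS_JOBID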

Python & TensorFlow

Start by SSHing in to gpu.csb.pitt.edu. To run a Python script that uses TensorFlow, the best practice is to create a new conda environment, either by running this command on the cluster: bash /opt/sample_scripts/tensorflow_gpu/createATfCondaEnv.sh, or by manually following the steps below:

conda create -n tf python=2.7 anaconda

This will take some time. When it is finished, source the environment named tf:

source activate tf

Now we can install TensorFlow:

pip install --upgrade tensorflow-gpu

We can have conda set the correct environment variables by creating scripts that run when we activate or deactivate the environment:

mkdir -p ~/.conda/envs/tf/etc/conda/deactivate.d
mkdir -p ~/.conda/envs/tf/etc/conda/activate.d
touch ~/.conda/envs/tf/etc/conda/activate.d/env_vars.sh
touch ~/.conda/envs/tf/etc/conda/deactivate.d/env_vars.sh

The library path for CUDA needs to be set for TensorFlow to work on GPUs. Write this to ~/.conda/envs/tf/etc/conda/activate.d/env_vars.sh:

#!/bin/sh
export LD_LIBRARY_PATH=/net/antonin/usr/local/cuda-9.0/lib64:/net/antonin/usr/local/cuda-8.0/lib64/:$LD_LIBRARY_PATH
export CUDA_HOME=/net/antonin/usr/local/cuda-9.0
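
The matching deactivate script created above can undo these changes when you leave the environment. A minimal sketch for ~/.conda/envs/tf/etc/conda/deactivate.d/env_vars.sh (note that unsetting LD_LIBRARY_PATH this way also drops any entries that existed before activation):

#!/bin/sh
# undo the variables set in activate.d/env_vars.sh
unset CUDA_HOME
unset LD_LIBRARY_PATH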

 

Here is an example job that will use our environment:

#!/bin/bash
#PBS -q dept_gpu
#PBS -l nodes=1:ppn=1:gpus=2
#PBS -N tf_test

source /usr/local/bin/activate tf
python_file_url='https://raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/examples/1_Introduction/helloworld.py'
#note: running code downloaded from the internet without checking it first is bad practice
curl "$python_file_url" | python

The stderr should contain information about the GPUs, written to the file tf_test.e<your job number>:

2018-03-12 14:17:44.131513: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-03-12 14:17:44.510051: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:03:00.0
totalMemory: 11.92GiB freeMemory: 11.80GiB
2018-03-12 14:17:44.824521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 1 with properties:
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:81:00.0
totalMemory: 11.92GiB freeMemory: 11.81GiB
2018-03-12 14:17:44.824587: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1227] Device peer to peer matrix
2018-03-12 14:17:44.824616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1233] DMA: 0 1
2018-03-12 14:17:44.824623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1243] 0:   Y N
2018-03-12 14:17:44.824628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1243] 1:   N Y
2018-03-12 14:17:44.824635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0, 1
2018-03-12 14:17:49.724048: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11431 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0, compute capability: 5.2)
2018-03-12 14:17:49.838059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 11432 MB memory) -> physical GPU (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:81:00.0, compute capability: 5.2)

The stdout file, tf_test.o<your job number>, should contain just one line:

Hello, TensorFlow!

AMBER

The most important step in running AMBER is setting the AMBERHOME environment variable (and related paths). AMBER provides a script that we can source:

source /opt/amber16/amber.sh
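
To confirm that the script worked, you can check the variable; the path shown assumes the /opt/amber16 installation referenced above:

echo $AMBERHOME
# /opt/amber16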

Now you will be able to run a simulation using Amber’s pmemd (after changing to a directory with simulation files):

pmemd -O -i mdin -o mdout -p prmtop -c inpcrd

Here is an example script you can run located on the cluster at /opt/sample_scripts/amber/sample_amber_GPU.pbs:

#!/bin/bash
#PBS -q dept_gpu
#PBS -l nodes=1:ppn=1:gpus=1,walltime=10:00
#PBS -N sample_amber_GPU
#PBS -j oe

echo Starting $PBS_JOBNAME on $HOSTNAME

#export makes SCRATCH visible to any programs this script launches
export SCRATCH=/scr/$USER/$PBS_JOBNAME/$PBS_JOBID
#this function will execute when the script exits
#the files are copied to the directory where qsub was run from
copyfiles() {
    echo "copying files"
    cp $SCRATCH/* $PBS_O_WORKDIR/$PBS_JOBID
}
trap copyfiles EXIT
#print the directory and cd. $PBS_O_WORKDIR is the directory qsub was run from
echo cd $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo making directory $SCRATCH
#creates directory on scratch drive
mkdir -p $SCRATCH
echo making directory $(pwd -P)/$PBS_JOBID
#creates directory under the working directory that qsub was run from
mkdir -p ./$PBS_JOBID

cd $SCRATCH
cp $PBS_O_WORKDIR/mdin.GPU $PBS_O_WORKDIR/prmtop $PBS_O_WORKDIR/inpcrd .
source /opt/amber16/amber.sh
pmemd.cuda -O -i mdin.GPU -o mdout -p prmtop -c inpcrd

 

Queue & Node Information

Note: the most up-to-date information is always available from commands such as gpus.py, cpus.py, qstat -q, or pbsnodes.

gpu.csb.pitt.edu

To request a specific GPU model, use the following format in your job: #PBS -l nodes=1:ppn=1:gpus=1:gtx1080Ti. Replace gtx1080Ti with the desired GPU type, using the same spelling and capitalization as listed in the node properties.
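
To see which GPU types are advertised as node properties, one option is to inspect the output of pbsnodes, for example:

# list the properties line for each node; GPU models appear among the properties
pbsnodes | grep properties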

dvorak.csb.pitt.edu

To run only on Intel CPUs, use the following in your job: #PBS -l nodes=1:ppn=1:intel