Note: Look at and run sample scripts on the cluster in this directory: /opt/sample_scripts.
Hello World
#!/usr/bin/env bash
# Set job name to "hello"
#PBS -N hello
# Submit to dept_24_core queue (use "qstat -q" for a list)
#PBS -q dept_24_core
# Request 1 node and 1 CPU core per each node for up to 1 minute
#PBS -l nodes=1:ppn=1,walltime=00:01:00
# Join standard output(stdout) and standard error(stderr) into the output file
#PBS -j oe
# email yourname@pitt.edu if the job begins, aborts, or ends
#PBS -m abe
#PBS -M yourname@pitt.edu
echo Starting $PBS_JOBNAME on $HOSTNAME
# export is used here so that we can use the same directory in other scripts
# NOTE: access /scr/ from anywhere using /net/<node name>/path/to/your/output
# e.g. "cd /net/n123/scr/12345.n000.dcb.private.net"
export SCRATCH="/scr/$PBS_JOBID"
# export SCRATCH="/scr/$USER/$PBS_JOBNAME/$PBS_JOBID" #alternative output dir
copy_before_job_source="/path/to/files/"
# copy_before_job_source="$PBS_O_WORKDIR" #copy all files from current directory
# dump output into "<current directory>/hello/12345.n000.dcb.private.net" at end
copy_after_job_target="$PBS_O_WORKDIR/$PBS_JOBNAME/$PBS_JOBID"
# or just dump scratch into the working directory where the job was submitted
# copy_after_job_target="$PBS_O_WORKDIR"
# creates directory in the same folder where you ran qsub for copying files back
mkdir -p "$copy_after_job_target" && echo made directory "$copy_after_job_target"
# creates directory on scratch drive
mkdir -p "$SCRATCH" && echo made directory "$SCRATCH"
# copy files to the scratch drive
rsync -avz "$copy_before_job_source"/* "$SCRATCH"
#copy files on exit or interrupt
trap "echo 'copying files'; rsync -avz $SCRATCH/ $copy_after_job_target/" EXIT
cd "$SCRATCH"
#load the module that does nothing
module load null
cat hello.txt > output.log 2> error.log
Save this in a file named hello.sh. Use the command qsub hello.sh
to submit the job to the queue.
Some important things to notice:
- bash ignores everything after the # in each line.
- All of the lines that start with #PBS and appear before the first bash command will be read by qsub and interpreted as command arguments.
- Use man qsub or read the online documentation for more information about additional arguments.
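To see the first point in action, you can run a job script directly with bash: the #PBS lines are ordinary comments to the shell, so only the commands execute. A minimal sketch (the directive values here are just placeholders):

```shell
#!/usr/bin/env bash
# bash treats every "#PBS" line as a comment; only qsub parses them
# as submission options.
#PBS -N demo
#PBS -l nodes=1:ppn=1
msg="directives ignored by bash"
echo "$msg"
```

Running this file with bash prints only the echoed message, which is a quick way to sanity-check the non-PBS parts of a script before submitting it.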
Check the Status of the Running Job
You should've seen your job ID after you entered the qsub command. Run qstat jobid (e.g. qstat 3488) on the server where you submitted your job (dvorak / n000 or gpu), and you'll see something like this:
Job ID Name User Time Use S Queue
------------------------------- ---------- -------------- --------- - -----
3488.n000.dcb.private.net hello your username 00:00:00 C serial
If you do not know the ID, use qstat -u $USER.
We can see that our hello job with job ID 3488 was completed in under one second. The S column shows the job status: C for completed, Q for waiting in the queue, and R for running.
If the job runs for too long or you made a mistake, it can be killed with qdel 3488.
Handling Program Input / Output
When we run our program, in this case cat, it sends the files it is given to stdout. The > operator redirects stdout to a file, and the 2> operator redirects stderr to a file. stderr is used for diagnostic and error messages, so depending on the size of those messages it is usually OK to omit this part. PBS TORQUE sends all of our script's stdout and stderr to a file that is copied back, by default to the directory where qsub was run; if that file becomes too large, it can fail to copy back. So redirecting to the scratch drive is the most reliable way to get your output.
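The redirection above can be tried out on its own, away from the queue. A toy version of the same pattern (all file names here are illustrative):

```shell
# ">" captures stdout into output.log; "2>" captures stderr into error.log.
workdir=$(mktemp -d)
cd "$workdir"
printf 'hello\n' > hello.txt
# missing.txt does not exist, so cat writes a complaint to stderr
# while the contents of hello.txt still go to stdout.
cat hello.txt missing.txt > output.log 2> error.log || true
out=$(cat output.log)   # -> "hello"
```

After this runs, output.log holds the text of hello.txt and error.log holds cat's error message about missing.txt, which is exactly the split the job script relies on.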
This script creates three variables to copy files around; here is how they are used in this example:
- $copy_before_job_source – Directory with files to be copied to the scratch working directory for the job. Set this equal to $PBS_O_WORKDIR if your files are in the current directory.
- $copy_after_job_target – Permanent location for files after the job finishes.
- $SCRATCH – Working directory on the compute node's hard drive.
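The flow between those three directories can be sketched without a cluster at all. In this hedged, self-contained example, temporary directories stand in for $PBS_O_WORKDIR and /scr, and cp replaces rsync (all paths and file names are illustrative):

```shell
# Stand-ins for the real directories used in the job script:
copy_before_job_source=$(mktemp -d)    # plays the role of the submit directory
SCRATCH=$(mktemp -d)                   # plays the role of /scr/$PBS_JOBID
copy_after_job_target=$(mktemp -d)/results
printf 'data\n' > "$copy_before_job_source/input.txt"
mkdir -p "$copy_after_job_target"
cp -r "$copy_before_job_source"/. "$SCRATCH"    # stage inputs onto scratch
# ... the actual job would run inside $SCRATCH here ...
cp -r "$SCRATCH"/. "$copy_after_job_target"     # copy everything back
result=$(cat "$copy_after_job_target/input.txt")
```

The real scripts use rsync for the same two copies, which additionally skips files that are already up to date.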
NAMD
#!/usr/bin/env bash
#PBS -N my_namd_gpu_job
#PBS -q dept_gpu
#PBS -l nodes=1:ppn=1:gpus=1,walltime=00:10:00 # 10 min
#PBS -j oe
#PBS -m abe
#PBS -M example@example.com
echo Starting $PBS_JOBNAME on $HOSTNAME
export SCRATCH="/scr/$PBS_JOBID"
# copy all files from current directory
copy_before_job_source="$PBS_O_WORKDIR"
# just dump scratch into the working directory where the job was submitted
copy_after_job_target="$PBS_O_WORKDIR"
# creates directory under where you ran qsub for copying files back
mkdir -p "$copy_after_job_target" && echo made directory "$copy_after_job_target"
# creates directory on scratch drive
mkdir -p "$SCRATCH" && echo made directory "$SCRATCH"
# move files to SCRATCH
rsync -avz "$copy_before_job_source"/* "$SCRATCH"
#copy files on exit or interrupt
trap "echo 'copying files'; rsync -avz $SCRATCH/ $copy_after_job_target/" EXIT
cd "$SCRATCH"
module load namd/2.12/cuda-8.0
config_file=apoa1.namd
namd2 +idlepoll +p$PBS_NUM_PPN $config_file > $SCRATCH/namd.log
In this job we are using an additional PBS variable, $PBS_NUM_PPN, which is the number of processors that were requested for the job. Other than that, most of this script is unchanged from the hello job.
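One caveat: $PBS_NUM_PPN is only set inside a running job, so invoking the script by hand for testing would pass namd2 an empty core count. A small hedged sketch of a safe fallback:

```shell
# Default to 1 process when running outside of PBS (e.g. local testing);
# inside a job, PBS sets PBS_NUM_PPN to the requested ppn value.
np=${PBS_NUM_PPN:-1}
echo "launching with $np process(es)"
```

With this, the namd2 line could use +p$np and behave sensibly in both contexts.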
Amber
#!/usr/bin/env bash
#PBS -N my_amber_gpu_job
#PBS -q dept_gpu
#PBS -l nodes=1:ppn=1:gpus=1,walltime=00:10:00
#PBS -j oe
#PBS -m abe
#PBS -M example@example.com
echo Starting $PBS_JOBNAME on $HOSTNAME
export SCRATCH="/scr/$PBS_JOBID"
# copy all files from current directory
copy_before_job_source="$PBS_O_WORKDIR"
# just dump scratch into the
# working directory where the job was submitted
copy_after_job_target="$PBS_O_WORKDIR"
# creates directory under where you ran qsub for copying files back
mkdir -p "$copy_after_job_target" && echo made directory "$copy_after_job_target"
# creates directory on scratch drive
mkdir -p "$SCRATCH" && echo made directory "$SCRATCH"
rsync -avz "$copy_before_job_source"/* "$SCRATCH"
#copy files on exit or interrupt
trap "echo 'copying files'; rsync -avz $SCRATCH/ $copy_after_job_target/" EXIT
cd "$SCRATCH"
module load amber/18
# -O is used to overwrite existing output files
pmemd.cuda -O
If you want to use GPUs, make sure you use the command pmemd.cuda instead of pmemd. Also, notice that we do not supply the names of the input or output files to pmemd.cuda; you can get away with this if you want to use the default file names. If you want to use more descriptive file names, use pmemd --help to get a list of which flags are used for each file.
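As a hedged sketch of what that looks like, a job-script line with explicit file names might read as follows (the file names are hypothetical, and you should confirm the flags against pmemd --help for your Amber version):

```shell
# -i input control, -o output log, -p topology, -c input coordinates,
# -r restart, -x trajectory; all file names below are illustrative
pmemd.cuda -O -i prod.in -o prod.out -p system.prmtop \
           -c system.inpcrd -r prod.rst -x prod.nc
```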
Python & Tensorflow
Getting Started
To use Python with TensorFlow, we have provided a few basic Anaconda environments. To activate one of our Python 3 environments, use module load anaconda/3 to add Anaconda to your path. Then use conda info --envs to see what is available. In this example we will use source activate tf-py3-gpu.
#!/usr/bin/env bash
#PBS -N hello
#PBS -q dept_gpu
#PBS -l nodes=1:ppn=1:gpus=1,walltime=00:05:00
module load anaconda/3
# The tensorflow gpu anaconda environment will automatically
# load the CUDA/9.0 module
source activate tf-py3-gpu
# Use the tensorflow library to say 'Hello'
python -c \
'import tensorflow as tf
print(tf.Session().run(tf.constant("Hello")))'
Note: python -c will use an argument as Python code instead of a Python file, and we can use the \ to have bash ignore the newline, so the command is interpreted as if it were one line. The second newline is OK because it is inside of quotes and part of the Python code.
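The same two-line pattern can be tried without TensorFlow; in this sketch the backslash joins the shell lines while the newline inside the quotes stays part of the Python code:

```shell
# python3 -c runs its argument as Python code; the trailing backslash
# continues the shell command, and the quoted newline separates the
# two Python statements.
greeting=$(python3 -c \
'msg = "Hello"
print(msg)')
echo "$greeting"
```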
Installing Packages
If you need to install your own packages, for example a specific version of numpy, you should create or clone a conda environment:
module load anaconda/3
conda create -n my_env_name python=3.6 anaconda
# You could replace the above line with the following
# if you wanted a copy of an environment that you already use
# conda create -n my_env_name --clone tf-py3-gpu
source activate my_env_name
conda install numpy=1.13
You can list any number of packages, with or without version numbers, after conda install. If conda install cannot find the package you need, try searching for it on Anaconda Cloud. Usually the package you need exists, but you need to specify a channel; for example, rdkit can be installed with conda install -c rdkit rdkit.
Use pip install --user numpy==1.13 to accomplish the same numpy install, but this will install for all of your environments (usually the Anaconda environment will override your packages). If you are in an Anaconda environment, you can omit the --user argument to install into the environment. The problem with only using pip install --user is that there is no simple way to run Python code with conflicting dependencies.