The University of Hawaii acquired a Cray CS300 compute cluster in early November 2014. It is housed in the UH ITS Data Center and currently has 269 standard compute nodes, 6 large-memory nodes, 1 GPU node, and a 600 TB Lustre parallel file system. The system uses the SLURM job scheduler to manage user jobs, allocations, and fair-share use. The system is managed and maintained by the UH ITS Cyberinfrastructure division in partnership with the Cyberinfrastructure Faculty Advisory Committee.
This quick start guide provides the basic information a user needs to access and use the UH HPC resource. For additional help, email us at email@example.com.
Logging in to the UH HPC Cluster
The UH HPC system has two login/head nodes. These login/head nodes are to be used for file transfer and for launching jobs on the cluster via the SLURM job scheduler. DO NOT execute computation on the login nodes! Doing so can make them inaccessible to other users and can bring the cluster down for EVERYONE, which will upset other users and make you unpopular.
To access the login/head nodes, use ssh (Secure Shell) to open a terminal on a login/head node, with your UH username and password as your credentials. The command should look like this:

ssh YourUHusername@uhhpc1.its.hawaii.edu

OR, if this server is down or not working, use:

ssh YourUHusername@uhhpc2.its.hawaii.edu

You will be asked to input your UH password to authenticate yourself. NOTE that if you input invalid login credentials more than three times, your IP will be banned for 10 minutes on that machine.
Transfer Files To and From the UH HPC Cluster
Moving data to and from the cluster can be done two ways.
- The first is Globus, which is the fastest and most robust way to transfer data to and from the cluster. It does require that you sign up for a Globus account first and install the Globus Connect software on your device. Our Globus endpoints on the cluster are hawaii#UH-HPC1 and hawaii#UH-HPC2; you will be asked for a username and password when you access them, which are your UH username and password.
- The second is sftp. We recommend FileZilla, as it is cross-platform and mostly straightforward to use. You can use either login node, uhhpc1.its.hawaii.edu or uhhpc2.its.hawaii.edu, as the "Host"; the username and password are your UH username and password, and the "Port" is 22.
Globus is recommended because it is rather "fire and forget": it will retry transfers if connections are broken, and it will email you when transfers complete. More information can be found in the Globus Quick Start Guide.
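Whichever transfer method you use, it can be worth verifying large transfers with a checksum. A minimal sketch of that workflow (the file name here is a hypothetical example, not a UH-provided convention):

```shell
# Record a checksum before transferring a file; after transferring both the
# file and its .md5 alongside it, the same comparison run on the other end
# confirms the copy arrived intact.
echo "example data" > mydata.txt
md5sum mydata.txt | awk '{print $1}' > mydata.txt.md5

# Compare the file's current checksum against the saved one:
md5sum mydata.txt | awk '{print $1}' | diff - mydata.txt.md5 && echo "transfer OK"
```

The comparison prints "transfer OK" only when the checksums match.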
The UH HPC has two main file systems: the login/head nodes' file system, which holds home directories, and the 600 TB Lustre scratch file system, which is attached to all the compute nodes. Home directory space is very limited (< 2 TB total for all 250+ current UH users), so all data should be stored on the Lustre file system. To facilitate this we have created an "apps" and a "lus" folder for you. The "apps" folder is meant for storing scripts, software, and config files; NO DATA should be stored here. The "lus" folder is meant for your staged input data and results. Files should be stored there for less than 90 days; after 90 days, our automated data scrub scripts will remove them (this is subject to change). For more details on our file system policies, please read our HPC policies.
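A typical workflow is to create a per-project directory under "lus" and launch jobs from there, so job input and output land on Lustre rather than on the limited home file system. The project and run names below are hypothetical illustrations:

```shell
# Stage a run on the Lustre scratch space ("myproject" and "run1" are
# placeholder names) and work from that directory.
mkdir -p ~/lus/myproject/run1
cd ~/lus/myproject/run1
pwd    # confirm you are on the Lustre scratch path before submitting jobs
```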
Using SLURM – an example slurm script
Tools that you want to run are embedded in a command script, and the script is submitted to the job control system using an appropriate SLURM command (additional information about sbatch can be found at http://www.schedmd.com/slurmdocs/sbatch.html). For a simple example that just prints the hostname of a compute host to both standard out and standard err, create a file called example.slurm with the following content:
#!/bin/bash
#SBATCH -J hello.R          # Name for your job
#SBATCH -n 1                # Number of tasks when using MPI. Default is 1
#SBATCH -c 1                # Number of cores requested. Default is 1 (total cores requested = tasks x cores)
#SBATCH -N 1                # Number of nodes to spread cores across - default is 1; if you are not using MPI this should likely be 1
#SBATCH --mem-per-cpu=3200  # Memory per core to request in MB. Default is 3200 MB, so be sure to include this (max for community.q, sb.q and exclusive.q is 6400)
#SBATCH -t 5                # Runtime in minutes. Default is 10 minutes. The maximum runtime is currently 72 hours (4320 minutes); requests over that will not run
#SBATCH -p community.q      # Partition to submit to: the standard compute node partition (community.q) or the large memory node partition (lm.q)
#SBATCH -o example.out      # Standard out goes to this file
#SBATCH -e example.err      # Standard err goes to this file
#SBATCH --mail-user=firstname.lastname@example.org  # The email you wish to be notified at
#SBATCH --mail-type=ALL     # What events to get an email about; ALL alerts on job beginning, completion, failure, etc.
source ~/.bash_profile      # If your shell is bash, use this to load modules or environment variables
module load somemodule      # Use this plus the line above to load a module, with the module name you want loaded
hostname                    # Or replace with the script or commands you want to run
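Because memory is requested per core, it helps to sanity-check the total before submitting. A quick back-of-the-envelope check (the task and core counts here are illustrative, not from the example script):

```shell
# Total memory a job requests = tasks x cores-per-task x --mem-per-cpu.
# With 4 tasks of 2 cores each at the 3200 MB default:
tasks=4
cores_per_task=2
mem_per_cpu=3200   # MB, the default noted in the script comments above
total_mem=$(( tasks * cores_per_task * mem_per_cpu ))
echo "Total memory requested: ${total_mem} MB"   # 25600 MB
```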
The slurm script you wish to use should be located in the same place you want your output, and should be launched from that location. On the UH cluster, that location should be somewhere within your ~/lus/ directory (the Lustre file system directory within your home directory that was set up for you and provides access to the 600 TB of scratch space on the cluster). If you run jobs from outside the lus directory, on the normal file system, you could fill up its limited usable disk space, which would bring the cluster down for everyone using it. Also, the Lustre file system is a parallel file system that has been optimized to provide a high number of I/O operations for superior read and write performance.
MPI Specific Example Slurm batch script:
#!/bin/bash
#SBATCH --time=1:00:00        # walltime, abbreviated by -t
#SBATCH --nodes=2             # number of cluster nodes, abbreviated by -N
#SBATCH -o slurm-%j.out-%N    # name of the stdout, using the job number (%j) and the first node (%N)
#SBATCH --ntasks=40           # number of MPI tasks, abbreviated by -n
# additional information for allocated clusters
#SBATCH --partition=community.q  # partition, abbreviated by -p

# load appropriate modules if necessary
module load prod/somesoftware

# set environment variables REQUIRED for MPI
export OMP_NUM_THREADS=1
export I_MPI_FABRICS=tmi
export I_MPI_PMI_LIBRARY=/opt/local/slurm/default/lib64/libpmi.so

# run the program; NOTE the command "mpirun" is required along with "-n", the number of tasks
mpirun -n 40 my_mpi_program > my_program.out
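The `export` lines matter because mpirun and your program run as child processes, which only inherit environment variables that were exported. A small demonstration of that behavior (the second variable name is hypothetical, used only to show the contrast):

```shell
# Exported variables are visible to child processes; plain assignments are not.
export OMP_NUM_THREADS=1   # exported, as in the script above
MY_LOCAL_SETTING=yes       # hypothetical variable, deliberately NOT exported

bash -c 'echo "child sees OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"'
bash -c 'echo "child sees MY_LOCAL_SETTING=${MY_LOCAL_SETTING:-unset}"'
```

The first child prints the value 1; the second prints "unset", which is why forgetting `export` makes MPI settings silently disappear.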
Submit a job script to SLURM
To submit the example script, run the following from the directory containing it:

sbatch example.slurm

When command scripts are submitted, SLURM looks at the resources you've requested and waits until an acceptable compute node or nodes are available on which to run them. Once the resources are available, it runs the script as a background process (i.e. you don't need to keep your terminal open while it is running), returning the output and error streams to the locations designated by the script.
You can monitor the progress of your job using the squeue -j JOBID command, where JOBID is the ID returned by SLURM when you submit the script. The output of this command will indicate whether your job is PENDING, RUNNING, COMPLETED, FAILED, etc. If the job has completed, you can get the output from the file specified by the -o option. If there are errors, they should appear in the file specified by the -e option.
To run an interactive SLURM Session
srun -I -p all.q -N 1 -c 1 --pty -t 0-00:05 /bin/bash
This will try to run an interactive session in the all.q partition for 5 minutes on one node with 1 core, running the bash shell. For this to work, resources must be immediately available. This is useful if you need to experiment in the environment before starting a large job. Additional information about srun can be found at http://www.schedmd.com/slurmdocs/srun.html
To run an interactive session with X11 Forwarding enabled to be able to use GUI applications
When sshing to the login/head nodes, use one of:
ssh -Y YourUHusername@uhhpc1.its.hawaii.edu
ssh -Y YourUHusername@uhhpc2.its.hawaii.edu
Then run the interactive X11 SLURM session with the following:
srun.x11 -N 1 -n 1 -t 0-00:05 -p lm.q
This will launch a bash session with X11 forwarding enabled on a node in the lm.q partition. From there you can launch things like web browsers and other GUI tools. For example, to launch the Firefox web browser, run:

firefox &

The "&" tells the command line to run the application in the background, so you can continue to use the command line while it is running.
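A minimal illustration of backgrounding with "&", using `sleep` as a stand-in for a long-running GUI application:

```shell
# "&" starts the command in the background and returns the prompt immediately.
sleep 1 &                # runs in the background
bg_pid=$!                # $! holds the PID of the most recent background process
echo "running in background as PID ${bg_pid}"
wait "${bg_pid}"         # optionally block until the background job exits
echo "background job finished"
```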
| Partition Name | Max WallTime | Max # of Nodes you can request | Cores per Node | Description |
| --- | --- | --- | --- | --- |
| community.q | 4320m (72hrs) | 20 | 20 | This partition currently contains 178 standard compute nodes with names that start with "prod2" |
| exclusive.q | 4320m (72hrs) | 20 | 20 | This partition currently contains 178 standard compute nodes with names that start with "prod2" |
| kill.q | 4320m (72hrs) | 20 | 20, 24 | This partition currently contains 91 standard compute nodes with names that start with "prod2". Be aware that jobs submitted here can be preempted by users with higher priority (i.e. node owners and purchased compute time users) |
| lm.q | 4320m (72hrs) | 3 | 40 | This partition contains the 6 large memory nodes with names that start with "lm2" |
| sb.q | 60m | 1 | 20 | This is the sandbox partition, used for compiling or quick tests in a compute environment - NOT FOR COMPUTE. There will be 1 or 2 standard compute nodes in the partition at any time. |
Note that if you submit a job to SLURM that requests more wall-time or nodes than the maximum, it will not run unless an admin intervenes on your behalf. Jobs that require more nodes or longer wall-times can be requested from the CI Faculty Advisory Committee.
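The minute figure in the table follows directly from the 72-hour cap, since a bare number given to "#SBATCH -t" is read as minutes:

```shell
# 72 hours expressed in minutes, the wall-time cap shown in the table above:
max_hours=72
max_minutes=$(( max_hours * 60 ))
echo "${max_minutes} minutes"   # 4320 minutes
```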
Why a community.q, exclusive.q and kill.q partition?
The community.q partition is available to everyone, as is the kill.q partition. Jobs submitted to the kill.q partition can be preempted by users who are paying for guaranteed compute access, so jobs submitted to kill.q should do their own checkpointing or be short, to raise the likelihood of completion. Why submit to kill.q at all? Resources may be available there when the community.q partition is full. Just keep in mind that running in kill.q is a gamble: your job may run to completion soon, or it may be preempted by a user with guaranteed access.
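In its simplest form, "do their own checkpointing" means the job records its progress to a file and, on restart, resumes from the last recorded step. This is only a sketch of the pattern, not a UH-provided tool; the file name, step count, and loop body are hypothetical:

```shell
# Resumable loop: each completed step is recorded in progress.ckpt. If the
# job is preempted and resubmitted, it resumes after the last saved step
# instead of starting over.
ckpt=progress.ckpt
start=1
if [ -f "$ckpt" ]; then
    start=$(( $(cat "$ckpt") + 1 ))
fi
for step in $(seq "$start" 5); do
    echo "working on step $step"   # stand-in for real work
    echo "$step" > "$ckpt"         # checkpoint after each completed step
done
```

Run fresh, this executes steps 1 through 5; killed mid-run and restarted, it skips the steps already recorded.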
The exclusive.q partition is for jobs that cannot share nodes with other jobs. MPI jobs are an example of this use case, since an MPI job will try to be greedy and grab all resource instances on the node, causing a second MPI job on the node to fail. It is also useful for users with large memory requirements whose jobs will fail if too little memory is available - very large assemblies are an example. (For the large memory use case you must also specify --mem-per-cpu; otherwise SLURM will cancel the job when it exceeds the amount of memory you requested, as it thinks the job is behaving outside the parameters you described.)
Certain software applications have been installed on the cluster partitions and are available to use. To view the current list of software, enter an interactive session and type:

module avail
This will show all the software modules available for the nodes on that partition. To use the software, you need to load the module. For example, we can load a module called prod/beastv1.8.1 (beast on a prod compute node; if we wanted it for a large memory node we would use lm/beastv1.8.1 - these are listed by the module avail command) by typing:
module load prod/beastv1.8.1
After that, the beast software is available in your current path and can be called and executed.
If you are using software installed on the nodes via these modules, you will need to put the "module load" command into your slurm script before the line where you call the application, and you will need to load your environment variables by sourcing your .bash_profile ("source ~/.bash_profile").
Other Useful SLURM Commands
List Running Jobs:

$ squeue
JOBID PARTITION     NAME   USER ST  TIME NODES NODELIST(REASON)
  106    kill.q slurm-jo seanbc  R  0:04     1 prod2-0001
Also, "squeue -u username" will show all of your running or queued jobs.
Get Job Details:
$ scontrol show job 106
JobId=106 Name=slurm-job.sh
   UserId=seanbc(1001) GroupId=seanbc(1001)
   Priority=4294901717 Account=(null) QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   RunTime=00:00:07 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2013-01-26T12:55:02 EligibleTime=2013-01-26T12:55:02
   StartTime=2013-01-26T12:55:02 EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=kill.q AllocNode:Sid=atom-head1:3526
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=prod2-0001
   BatchHost=prod2-0001
   NumNodes=1 NumCPUs=2 CPUs/Task=1 ReqS:C:T=*:*:*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/seanbc/lus/slurm-job.sh
   WorkDir=/home/seanbc/lus
Kill a Job:
$ scancel 106
$ squeue
JOBID PARTITION     NAME   USER ST  TIME NODES NODELIST(REASON)

Users can only kill their own jobs.
Hold a Job:
$ squeue
JOBID PARTITION   NAME   USER ST  TIME NODES NODELIST(REASON)
  139    kill.q simple seanbc PD  0:00     1 (Dependency)
  138    kill.q simple seanbc  R  0:16     1 prod2-0001
$ scontrol hold 139
$ squeue
JOBID PARTITION   NAME   USER ST  TIME NODES NODELIST(REASON)
  139    kill.q simple seanbc PD  0:00     1 (JobHeldUser)
  138    kill.q simple seanbc  R  0:32     1 prod2-0001
Release a Job:
$ scontrol release 139
$ squeue
JOBID PARTITION   NAME   USER ST  TIME NODES NODELIST(REASON)
  139    kill.q simple seanbc PD  0:00     1 (Dependency)
  138    kill.q simple seanbc  R  0:46     1 prod2-0001
$ sinfo
PARTITION   AVAIL  TIMELIMIT  NODES STATE NODELIST
lm.q           up 3-00:00:00      1 mix   lm2-0002
lm.q           up 3-00:00:00      5 alloc lm2-[0001,0003-0006]
sb.q           up      10:00      2 idle  sandbox-[0001-0002]
community.q    up 3-00:00:00    155 alloc prod2-[0001-0139,0141-0156]
community.q    up 3-00:00:00      1 idle  prod2-0140
kill.q         up 3-00:00:00     19 alloc prod2-[0157-0175]
kill.q         up 3-00:00:00      1 idle  prod2-0176
List Partition Information:
$ scontrol show partition community.q
PartitionName=community.q
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO
   DefaultTime=00:10:00 DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=20 MaxTime=3-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=prod2-[0001-0156]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=REQUEUE
   State=UP TotalCPUs=3120 TotalNodes=156 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerCPU=65500
List Node Information:
$ scontrol show node prod2-0001
NodeName=prod2-0001 Arch=x86_64 CoresPerSocket=10
   CPUAlloc=20 CPUErr=0 CPUTot=20 CPULoad=20.00
   Features=(null) Gres=(null)
   NodeAddr=prod2-0001 NodeHostName=prod2-0001 Version=14.03
   OS=Linux RealMemory=129000 AllocMem=0 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2015-04-11T01:16:11 SlurmdStartTime=2015-04-11T09:10:38
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s