SLURM MPI Jobs

The Message Passing Interface (MPI) system was designed to enable parallel programming through communication on distributed-memory machines. MPI has become a standard for multi-processor programming of code that runs on a variety of machines, and the UH ITS HPC Cluster supports MPI.  MPI is what allows programs to span multiple compute nodes on the cluster.  If your software does not support MPI, it will not be able to utilize multiple compute nodes.
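If you are not sure whether a program was built with MPI, one informal check (assuming the binary is dynamically linked; my_mpi_program is a placeholder name) is to look for an MPI library among its shared-library dependencies:

ldd my_mpi_program | grep -i mpi

If no MPI library is listed, the program is most likely serial and should be run on a single node instead.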

Below is an example SLURM batch file for submitting an MPI job to the UH ITS HPC Cluster.

my_mpi_job.slurm

#!/bin/bash
#SBATCH --time=1:00:00 # walltime, abbreviated by -t
#SBATCH --nodes=2     # number of cluster nodes, abbreviated by -N
#SBATCH -o slurm-%j.out-%N # name of the stdout, using the job number (%j) and the first node (%N)
#SBATCH --ntasks=40   # number of MPI tasks, abbreviated by -n
# additional information for allocated clusters
#SBATCH --partition=community.q # partition, abbreviated by -p
# load appropriate modules if necessary
source ~/.bash_profile
module load prod/somesoftware
# set environment variables necessary for MPI
export OMP_NUM_THREADS=1
export I_MPI_FABRICS=tmi
export I_MPI_PMI_LIBRARY=/opt/local/slurm/default/lib64/libpmi.so
# run the program
mpirun -n 40 my_mpi_program > my_program.out
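
Once the batch file is saved, it can be submitted and monitored with the standard SLURM commands (the job ID below is a placeholder):

sbatch my_mpi_job.slurm     # submit the job; SLURM prints the assigned job ID
squeue -u $USER             # check the status of your queued and running jobs
scancel 123456              # cancel the job if needed, using your actual job ID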

The my_mpi_job.slurm file above requests 2 compute nodes and 40 tasks (20 cores per node).  Note that the environment variable section is required for MPI to work on the cluster.  Also, keep in mind that the number of tasks you request via sbatch with "-n" or "--ntasks" should equal the value passed to "-n" in your "mpirun" command; otherwise you will not be using all of the resources you have requested.
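One way to keep the two values in sync (a suggestion, not a cluster requirement) is to let mpirun read the task count from the SLURM_NTASKS environment variable that SLURM sets from --ntasks, instead of hard-coding the number:

# run the program using the task count requested from SLURM
mpirun -n $SLURM_NTASKS my_mpi_program > my_program.out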
