SLURM Partitions

A partition can be thought of as a group of nodes/resources divided into possibly overlapping sets. Each partition acts as a job queue with its own set of constraints, such as job size limit, job time limit, and which users are permitted to use it. Priority-ordered jobs are allocated nodes within a partition until the resources (nodes, processors, memory, etc.) within that partition are exhausted.  There are public and private partitions on the UH ITS HPC. Private partitions are available to users who have purchased HPC resources.

The UH ITS HPC currently has eight public partitions available: community.q, exclusive.q, lm.q, sb.q, kill.q, kill.gpu.q, kill.lm.q and htc.q.  Descriptions of these partitions can be found below.
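
The limits listed below can also be checked directly on the cluster. As a quick sketch (output formatting may differ between SLURM versions), the following commands summarize the partitions and show the full configuration of one of them:

# List partitions with their state, time limit, and node counts
sinfo --summarize

# Show the full configuration of a single partition, e.g. community.q
scontrol show partition community.q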

Community Partition

The community.q partition has the following limits:

  • Wall-Time = 72 hours (4320 minutes)
  • Max Node Request = 20
  • Cores Per Node = 20

This partition currently contains 176 standard compute nodes with names that start with “prod2”.  This partition is the default for jobs submitted to the SLURM scheduler.  The nodes in this partition can be shared by multiple jobs in order to maximize resource utilization.

To utilize the partition properly, the user must add the following to their SLURM script and remove any other -p or --partition parameters.

#SBATCH --partition=community.q
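
For reference, a minimal batch script for this partition might look like the sketch below. The job name and the program being run are placeholders to replace with your own; the node and time requests must stay within the limits listed above.

#!/bin/bash
#SBATCH --job-name=community_example
#SBATCH --partition=community.q
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --time=72:00:00

# Placeholder: replace with your own application; srun launches it on the allocated cores.
srun ./my_program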

Kill Partition

The kill.q partition has the following limits:

  • Wall-Time = 72 hours (4320 minutes)
  • Max Node Request = 20
  • Cores Per Node = 20 or 24

This partition currently contains 58 standard compute nodes with names that start with “prod2”, each with 24 cores and 128GB of memory, and 33 nodes with 20 cores and 256GB of memory. The nodes in this partition overlap with nodes owned by condo node owners and nodes that may be used by purchased compute time users.  Be aware that jobs submitted here can be preempted by users with higher priority (i.e. node owners and purchased compute time users).

To utilize the partition properly, the user must add the following to their SLURM script and remove any other -p or --partition parameters.

#SBATCH --partition=kill.q
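
Because work in kill.q can be preempted, one option is to mark the job as requeueable so the scheduler can restart it after preemption. This is only a sketch, and it assumes the application itself saves checkpoints and resumes from them; the job name and executable are placeholders.

#!/bin/bash
#SBATCH --job-name=kill_example
#SBATCH --partition=kill.q
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --time=72:00:00
#SBATCH --requeue

# Placeholder: the program must write checkpoints and resume from them
# on restart, since preemption can end the job at any time.
srun ./my_checkpointing_program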

Large Memory Partition

The lm.q partition has the following limits:

  • Wall-Time = 72 hours (4320 minutes)
  • Max Node Request = 1
  • Cores Per Node = 40

This partition contains the 6 large memory nodes with names that start with “lm2”.   The nodes in this partition are intended for jobs that require a large amount of memory.  Jobs that do not need more than 128GB of memory should utilize the community.q, kill.q or exclusive.q partitions.

To utilize the partition properly, the user must add the following to their SLURM script and remove any other -p or --partition parameters.

#SBATCH --partition=lm.q
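
A large-memory batch script might look like the following sketch. The 500G memory request is an arbitrary example, so set --mem to what your job actually needs; the job name and executable are placeholders.

#!/bin/bash
#SBATCH --job-name=lm_example
#SBATCH --partition=lm.q
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=40
#SBATCH --mem=500G
#SBATCH --time=72:00:00

# Placeholder executable for a memory-intensive, single-node workload.
srun ./my_large_memory_program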

Sandbox Partition

The sb.q partition has the following limits:

  • Wall-Time = 60 minutes
  • Max Node Request = 2
  • Cores Per Node = 20

This is the sandbox partition, used for compiling or quick tests in a compute environment; it is NOT FOR COMPUTE. There will be 1 or 2 standard compute nodes in the partition at any time.

To utilize the partition properly, the user must add the following to their SLURM script or srun command and remove any other -p or --partition parameters.

#SBATCH --partition=sb.q
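
For quick interactive work such as compiling, an interactive shell on the sandbox partition can be requested with srun. A sketch, requesting one core for 30 minutes (well inside the 60-minute limit):

srun --partition=sb.q --ntasks=1 --time=30:00 --pty bash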

Exclusive Partition

The exclusive.q partition has the following limits:

  • Wall-Time = 72 hours (4320 minutes)
  • Max Node Request = 20
  • Cores Per Node = 20

This partition currently contains 176 standard compute nodes with names that start with “prod2” and overlaps the same resources as the community.q partition.  Jobs should be submitted to this partition if sharing the node(s) would prevent the job from running properly.  Examples are MPI jobs, which cannot share resources with other MPI jobs without using up all of the connections, and jobs that require all of the memory on a node.

To utilize the partition properly, the user must add the following to their SLURM script and remove any other -p or --partition parameters.

#SBATCH --partition=exclusive.q
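
A sketch of an MPI batch script for this partition, using 4 nodes and 20 tasks per node for 80 MPI ranks in total. The executable name is a placeholder, and how MPI programs are launched may differ depending on the MPI stack you load.

#!/bin/bash
#SBATCH --job-name=mpi_example
#SBATCH --partition=exclusive.q
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=20
#SBATCH --time=72:00:00

# 4 nodes x 20 tasks per node = 80 MPI ranks; srun starts one rank per task.
srun ./my_mpi_program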

Kill GPU Partition

The kill.gpu.q partition has the following limits:

  • Wall-Time = 72 hours (4320 minutes)
  • Max Node Request = 1
  • Cores Per Node = 20
  • GPUs Per Node = 2 (NVIDIA K40)

This partition currently contains 1 GPU compute node with two NVIDIA K40 GPUs and a name that starts with “gpu”, and it overlaps a node-owner GPU resource.  Jobs should be submitted to this partition only if they can utilize a GPU and can checkpoint/restart, since jobs are not guaranteed to finish in this partition if a node owner workload preempts them.

To utilize the partition properly, the user must add the following to their SLURM script and remove any other -p or --partition parameters. The --gres=gpu:2 option will request both NVIDIA K40 GPUs.

#SBATCH --partition=kill.gpu.q
#SBATCH --gres=gpu:2 
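
Putting the two directives together, a GPU batch script for this partition might look like the sketch below; loading a CUDA toolchain and the executable name are placeholders for your own environment.

#!/bin/bash
#SBATCH --job-name=gpu_example
#SBATCH --partition=kill.gpu.q
#SBATCH --gres=gpu:2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --time=72:00:00

# Placeholder: load your CUDA/toolchain modules here if needed, then run
# an application that can checkpoint/restart in case of preemption.
srun ./my_gpu_program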

Kill LM Partition

The kill.lm.q partition has the following limits:

  • Wall-Time = 72 hours (4320 minutes)
  • Max Node Request = 1
  • Cores Per Node = 40

This partition contains 1 large memory node with a name that starts with “lm2”.  The node in this partition is intended for jobs that require a large amount of memory.  Jobs that do not need more than 128GB of memory should utilize the community.q, kill.q or exclusive.q partitions.

To utilize the partition properly, the user must add the following to their SLURM script and remove any other -p or --partition parameters.

#SBATCH --partition=kill.lm.q

HTC Partition

The htc.q partition has the following limits:

  • Wall-Time = 72 hours (4320 minutes)
  • Max Node Request = 1
  • Cores Per Node = 20

This partition currently contains 276 standard compute nodes with names that start with “prod2”.  This partition is meant for single-core, high-throughput computation; ideally the jobs execute in a handful of hours.  Jobs submitted to this partition can be preempted by a job from any other partition on the system if it requests those resources, so it is important that workloads submitted here can checkpoint/restart themselves.

To utilize the partition properly, the user must add the following to their SLURM script and remove any other -p or --partition parameters.

#SBATCH --partition=htc.q
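
Single-core high-throughput work is often easiest to express as a job array. The sketch below assumes a hypothetical set of input files named input_1.dat through input_100.dat, one per array task; the job name and executable are placeholders.

#!/bin/bash
#SBATCH --job-name=htc_array_example
#SBATCH --partition=htc.q
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=04:00:00
#SBATCH --array=1-100

# Each array task handles one input file; the program should checkpoint/restart
# since jobs in this partition can be preempted.
srun ./my_program input_${SLURM_ARRAY_TASK_ID}.dat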
