HPC Scratch Space Storage Policy
The Lustre scratch storage space on the UH ITS HPC cluster is a shared resource. The scratch space is not backed up, and UH ITS is not responsible for data loss; users are responsible for backing up their own data. The scratch space is meant only for temporary storage of data and intermediate compute products for no more than 90 days – this limit is subject to change based on storage capacity. The scratch space is scrubbed every week to remove files older than the 90-day limit, and additional scrubs are initiated when the file system reaches 80% of capacity. Users are responsible for monitoring their files and transferring data off scratch – UH ITS is not responsible for data loss resulting from the automated file scrub. Each user has an “apps” folder in their home directory for compiled software applications, scripts, and configuration files that should be exempt from the automated purge procedures. The “apps” folder is not meant for staging or storing data related to computation, and its size will be monitored to ensure proper usage and storage system stability. Abuse of the file system may result in a user’s account being disabled until the issues are resolved.
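To see which of your files are at risk before a weekly scrub, you can list files by age with `find`. The sketch below is a self-contained demonstration in a temporary directory (the real scratch path on the cluster is not specified here, so substitute your own scratch directory); the `-mtime +90` test matches the 90-day limit described above.

```shell
#!/bin/sh
# Demonstration only: build a temp dir standing in for a scratch directory,
# with one fresh file and one file whose modification time is 120 days old.
SCRATCH_DEMO=$(mktemp -d)
touch "$SCRATCH_DEMO/new_file"
touch -d "120 days ago" "$SCRATCH_DEMO/old_file"   # GNU touch date syntax

# List files older than 90 days -- the candidates an automated scrub
# based on modification time would remove. On the cluster you would
# point this at your actual scratch directory instead.
OLD_FILES=$(find "$SCRATCH_DEMO" -type f -mtime +90)
echo "$OLD_FILES"

rm -rf "$SCRATCH_DEMO"
```

Anything this reports should be copied to your own storage (for example with `rsync` or Globus) before the limit is reached.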
Login Node Policies And Etiquette
The UH ITS HPC Cluster login nodes have two specific purposes: providing SSH shell access to transfer files to and from the cluster, and launching batch and interactive sessions on the compute nodes. Specifically, Globus, sftp, scp, and rsync transfers are allowed, along with launching SLURM jobs (batch and interactive) and modifying text files with a text editor – everything else should be run on a compute node. The login nodes are a shared resource and are the only access point to the cluster for hundreds of users. Therefore, running other tasks on the login nodes is not allowed; such tasks will be canceled, and repeat offenders may have their HPC accounts disabled.
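A minimal sketch of the permitted login-node workflow follows. The hostname, partition name, program name, and resource limits are illustrative placeholders, not the cluster's actual values; check the cluster documentation for the real ones.

```shell
#!/bin/sh
# Allowed on a login node -- file transfer (run from your workstation;
# the hostname and paths below are placeholders):
#   scp results.tar.gz user@login.example.edu:~/
#   rsync -av data/ user@login.example.edu:scratch_dir/
#
# Allowed on a login node -- launching work onto compute nodes:
#   srun --pty /bin/bash        # interactive session on a compute node
#   sbatch job.slurm            # batch job

# Write an example batch script; the actual computation runs on a
# compute node, never on the login node itself.
cat > job.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
./my_analysis    # placeholder program, executed on a compute node
EOF
cat job.slurm
```

The `#SBATCH` directives are standard SLURM; anything heavier than editing this script or submitting it belongs inside the job itself.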
HPC Cluster Maintenance
The UH ITS HPC Cluster undergoes regular maintenance to address patching, security, and system stability. The first Wednesday of each month that is not a holiday is reserved from 8am to 5pm for this maintenance. Although rare, jobs running on the cluster during this maintenance may have to be stopped and possibly restarted – users are responsible for being aware of any impacts on their jobs from a restart.
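One way to reduce the impact of a maintenance interruption is to make jobs restart-safe. The sketch below assumes standard SLURM behavior: `--requeue` marks the job as eligible to be requeued after being stopped, and the script itself checks for a checkpoint file so a restart resumes rather than recomputes. The program and checkpoint names are hypothetical.

```shell
#!/bin/sh
# Example batch script for a job that tolerates being stopped and
# requeued during a maintenance window (names are placeholders).
cat > restartable.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=restartable
#SBATCH --time=08:00:00
#SBATCH --requeue
# Resume from a checkpoint if a previous run left one behind,
# otherwise start fresh. ./my_app and its flags are illustrative.
if [ -f checkpoint.dat ]; then
    ./my_app --resume checkpoint.dat
else
    ./my_app
fi
EOF
cat restartable.slurm
```

Whether requeueing is enabled, and how preempted jobs are handled, is a site configuration choice, so confirm the cluster's settings before relying on this.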