Using R on the UH HPC

This short tutorial covers running R script files on the UH cluster, which is managed by a batch job control system called SLURM. After working through it you should be able to run R from a Linux command line and submit a job to SLURM that runs R. NOTE: R is installed only on the compute nodes, not on the login nodes. Also, because of how R is installed, it is already available in the environment, so there is no module to load.

Running R from the Linux command line

An example R file, hello.R:

#!/usr/bin/env Rscript
sayHello <- function(){
  print('hello')
}
sayHello()

To run an R file from the command line there are two methods:

  1. R CMD BATCH hello.R
    1. Note that R CMD BATCH hello.R does not send output to standard out or display it on the terminal; instead it creates a new file called hello.Rout.
  2. Rscript hello.R
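
The two methods above can be tried side by side. The following is a minimal sketch, assuming it is run in a scratch directory on a compute node; it writes a copy of hello.R with a here-document and skips each R step if the command is not on the PATH (as on a login node).

```shell
# Write a copy of the example script (overwrites any hello.R in the current directory).
cat > hello.R <<'EOF'
sayHello <- function(){
  print('hello')
}
sayHello()
EOF

# Method 1: R CMD BATCH writes everything to hello.Rout instead of the terminal.
if command -v R >/dev/null 2>&1; then
  R CMD BATCH hello.R
  cat hello.Rout
else
  echo "R not found on PATH; skipping R CMD BATCH"
fi

# Method 2: Rscript prints the result directly to standard out.
if command -v Rscript >/dev/null 2>&1; then
  Rscript hello.R
else
  echo "Rscript not found on PATH; skipping Rscript"
fi
```

Rscript is usually the more convenient choice inside a SLURM job, because its output follows the job's standard out and standard err settings.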

Using SLURM and R

Tools that you want to run are embedded in a command script, and the script is submitted to the job control system using the appropriate SLURM command. For a simple example that runs hello.R on a compute node, create a file called example.slurm with the following content:

#!/bin/bash
#SBATCH -J hello.R # Name for your job
#SBATCH -n 1 # Number of tasks
#SBATCH -c 1 # Number of cores per task
#SBATCH -N 1 # Ensure that all cores are on one machine
#SBATCH -t 5 # Runtime in minutes
#SBATCH -p community.q # Partition to submit to (the standard compute node partition in this example)
#SBATCH -o example.out # Standard out goes to this file
#SBATCH -e example.err # Standard err goes to this file
#SBATCH --mail-user=you@hawaii.edu # The email address at which you wish to be notified
#SBATCH --mail-type=ALL # Which events trigger an email; ALL covers job start, completion, failure, etc.

Rscript hello.R # Or use R CMD BATCH hello.R, in which case the R output goes to hello.Rout rather than example.out

The R script and the sbatch SLURM file should be in the same directory, and that directory should be inside your ~/lus/ directory (the Lustre file system directory within your home directory that was set up for you and provides access to the 600TB of scratch space on the cluster). If you run jobs from outside the lus directory, on the normal file system, you could fill up its limited usable disk space, which would bring the cluster down for everyone using it. In addition, Lustre is a parallel file system that has been optimized to provide a high number of I/O operations for superior read and write performance.
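
The layout described above could be set up as follows. This is only a sketch: the directory name r_hello is hypothetical, and it assumes hello.R and example.slurm are in the current directory and that ~/lus already points at your Lustre space.

```shell
# Hypothetical job directory inside the Lustre scratch space (~/lus).
JOB_DIR="$HOME/lus/r_hello"
mkdir -p "$JOB_DIR"

# Keep the R script and the SLURM script side by side in that directory.
for f in hello.R example.slurm; do
  if [ -f "$f" ]; then
    cp "$f" "$JOB_DIR/"
  else
    echo "warning: $f not found in $(pwd)"
  fi
done

# Work from inside the job directory so that example.out and
# example.err are written to Lustre as well.
cd "$JOB_DIR"
pwd
```

Submitting from inside the job directory matters because SLURM resolves relative output paths against the directory the job was submitted from.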

Submit this job script to SLURM

sbatch example.slurm

When command scripts are submitted, SLURM looks at the resources you’ve requested and waits until an acceptable compute node is available on which to run it. Once the resources are available, it runs the script as a background process (i.e. you don’t need to keep your terminal open while it is running), returning the output and error streams to the locations designated by the script.

You can monitor the progress of your job using the squeue -j JOBID command, where JOBID is the ID returned by SLURM when you submit the script. The output of this command will indicate if your job is PENDING, RUNNING, COMPLETED, FAILED, etc. If the job is completed, you can get the output from the file specified by the -o option. If there are errors, they should appear in the file specified by the -e option.
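
The submit-and-monitor steps above can be combined in a small script. This is a sketch rather than a recommended workflow: the helper name job_id_from_sbatch and the 30 second polling interval are arbitrary choices, and it assumes sbatch prints its usual confirmation line of the form "Submitted batch job 12345".

```shell
# Extract the job ID from sbatch's confirmation line,
# e.g. "Submitted batch job 12345" gives 12345.
job_id_from_sbatch() {
  awk '/^Submitted batch job/ {print $4}'
}

if command -v sbatch >/dev/null 2>&1; then
  JOBID=$(sbatch example.slurm | job_id_from_sbatch)
  echo "Submitted job $JOBID"

  # Poll until the job is no longer waiting or running.
  while :; do
    STATE=$(squeue -h -j "$JOBID" -o %T 2>/dev/null)
    case "$STATE" in
      PENDING|CONFIGURING|RUNNING|COMPLETING)
        echo "job $JOBID: $STATE"
        sleep 30
        ;;
      *)
        break
        ;;
    esac
  done

  # Show whatever the job wrote to the files named by -o and -e.
  cat example.out example.err
else
  echo "sbatch not found on PATH; run this on the cluster"
fi
```

For a long job there is no need to keep a loop like this running; the mail options in example.slurm already notify you when the job finishes.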

To run an interactive SLURM Session

srun -p community.q --pty -t 0-00:05 /bin/bash

This will try to start an interactive session in the community.q partition for 5 minutes, running the bash shell. For this to work, resources have to be immediately available. An interactive session is useful if you need to experiment in the environment before starting a large job, or to install R packages.

Getting Your Output

Where the output ends up depends on how the hello.R script was run.

If Rscript was used, the results will be in example.out and appear as below for this example:

[1] "hello"

If R CMD BATCH was used, the results will be in hello.Rout and appear as below for this example:

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type ‘license()’ or ‘licence()’ for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type ‘contributors()’ for more information and
‘citation()’ on how to cite R or R packages in publications.

Type ‘demo()’ for some demos, ‘help()’ for on-line help, or
‘help.start()’ for an HTML browser interface to help.
Type ‘q()’ to quit R.

> #!/usr/bin/env Rscript
> sayHello <- function(){
+    print('hello')
+ }
> sayHello()
[1] "hello"
> proc.time()
   user  system elapsed 
  0.148   0.030   0.270 

Installing R Packages

Given the wide variety of R packages and versions available, we have decided not to install any packages centrally. Instead, you can install the packages you need into your home directory. To do this, start an interactive session, run R, and install the packages as normal. For example, below is how to install ggplot2 (note the "Installing package into" line, which shows where the package is installed in my home directory; it will therefore be available for all future R sessions):

[seanbc@login-0002]$ srun -p community.q -N 1 --pty -t 0-00:05 /bin/bash
[seanbc@prod-0002]$ R

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type ‘license()’ or ‘licence()’ for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type ‘contributors()’ for more information and
‘citation()’ on how to cite R or R packages in publications.

Type ‘demo()’ for some demos, ‘help()’ for on-line help, or
‘help.start()’ for an HTML browser interface to help.
Type ‘q()’ to quit R.

[Previously saved workspace restored]

> install.packages("ggplot2")
Installing package into ‘/home/seanbc/R/x86_64-redhat-linux-gnu-library/3.1’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
CRAN mirror 

  1: 0-Cloud                        2: Algeria                    
  3: Argentina (La Plata)           4: Australia (Canberra)       
.
.
.
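
The same installation can also be done without the interactive mirror menu, which may be convenient inside a script. The following is a hedged sketch, not the site's official procedure: the function name install_r_package is hypothetical, the default mirror URL is an assumption (substitute any CRAN mirror your R version can reach), and it relies on R reporting its personal library path through the R_LIBS_USER environment variable.

```shell
# Install one CRAN package into the personal library without any prompts.
# Usage: install_r_package PACKAGE [CRAN_MIRROR_URL]
install_r_package() {
  pkg="$1"
  repo="${2:-https://cloud.r-project.org}"   # assumed mirror; any CRAN mirror can be given
  # Each -e expression runs in the same R session, in order;
  # the package name and mirror arrive as trailing arguments.
  Rscript \
    -e 'args <- commandArgs(trailingOnly = TRUE)' \
    -e 'lib <- Sys.getenv("R_LIBS_USER")' \
    -e 'dir.create(lib, recursive = TRUE, showWarnings = FALSE)' \
    -e 'install.packages(args[1], lib = lib, repos = args[2])' \
    "$pkg" "$repo"
}
```

From an interactive session on a compute node, this would be called as install_r_package ggplot2. Creating the library directory first matters because a non-interactive R session cannot ask whether to create a personal library.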
