Batch High Performance Computing

How to run your research computing jobs at MSI

The large high performance computing resources at MSI are often in high demand, with over 900 research groups and four to five thousand unique researcher logins competing for time. There are not enough resources to run every job immediately when it gets submitted, so a job scheduling program is required to keep things efficient and also allocate time fairly.

If you have not already connected your personal computer to a shell on the MSI cluster, review the guide on Connecting to HPC Resources first, and then come back here to learn how to run your programs on these systems.

SLURM is the job scheduler that MSI uses to control the flow of research jobs through the compute resources. To place a job in the queue, a user submits it to the scheduler with a “batch script” that describes the resources needed to complete the job.

The guide on Job Submission and Scheduling explains how to write the small “batch scripts” that describe your job. A second guide helps you choose which SLURM partition to specify in your batch script. Once a script has been submitted, the scheduler either starts the job right away or places it in the queue until the necessary compute resources become available.
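As a rough illustration of what a batch script contains, the sketch below assembles one as text. The `#SBATCH` options shown (`--job-name`, `--partition`, `--time`, `--ntasks`, `--mem`) are standard SLURM directives, but the helper name `build_batch_script`, the placeholder `PARTITION_NAME`, the example command, and all default values are assumptions made for illustration, not MSI recommendations. Consult the Job Submission and Scheduling guide and the partition guide for real values.

```python
def build_batch_script(command, job_name="example-job",
                       partition="PARTITION_NAME",
                       walltime="01:00:00", ntasks=1, mem_gb=4):
    """Return the text of a minimal SLURM batch script (illustrative only)."""
    lines = [
        "#!/bin/bash -l",
        f"#SBATCH --job-name={job_name}",    # label shown in the queue
        f"#SBATCH --partition={partition}",  # placeholder: pick a real partition
        f"#SBATCH --time={walltime}",        # wall-clock limit, HH:MM:SS
        f"#SBATCH --ntasks={ntasks}",        # number of tasks requested
        f"#SBATCH --mem={mem_gb}G",          # memory requested per node
        "",
        command,                             # the program the job runs
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: run a Python analysis script with the default resources.
script_text = build_batch_script("python my_analysis.py")
print(script_text)
```

Saving the printed text to a file such as `example.sh` and running `sbatch example.sh` on the cluster would hand the job to the scheduler, which then starts or queues it as described above.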

Note that MSI also supports ways to run applications on the HPC resources in “Interactive” mode. See the guide for Connecting to Interactive HPC Resources to learn how to submit jobs via an interactive interface. Jobs submitted via the interactive methods are all scheduled behind the scenes by the same SLURM job scheduler that controls the batch jobs.

| Service | Service Description | Internal Cost (1yr) | Internal Cost (5yrs) | External Cost |
| --- | --- | --- | --- | --- |
| Service Units | Service Units (SU) are the way that MSI tracks CPU usage on our High Performance Computing resources. A minimum of 200 SUs may be purchased at a time. 1 SU is equivalent to 345.5 CPU-hours, 2340 GB-hours of memory, or 3.91 GPU-hours on a V100 GPU. | | | $3.07/SU |
| Option 1 | CPU: 128 cores AMD. Memory: 512 GB. GPUs: None. | $2632 | $12269 | $3.07/SU |
| Option 2 | CPU: 128 cores AMD. Memory: 2048 GB. GPUs: None. | $4082 | $19519 | $3.07/SU |
| Option 3 | CPU: 64 cores AMD. Memory: 512 GB. GPUs: 4 x NVIDIA A100. | $9321 | $45715 | $3.07/SU |
| Option 4 | CPU: 128 cores AMD. Memory: 1024 GB. GPUs: 8 x NVIDIA A100. | $16320 | $80713 | $3.07/SU |
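The Service Unit figures listed above lend themselves to a quick back-of-the-envelope calculation. The sketch below converts resource usage into SU equivalents and prices a purchase at the listed $3.07/SU rate with the 200 SU minimum. The function names are hypothetical, and two things are assumptions rather than stated policy: the page does not say how CPU, memory, and GPU usage combine into a single charge, so each is reported separately, and purchases are assumed to be rounded up to whole SUs.

```python
import math

# Figures taken from the Service Units row of the table.
SU_PRICE_USD = 3.07
MIN_PURCHASE_SU = 200
CPU_HOURS_PER_SU = 345.5
MEM_GB_HOURS_PER_SU = 2340
V100_GPU_HOURS_PER_SU = 3.91


def su_equivalents(cpu_hours=0.0, mem_gb_hours=0.0, v100_gpu_hours=0.0):
    """Convert each kind of usage to its SU equivalent, reported separately."""
    return {
        "cpu": cpu_hours / CPU_HOURS_PER_SU,
        "memory": mem_gb_hours / MEM_GB_HOURS_PER_SU,
        "gpu": v100_gpu_hours / V100_GPU_HOURS_PER_SU,
    }


def purchase_cost(su_needed):
    """Return (SUs to buy, cost in USD), honoring the 200 SU minimum.

    Assumption: SUs are bought in whole units, so the need is rounded up.
    """
    su_to_buy = max(MIN_PURCHASE_SU, math.ceil(su_needed))
    return su_to_buy, round(su_to_buy * SU_PRICE_USD, 2)


# Example: 691 CPU-hours is 2 SUs, far below the minimum purchase.
usage = su_equivalents(cpu_hours=691.0)
units, cost = purchase_cost(usage["cpu"])
print(f"{usage['cpu']:.2f} SU needed; buy {units} SU for ${cost:.2f}")
```

Because of the minimum, any need below 200 SUs costs the same as 200 SUs, which works out to $614.00 at the listed rate.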

