Slurm Workload Manager is MSI's job scheduler
MSI uses Slurm, a state-of-the-art scheduler for HPC clusters, for efficient resource allocation and task management. Renowned for its scalability and industry-standard status, Slurm streamlines job execution, resource allocation, and queue management, making scientific computation more efficient.
Details on how to use Slurm can be found here.
What is Slurm?
Slurm is a best-in-class, highly-scalable scheduler for HPC clusters. It allocates resources, provides a framework for executing tasks, and arbitrates contention for resources by managing queues of pending work.
Why did MSI transition to the Slurm scheduler?
Slurm has become an industry standard for scheduling among HPC centers. It is an open-source scheduler with a plugin framework that allows us to leverage tools developed at other centers. It can stably manage a larger number of jobs than our current scheduler. Finally, its architecture opens opportunities to leverage technologies that will be useful for many areas of scientific computation.
How does the transition to Slurm impact my work on MSI systems?
The most obvious adjustment everyone will need to make is to learn a new set of commands for submitting jobs and checking on job status. If you have written scripts that depend on the job scheduler, they will need to be modified to match the syntax used in Slurm. This is also true of some software that MSI maintains.
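As a rough illustration, a minimal Slurm batch script and the basic commands for submitting it and checking its status might look like the following. The resource values, job name, and application name (my_application) are placeholders for illustration only, not MSI-specific settings; consult MSI's Slurm documentation for the partitions and limits that apply to your group.

    #!/bin/bash -l
    #SBATCH --time=01:00:00          # wall-clock time limit (HH:MM:SS)
    #SBATCH --ntasks=4               # number of tasks (cores) requested
    #SBATCH --mem=8g                 # total memory for the job
    #SBATCH --job-name=example_job   # name shown in the queue
    #SBATCH --output=example_job.%j.out   # output file (%j expands to the job ID)

    # Load any software modules your program needs (module name is a placeholder).
    module load my_application

    # Run the program under Slurm's task launcher.
    srun my_application input.dat

Submit and monitor the job with Slurm's standard commands:

    sbatch example_job.sh     # submit the script; Slurm prints the assigned job ID
    squeue -u $USER           # list your pending and running jobs
    scancel <jobid>           # cancel a job by its ID if needed

If your existing scripts were written for a PBS-style scheduler, for example, the qsub, qstat, and qdel commands correspond roughly to sbatch, squeue, and scancel.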
When you run jobs using Slurm, no SUs will be deducted from your SU allocation. Group job limits will change over the next couple of months as we migrate nodes from the other cluster. ESO customers received an email on October 15 containing important information regarding the transition of paid SUs and SU accounting.
Resources
MSI has put together resources to help groups get started with Slurm, and a recorded tutorial session on using Slurm is also now available. Please see the list of links below for more information on getting started and on other Slurm-related topics: