Slurm Fairshare

Fairshare

In a perfect world, we would provide unlimited compute resources, and every job would start instantly. In reality, demand almost always exceeds our capacity. Fairshare is the primary mechanism Slurm employs to manage this scarcity, ensuring equitable access across the user base. Rather than just being a 'first-in, first-out' queue, Fairshare dynamically adjusts job priority based on recent consumption. This means that groups that have used more resources will have lower priority for waiting jobs than groups that have used less.

[Figure: relationship between job priority and fairshare utilization]

Fairshare is governed by your group's target share of cluster resources, which is based on your group's size relative to the number of recently active MSI users. As a group approaches 100% of this target, the priority boost provided by Fairshare diminishes to zero. However, groups may exceed their target share if the cluster is under-utilized or if their workloads are eligible for backfilling, a scheduling optimization that uses idle resources to run shorter, smaller jobs without delaying higher-priority jobs.
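As a rough illustration of the statement above, the sketch below models a priority boost that shrinks to zero as utilization reaches 100% of the target share. The linear shape, the function name `fairshare_boost`, and the `max_boost` parameter are all assumptions made for illustration; the actual curve used by the scheduler is not stated here and may differ.

```python
def fairshare_boost(utilization, max_boost=1.0):
    """Illustrative priority boost for a group.

    utilization: the group's usage as a fraction of its target share
                 (1.0 means 100% of target).
    max_boost:   hypothetical boost for a group with no recent usage.

    ASSUMPTION: a linear ramp from max_boost at 0% down to zero at 100%.
    The source only says the boost diminishes to zero near 100%.
    """
    if utilization < 0:
        raise ValueError("utilization cannot be negative")
    # At or beyond 100% of target there is no boost left.
    return max_boost * max(0.0, 1.0 - utilization)


if __name__ == "__main__":
    for u in (0.0, 0.5, 1.0, 2.0):
        print(f"utilization {u:.0%}: boost {fairshare_boost(u):.2f}")
```

Under this assumed shape, a group above its target share simply receives no Fairshare boost; its jobs can still run when the cluster has idle capacity or through backfill.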

The consumption of cluster resources is tracked through a process called Billing:

  • Billing Rates: Each resource type (cores, memory, and GPUs) is assigned a specific billing rate. Your group's Fairshare account is billed per minute for each resource allocated to an active job. Newer hardware is billed at a higher rate.
  • Calculating Utilization: Slurm aggregates these billing units into a Raw Usage total. This value is then normalized against the total usage of all groups on the cluster. By comparing this normalized fraction against your group's assigned target share, Slurm determines your utilization for the priority calculation.
  • Right-Sizing Jobs: Request only the resources your job needs. Billing is based on the resources allocated to a job, so over-requesting raises your group's fairshare utilization.
  • Decay: The aggregate bill for resources decays with a half-life of 7 days. Slurm reduces Raw Usage every 5-10 minutes, at a rate that halves the value after 7 days if no further usage occurs. For example, a utilization of 200% takes 7 days without new usage to fall to 100%.
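The billing steps above can be sketched in a few lines of Python. This is a minimal illustration, not MSI's configuration: the rate values in `BILLING_RATES` are invented placeholders, the function names are hypothetical, and the sketch assumes a job's bill is a simple sum of rate times amount across resource types. Only the 7-day half-life and the 200% to 100% example come from the text.

```python
# HYPOTHETICAL billing rates per resource-minute; real rates are not given here.
BILLING_RATES = {"cores": 1.0, "mem_gb": 0.25, "gpus": 8.0}

HALF_LIFE_DAYS = 7.0  # from the text: the value halves after 7 days without new usage


def job_billing(cores, mem_gb, gpus, minutes, rates=BILLING_RATES):
    """Billing units for one job, charged per minute for each allocated resource.

    ASSUMPTION: the per-minute charge is the sum of rate * amount over resources.
    """
    per_minute = (rates["cores"] * cores
                  + rates["mem_gb"] * mem_gb
                  + rates["gpus"] * gpus)
    return per_minute * minutes


def utilization(group_raw_usage, total_raw_usage, target_share):
    """Normalized usage fraction divided by the target share (1.0 means 100%)."""
    if total_raw_usage <= 0 or target_share <= 0:
        raise ValueError("total usage and target share must be positive")
    normalized = group_raw_usage / total_raw_usage
    return normalized / target_share


def decayed(value, idle_days, half_life_days=HALF_LIFE_DAYS):
    """Exponential decay: the value halves every half_life_days with no new usage."""
    return value * 0.5 ** (idle_days / half_life_days)


if __name__ == "__main__":
    right_sized = job_billing(cores=4, mem_gb=16, gpus=0, minutes=60)
    over_sized = job_billing(cores=8, mem_gb=32, gpus=0, minutes=60)
    print(right_sized, over_sized)            # the doubled request costs twice as much
    print(utilization(480.0, 4800.0, 0.05))   # 10% of usage against a 5% target share
    print(decayed(2.0, idle_days=7))          # 200% falls to 100% after one half-life
```

With these placeholder rates, requesting twice the cores and memory for the same runtime doubles the bill, which is the cost that right-sizing avoids.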

 

Use of Dedicated Computing nodes does not increase fairshare utilization, because these resources are billed with a zero multiplier.
