Extended June MSI Maintenance – June 5-7, 2024

Summary

MSI reserves the first Wednesday of each month for maintenance on its computing, storage, and infrastructure systems. The next maintenance period, however, will be extended by one to two days so that additional upgrades can be completed, making MSI a more reliable and more capable resource for U of M researchers.

You will not be able to use the HPC clusters from 5 a.m. on Wednesday, June 5, until at least the evening of Thursday, June 6.

MSI June Maintenance

Service             Wed, June 5  Wed, June 5  Thu, June 6  Thu, June 6  Fri, June 7  Fri, June 7
                    Daytime      Evening      Daytime      Evening      Daytime      Evening
Storage - Panasas   Down         Restoring    Up           Up           Up           Up
Storage - VAST      Up           Up           Up           Up           Up           Up
Tier 2 Storage      Up           Up           Up           Up           Up           Up
Stratus             Up           Up           Up           Up           Up           Up
LM&P Pipelines      Down         Down         Down         Restoring    Up           Up
HBCD Pipelines      Down         Down         Down         Restoring    Up           Up
Agate Cluster       Down         Down         Down         Restoring    Up           Up
Mangi Nodes         Down         Down         Down         Down         Down         Restoring
Mesabi Cluster      Down         Down         Retired      Retired      Retired      Retired

Detailed information on the outages

Power Infrastructure

The most time-consuming effort for the next maintenance period will be the replacement of our three data center uninterruptible power supplies (UPS), which have reached their end of life. Planning for their replacements started over a year ago, and construction started in early 2024. Assuming the final testing of the new systems goes well during the last week of May, the switchover to the three new systems will occur sequentially on Wednesday, June 5; Thursday, June 6; and Friday, June 7.

We expect core MSI storage systems to be fully available again by the end of the day on Wednesday, June 5. However, all of our HPC clusters will be unavailable until at least the end of Thursday, June 6; details follow in the sections below.

Agate maintenance

The main Agate cluster, and associated services such as Open OnDemand, will return to service after the second UPS switchover, by the evening of Thursday, June 6.

Mesabi retirement

As previously announced, our nearly decade-old cluster Mesabi will be retired from service. More information on the SLURM partitions to be retired can be found at the Mesabi Retirement webpage.

If you have only done your computing on Mesabi, and/or ignored messages about the new operating system on Agate, here is a guide assisting you in the transition from CentOS 7 to the Rocky8 operating system.

Mangi nodes transition

After the final UPS switchover on Friday, June 7, the newer expansion compute nodes attached to Mesabi (known as “Mangi”) will be reassociated with the Agate cluster. They will also be loaded with the same Rocky8 operating system so that all of the nodes are aligned. This operation is targeted to be complete by Friday evening.

Movement of all Notebooks applications to Open OnDemand

During this maintenance, the stand-alone Jupyter Notebooks service will be taken down, as previously announced. Tens of thousands of Notebooks sessions have been run on MSI’s deployed Open OnDemand service, which is more capable, making the original service redundant.

Expansion of the Agate cluster

Although it is not part of the June maintenance plans, this work will help lay the groundwork for adding a supplemental set of hardware to expand the Agate cluster later in 2024. Delivery, provisioning, setup, and testing of this new hardware will all take place at later dates. Keep an eye on news from MSI about the new hardware.
