MSI Project Migration to VAST

This guide covers MSI's migration of PI project storage from our older file system to the new VAST storage system, taking place during the second half of 2025 and possibly into early 2026. Each PI and project admin will be informed directly ahead of their project's migration, one week before, and again on the day it takes place. At that point, follow the guide below for the actions you need to take to finish setting up your new environment.

There are some tasks listed further down, such as re-linking data to Galaxy and preparing for Conda environment rebuilding, that can be started before the project migration takes place.


If your project's primary storage has been scheduled to migrate from MSI's Panasas file system to the newer VAST storage, read ahead to see what actions you may need to take so that your research jobs continue to work as expected. Also see Scratch and Project Directory Storage Speeds.

If you have any jobs waiting in the SLURM queues that were submitted before the migration took place, you should pay special attention to the final item in the action list below (actions that all projects need to take) regarding re-submitting these jobs.

If you use Galaxy or Conda, you will almost certainly need to read the information below about rebuilding your environment(s). In rarer cases, if you build your own software, or use someone else's software that was located in the project directory that moved, you may need to rebuild that software package.

Actions that all projects need to take:

Conditional actions if your project uses one of these research services or software packages:

General Reminder: Store irreplaceable data in disaster_recovery

Since you will need to review your project folders as part of the migration, now is a perfect time to carefully consider whether you have data or results that are of utmost importance to your project. Ensure that these files are stored in your project's shared/disaster_recovery folder so that MSI's disaster recovery contingencies help protect that data.

As a reminder: the disaster_recovery folder is not a replacement for having secure and redundant backups of your data. For more information see our page on Data Retention and Protection.

 

Project Path and MSI Project Variables

Where possible, please leverage the MSI Project Variables going forward. Visit MSI Aliases for reference on these environment variables.

If you have an application or tool chain that cannot use shell or environment variables, you can look up the path to your project folder with  realpath -m $MSIPROJECT

You can switch between projects (which will automatically update all relevant variables) with

newgrp - PROJECT

If you are a member of multiple projects, you can discover a project's directory path using the following commands:

1. List the projects your user account is a member of

id -Gn

2. Select a project and run

path_to_project <project-name>
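Put together, the two steps above can be sketched as a small loop. On MSI systems each group name printed here could then be passed to the path_to_project helper; the loop itself uses only standard tools:

```shell
# List every group (project) the current user account belongs to, one
# per line. On an MSI login node you would follow up each name with:
#   path_to_project <name>
for g in $(id -Gn); do
  echo "group: $g"
done
```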

Globus

Visit the Guest Collections page to see collections owned by your user account.

The process for updating a Guest collection is:

  1. Identify the full path of your project's folder with realpath -m $MSIPROJECT
  2. Update the project path as needed, following this pattern:

 

OLD PATH: /home/project_name/...

NEW PATH: the output of realpath -m $MSIPROJECT

The Guest Collections page can also be used to create a new Guest collection.

For example, a collection that points to /home/msistaff/shared would become /project/standard/msistaff/shared
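The translation can be done with plain shell parameter expansion; this is a minimal sketch assuming the /home/... to /project/standard/... pattern shown above holds for your project:

```shell
# Rewrite a legacy /home/... path to its assumed new VAST location.
old="/home/msistaff/shared"
new="/project/standard/${old#/home/}"   # strip the /home/ prefix, re-anchor
echo "$new"                             # → /project/standard/msistaff/shared
```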

Symlink Impacts

MSI project migration tools will auto-update absolute symbolic links that point to paths originally belonging to your primary project (e.g. /home/project_name). Relative links will NOT be updated.

Links that point into projects other than your primary project may not be broken yet, but over the next 6-18 months they will break as those projects are migrated in turn.

To find all symlinks in your project folder that you have access to, you can use

  • find $MSIPROJECT -type l

To find all BROKEN symlinks you can use

  • find $MSIPROJECT -xtype l

 

There is no one-size-fits-all approach to broken symlinks. Note that some links may appear broken when they were created from inside a Singularity or Apptainer container: the link is broken outside the container but functions normally inside it.

You can update symlinks manually as needed with the following command

  • ln -vfns RELATIVE_PATH_TO_TARGET PATH_OF_LINK_TO_MAKE
  • E.g. ln -vfns ../bin $MSIPROJECT/shared/myapp/bin (points $MSIPROJECT/shared/myapp/bin at $MSIPROJECT/shared/bin)
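As a self-contained illustration (in a temporary directory, with hypothetical folder names), here is a broken absolute link being repaired with a relative target:

```shell
# Simulate a project tree whose absolute link broke after a move.
tmp="$(mktemp -d)"
mkdir -p "$tmp/shared/bin" "$tmp/shared/myapp"
ln -s /home/old_project/shared/bin "$tmp/shared/myapp/bin"  # absolute, now dangling
find "$tmp" -xtype l                                        # reports the broken link
# Repoint it with a relative target: -f replace, -n don't follow, -s symlink.
ln -vfns ../bin "$tmp/shared/myapp/bin"
readlink "$tmp/shared/myapp/bin"                            # → ../bin
```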

 

Symlink Performance Notes

We have found that a large number of users and projects, even after being migrated, still have links that reference legacy paths or are simply broken. These links can introduce major performance impacts on workflows that interact with them, whether by accident or by design.

To search for these legacy links run the following commands:

In your home directory

while read -r -d $'\0' link ; do
  target="$(realpath -m "$link")"
  if [[ "$target" =~ ^/panfs/ ]]; then
    echo "Link to legacy storage: $link -> $target"
  elif [[ "$target" =~ ^/home/ ]] && [[ ! "$HOME" =~ ^/home/ ]]; then
    echo "Link to legacy storage: $link -> $target"
  fi
done < <(find ~/ -type l -print0)

In your project directory

while read -r -d $'\0' link ; do
  target="$(realpath -m "$link")"
  if [[ "$target" =~ ^/panfs/ ]]; then
    echo "Link to legacy storage: $link -> $target"
  elif [[ "$target" =~ ^/home/ ]] && [[ ! "$MSIPROJECT" =~ ^/home/ ]]; then
    echo "Link to legacy storage: $link -> $target"
  fi
done < <(find "$MSIPROJECT" -type l -print0)
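To sanity-check the scan, you can run the same loop against a throwaway directory containing one deliberately legacy-style link (the /panfs/... target below is made up):

```shell
# Create a temp dir with one link whose target sits under the legacy /panfs tree.
tmp="$(mktemp -d)"
ln -s /panfs/roc/groups/0/demo/data "$tmp/old_data"
# Same scan as above, pointed at the temp dir: it should flag the link.
while read -r -d $'\0' link; do
  target="$(realpath -m "$link")"
  if [[ "$target" =~ ^/panfs/ ]]; then
    echo "Link to legacy storage: $link -> $target"
  fi
done < <(find "$tmp" -type l -print0)
```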

 

Conda and Miniconda(3) environments

Conda environments in the default .conda top-level folder were not migrated, as the migration is known to break them. If you have Conda environments in non-standard paths, they are very likely broken and will also need repair. NOTE: if you have Conda environments in your home directory that relied on resources in the /home/PROJECT folder, those will need to be rebuilt as well.

  1. Load the miniforge module to make conda-like commands available

    1. module load miniforge

  2. List your current Conda environments (conda env list) and make note of any that are stored in the project space

  3. Export a yaml file that contains the package list for the environment and save a copy into the shared directory

    1. conda env export --no-builds -n environment_name > $SHARED/environment_name.yaml

  4. Edit the yaml and update:

    1. prefix: if the environment will be installed somewhere other than the default location (your user home directory), set the prefix to the desired path

    2. e.g. prefix: /path/to/shared/environment_name (note: environment variables such as $SHARED are not expanded inside the yaml, so use the full path)

    3. name: set the name field to the name of the new environment; we recommend using a different name OR deleting the old environment before creating its replacement

    4. Existing Conda environments can be renamed with the command conda rename -n current_name new_name

  5. Build out the environment

    1. conda env create -f $SHARED/environment_name.yaml

  6. Delete the now unused environment

    1. conda env remove -n environment_name
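After the edits in step 4, the exported yaml might look like the sketch below (the project path, package list, and environment name are all hypothetical):

```yaml
name: myenv_rebuilt
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
prefix: /project/standard/project_name/shared/myenv_rebuilt
```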

Galaxy

Data that is actively used on galaxy.msi.umn.edu can be re-linked by emailing the help desk ([email protected]) and requesting the service. The help desk will add your request to the queue and process it; once the data is linked, it can be accessed again.

An example request would look like

Hello MSI help,

I would like to request that the following paths be linked to Galaxy

project_name/shared/...dirA

project_name/shared/...dirB

NOTE: This process can take place before the migration, and we encourage projects to reach out as early as possible to start the re-linking process.

Slurm Jobs & Batch scripts

If you have active jobs waiting in the SLURM partition queues, and they were submitted before your project was migrated, you will need to re-submit these jobs.

Jobs that were placed into the queue only make note of the file paths that were available at the time the request was made. Jobs that were put into the queue before the migration will need to be resubmitted in order to leverage the new file paths.

The following command can be used to determine which job script was used to place a job into the queue.

sacct -Xaj <job-id> -o workdir,submitline

(If the column is not wide enough, the syntax %## can be used to specify the text width; e.g. submitline%30 makes the column 30 characters wide.)

File paths that are hard coded into job scripts can also be updated to use the environment variables MSI provides for referring to the project space; see MSI Aliases.
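For example, a hard-coded path in a job script can be swapped for the variable with sed (the script contents and project name here are hypothetical):

```shell
# Write a toy job script containing a hard-coded legacy path.
tmp="$(mktemp -d)"
cat > "$tmp/job.sh" <<'EOF'
#!/bin/bash -l
#SBATCH --time=1:00:00
cd /home/project_name/shared/run1
EOF
# Replace the legacy prefix with the MSI project variable (expanded at run time).
sed -i 's|/home/project_name|${MSIPROJECT}|' "$tmp/job.sh"
grep '^cd' "$tmp/job.sh"   # → cd ${MSIPROJECT}/shared/run1
```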

After a job has been cancelled and, if needed, its script updated, it can be placed back into the queue.

Jobs can be cancelled using the commands

scancel <job-id>

To cancel all pending jobs associated with a user account (useful when hundreds of jobs are queued under a single account and only the pending ones need to be targeted):

scancel -u internetid --state=PENDING

Replace Public folder function

Due to new security settings, Public folders will no longer function as globally accessible paths for all other MSI projects. If you have a Public folder that hosts data for another project, you will need to open a new project with those collaborators and migrate the contents of the Public folder into that new collaborative project.

If your project hosts 'common good' data used by many MSI projects, please contact the MSI Helpdesk for help determining whether the data falls under the common good data initiative and could instead be hosted in a central location for easy, consistent access by all MSI projects.

Likewise, if your project hosts software intended for use by other MSI projects, please contact the MSI Helpdesk to determine whether that software can or should be incorporated into MSI's software library.

Scratch and Project Directory Storage Speeds

Your project folder now resides on the same storage, and should have the same performance characteristics, as scratch.global.  

If you previously created workflows to copy data to scratch.global to get the extra performance of that storage, you should be able to revert to just using your project folder (which will save your jobs on the time it takes to copy the data back and forth). 

You should continue to use scratch.global for temporary work files and processing artifacts that you do not need to retain after a job completes.