This documentation and guide is specifically for MSI's migration of PI project storage from our older file system to the new VAST storage system, which is occurring during the latter half of 2025 and possibly early 2026. Each PI and project admin will be informed directly ahead of their project's migration, one week before, and on the day it takes place. At that time, you will need to follow the guide below for actions you need to take to finish setting up your new environment.
There are some tasks listed further down, such as re-linking data to Galaxy and preparing for Conda environment rebuilding, that can be started before the project migration takes place.
If your project's primary storage has recently been scheduled to be migrated from MSI’s Panasas file system to the newer VAST storage, read ahead to see which actions you may need to take so that your research jobs continue to work as you expect. Also see Scratch and Project Directory Storage Speeds.
If you have any jobs waiting in the SLURM queues that were submitted before the migration took place, you should pay special attention to the final item in the action list below (actions that all projects need to take) regarding re-submitting these jobs.
If you use Galaxy or Conda, you will almost certainly need to read the information below about rebuilding your environment(s). In rarer cases, if you build your own software, or use someone else's software that was located in the project directory that was moved, you may need to rebuild that software package.
Actions that all projects need to take:
- Take the opportunity to review your use of MSI's disaster_recovery service
- Audit your symbolic links
- Why: Only some symbolic links were able to be automatically updated
- Edit job scripts to use an environment variable such as $MSIPROJECT
- Why: The path to your project folder has changed
- Re-create jobs placed into the queue before the migration
- Why: Jobs 'memorize' your environment paths the moment they are placed into the queue, and the path to your project folder has changed, so existing jobs will fail if they reference your old project directory path.
- NOTE: Remember that you cannot resubmit old job IDs as they also still have your old project path embedded in them.
Conditional actions if your project uses one of these research services or software packages:
- Update Globus Collections
- Why: Your project directory path has changed, but Globus shares still are pointed at the old path
- Rebuild conda environments located in your project folder
- Why: Conda environments were not migrated as they are known to break when moved
- Replace Public Folder function
- Why: New security settings prevent sharing data between projects with disparate members. New projects will need to be created with all collaborators to share data or tools with them
- Re-Link data to Galaxy if it was in your project folder
- Why: Data in your Project folder that was linked to Galaxy was moved, so the linkage is now broken
- NOTE: Data uploaded to Galaxy via the web interface is not affected
General Reminder: Store irreplaceable data in disaster_recovery
Since you will need to review your project folders as part of your migration outcomes, now is a good time to consider carefully whether you have data or results that are of utmost importance to your project. Store these files in your project's shared/disaster_recovery folder so that MSI's disaster recovery contingencies help you protect that data.
As a reminder: the disaster_recovery folder is not a replacement for having secure and redundant backups of your data. For more information see our page on Data Retention and Protection.
Project Path and MSI Project Variables
Where possible, please leverage the MSI Project Variables going forward. Visit MSI Aliases for reference on these environment variables.
If you have an application or tool chain that cannot use shell or environment variables, you can look up the path to your project folder with realpath -m $MSIPROJECT
You can switch between projects (which will automatically update all relevant variables) with
newgrp - PROJECT
If you are a member of multiple projects you may discover the directory path using commands:
1. list out projects your user account is a member of
id -Gn
2. select project and run command
path_to_project <project-name>
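The two steps above can be combined into one loop. The following is a minimal sketch, not an MSI-provided tool: the function name list_project_paths is hypothetical, and it assumes that path_to_project prints nothing or returns a non-zero status for groups that are not MSI projects.

```shell
# Sketch: print the project path for every group the current account belongs to.
# ASSUMPTION: path_to_project (the MSI command shown above) fails or prints
# nothing when the group is not a project; such groups are reported, not fatal.
list_project_paths() {
    for project in $(id -Gn); do
        if project_path=$(path_to_project "$project" 2>/dev/null) \
            && [ -n "$project_path" ]; then
            printf '%s\t%s\n' "$project" "$project_path"
        else
            printf '%s\t(no project path found)\n' "$project"
        fi
    done
}
```

Running list_project_paths on an MSI login node would then print one tab-separated line per group. Group names that contain spaces are not handled by this loop.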
Globus
Visit the Guest Collections Page to see collections owned by the user's account
The process for updating a Guest collection begins with
- Identify the full path of your project's folder with
realpath -m $MSIPROJECT
- Update the Project path as needed using the following example
OLD PATH /home/project_name/...
NEW PATH output from realpath -m $MSIPROJECT
This page can be used to create a new Guest collection
An example collection that points to /home/msistaff/shared would become /project/standard/msistaff/shared
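If you have several collection paths to update, the old-to-new mapping in the example above can be sketched as a small shell function. The name translate_project_path is hypothetical, and the old and new roots are passed in explicitly; on MSI systems the new root would normally be the output of realpath -m $MSIPROJECT.

```shell
# Hypothetical helper: rewrite a path under the old project root onto the new root.
# Usage: translate_project_path OLD_PATH OLD_ROOT NEW_ROOT
translate_project_path() {
    old_path=$1
    old_root=${2%/}    # drop a trailing slash, if any
    new_root=${3%/}
    case "$old_path" in
        "$old_root")   printf '%s\n' "$new_root" ;;
        "$old_root"/*) printf '%s\n' "$new_root${old_path#"$old_root"}" ;;
        *)             printf '%s\n' "$old_path" ;;   # not under the old root: unchanged
    esac
}

# The example from this page, with both roots typed by hand for illustration.
translate_project_path /home/msistaff/shared /home/msistaff /project/standard/msistaff
```

This only computes the new path string. The collection itself still has to be updated or re-created in Globus.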
Symlink Impacts
MSI project migration tools will auto-update absolute symbolic links that point to paths originally belonging to your primary project, e.g. /home/project_name. Relative links will NOT be updated.
Links that point into projects other than your primary project may not be broken yet, but they will break over the next 6-18 months as those other projects are migrated in turn.
To find all symlinks in your project folder that you have access to, you can use
find $MSIPROJECT -type l
To find all BROKEN symlinks you can use
find $MSIPROJECT -xtype l
There is no one-size-fits-all approach for fixing broken symlinks. Some links may also only 'appear' to be broken: a link created inside a Singularity or Apptainer container can be broken outside the container but function normally inside it.
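When deciding how to handle each broken link, it helps to see where it used to point. The following is a minimal sketch that pairs the find -xtype l search above with readlink; the function name report_broken_links is hypothetical, and -xtype assumes GNU find.

```shell
# Sketch: list every broken symlink under a directory with the target it stores.
# Usage: report_broken_links DIRECTORY
report_broken_links() {
    find "$1" -xtype l -exec sh -c '
        for link do
            printf "%s -> %s\n" "$link" "$(readlink "$link")"
        done' sh {} +
}

# On MSI systems a typical call would be:
#   report_broken_links "$MSIPROJECT"
```

Each output line has the form LINK -> STORED_TARGET, which makes it easier to spot links that still reference an old project path.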
You can update symlinks manually as needed with the following command
ln -vfns RELATIVE_PATH_TO_TARGET PATH_OF_LINK_TO_MAKE
E.g.
ln -vfns ../shared/bin $MSIPROJECT/shared/myapp/bin
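For many links that share the same outdated prefix, the manual ln -vfns command can be wrapped in a loop. This is a sketch under stated assumptions, not an MSI tool: the function name retarget_links and its --apply switch are hypothetical, it only touches links whose stored target begins with the given old prefix, and it does not handle file names that contain newlines. Without --apply it only prints what it would change.

```shell
# Hypothetical helper: re-point symlinks whose stored target starts with OLD_PREFIX/.
# Usage: retarget_links DIRECTORY OLD_PREFIX NEW_PREFIX [--apply]
retarget_links() {
    dir=$1
    old=${2%/}
    new=${3%/}
    mode=${4:-}
    find "$dir" -type l | while IFS= read -r link; do
        target=$(readlink "$link")
        case "$target" in
            "$old"/*)
                new_target="$new${target#"$old"}"
                if [ "$mode" = "--apply" ]; then
                    # Same flags as the manual command above.
                    ln -vfns "$new_target" "$link"
                else
                    printf 'would update: %s -> %s\n' "$link" "$new_target"
                fi
                ;;
        esac
    done
}
```

Review the dry-run output before repeating the call with --apply, since replaced links cannot be restored automatically.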
Symlink Performance Notes
We have become aware that many users and projects, even after being migrated, still have links that reference legacy paths or are fully broken. These links cause major performance problems for workflows that interact with them, whether by accident or by design.
To search for these legacy links run the following commands:
In your home directory
while read -r -d $'\0' link ; do
target="$(realpath -m "$link")"
if [[ "$target" =~ ^/panfs/ ]]; then
echo "Link to legacy storage: $link -> $target"
elif [[ "$target" =~ ^/home/ ]] && [[ ! "$HOME" =~ ^/home/ ]]; then
echo "Link to legacy storage: $link -> $target"
fi
done < <(find ~/ -type l -print0)
In your project directory
while read -r -d $'\0' link ; do
target="$(realpath -m "$link")"
if [[ "$target" =~ ^/panfs/ ]]; then
echo "Link to legacy storage: $link -> $target"
elif [[ "$target" =~ ^/home/ ]] && [[ ! "$HOME" =~ ^/home/ ]]; then
echo "Link to legacy storage: $link -> $target"
fi
done < <(find "$MSIPROJECT" -type l -print0)
Conda and Miniconda(3) environments
Conda environments in the default .conda top level folder were not migrated as the migration is known to break them. If you have conda environments in non-standard paths, they are very likely broken and also will need repair. NOTE: if you have conda environments in your home directory that relied on resources in the /home/PROJECT folder, those will need to be rebuilt as well.
Load the miniforge module to make conda-like commands available
module load miniforge
List the current conda environments (for example with conda env list) and make note of any that are stored in the project space
Export a yaml file that contains the package list for the environment and save a copy into the shared directory
conda env export --no-builds -n environment_name > $SHARED/environment_name.yaml
Edit the yaml and update:
prefix If the environment will be installed to another location than the default (user home directory), update the prefix to the path of interest
e.g.
prefix=$SHARED/environment_name
name Set the name field to the name of the new environment. We recommend using a different name OR deleting the environment before creating its replacement
Existing Conda environments can be renamed with the command
conda rename -n current_name new_name
Build out the environment
conda env create -f $SHARED/environment_name.yaml
Delete the now unused environment
conda env remove -n environment_name
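The "Edit the yaml" step above can also be scripted. In a file written by conda env export, the two fields appear as lines starting with name: and prefix:. The sketch below rewrites both with sed; the function name set_env_name_and_prefix is hypothetical, and it assumes that neither value contains the characters | or &. YAML files are not shell-expanded, so pass the already expanded path rather than a literal $SHARED.

```shell
# Hypothetical helper: set the name and prefix fields of an exported environment file.
# Usage: set_env_name_and_prefix YAML_FILE NEW_NAME NEW_PREFIX
set_env_name_and_prefix() {
    yaml=$1
    new_name=$2
    new_prefix=$3
    # A copy of the original file is kept next to it with a .bak suffix.
    sed -i.bak \
        -e "s|^name:.*|name: ${new_name}|" \
        -e "s|^prefix:.*|prefix: ${new_prefix}|" \
        "$yaml"
}

# A typical call on MSI systems might look like this (names are placeholders):
#   set_env_name_and_prefix "$SHARED/environment_name.yaml" new_name "$SHARED/new_name"
```

The shell expands "$SHARED/new_name" before sed runs, so the file ends up containing the full path.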
Galaxy
Data that is actively used on galaxy.msi.umn.edu can be relinked by sending an email to the help desk ([email protected]) and requesting the service. After the help desk receives the request, it can be added to the queue and processed. Once the data is linked, then the data may be accessed again.
An example request would look like
Hello MSI help,
I would like to request for the following paths to be linked to Galaxy
project_name/shared/...dirA
project_name/shared/...dirB
NOTE: This process can take place before the migration takes place and we encourage projects to reach out as early as they can to start the re-linking process.
Slurm Jobs & Batch scripts
If you have active jobs waiting in the SLURM partition queues, and they were submitted before your project was migrated, you will need to re-submit these jobs.
Jobs that were placed into the queue only make note of the file paths that were available at the time the request was made. Jobs that were put into the queue before the migration will need to be resubmitted in order to leverage the new file paths.
The following command can be used to determine which job script was used to place the job into the queue.
sacct -Xaj <job-id> -o workdir,submitline
(If the column is not wide enough the syntax %## can be used to specify the text width e.g. submitline%30 will make the column 30 character spaces wide)
File paths that are hard coded into job scripts can also be updated to make use of the environment variables that MSI has made available to refer to the project space. See MSI Aliases.
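One way to replace a hard-coded project path in an existing job script is a sed substitution. This is a minimal sketch: the function name use_msiproject_variable is hypothetical, the old prefix is treated as a regular expression (so a dot matches any character), and only occurrences followed by a slash are replaced so that a similarly named project is left alone.

```shell
# Hypothetical helper: replace "OLD_PREFIX/" with "$MSIPROJECT/" in a job script.
# Usage: use_msiproject_variable OLD_PREFIX SCRIPT
use_msiproject_variable() {
    old=${1%/}
    script=$2
    # \$ keeps the text $MSIPROJECT literal; the original is saved as SCRIPT.bak.
    sed -i.bak "s|${old}/|\$MSIPROJECT/|g" "$script"
}

# Example call with placeholder names:
#   use_msiproject_variable /home/project_name my_job.sh
```

Check the result afterwards: a path inside single quotes in the script will not have $MSIPROJECT expanded by the shell, and a bare path with no trailing slash is not rewritten.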
After the job has been cancelled and potentially updated, it can be placed back into the queue
Jobs can be cancelled using the commands
scancel <job-id>
To cancel all pending jobs associated with a user account, which is useful when hundreds of jobs are queued under a single account and only the pending ones need to be targeted, use
scancel -u internetid --state=PENDING
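Before cancelling pending jobs in bulk, it can help to record how each one was submitted so that it can be re-created. The sketch below combines squeue with the sacct query shown earlier; the function name show_pending_submit_lines is hypothetical. It uses the %F format field so that job arrays are queried once by their base ID, and sacct's -P option for pipe-delimited output that is not truncated to a column width.

```shell
# Sketch: print job ID, working directory and submit line for each pending job.
# Usage: show_pending_submit_lines [USERNAME]   (defaults to $USER)
show_pending_submit_lines() {
    user=${1:-$USER}
    squeue -u "$user" -t PENDING -h -o '%F' | sort -u |
        while IFS= read -r job_id; do
            sacct -X -n -P -j "$job_id" -o jobid,workdir,submitline
        done
}

# Example: save the list before running scancel.
#   show_pending_submit_lines internetid > pending_jobs.txt
```

Each output line is pipe-delimited in the order jobid|workdir|submitline.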
Replace Public folder function
Due to new security settings in place, Public folders will no longer function as globally accessible paths to all other MSI projects. If you have a Public folder that hosts data for another project you will need to open a new project with those collaborators and migrate the contents of the public folder to that new collaborative project.
If your project hosts 'common good' data used by a plurality of MSI projects, please contact the MSI Helpdesk for help determining whether that data falls under the common good data initiative and could instead be hosted in a central location for easy and consistent access by all MSI projects.
Also, if your project hosts software intended for use by other projects at MSI, again please contact the MSI Helpdesk to determine if this software can or should be incorporated into MSI's software library.
Scratch and Project Directory Storage Speeds
Your project folder now resides on the same storage, and should have the same performance characteristics, as scratch.global.
If you previously created workflows to copy data to scratch.global to get the extra performance of that storage, you should be able to revert to just using your project folder (which will save your jobs on the time it takes to copy the data back and forth).
You should continue to use scratch.global for temporary work files and processing artifacts that you do not need to retain after a job completes.