How do I update my workflow after the software library migration?
Updated 10/18/23 with specific issues and solutions.
Description of the migration
What is happening?
Original Path | New Path |
/soft | /common/software/install/migrated |
/panfs/roc/msisoft | /common/software/install/migrated |
/panfs/roc/soft/el6 | /common/software/install/migrated.softel6 |
/panfs/roc/intel | /common/software/install/migrated.intel |
If you reference any of the old paths above, make sure to update those references to point to the new locations. For example, if you reference the file:
/panfs/roc/msisoft/gcc/8.2.0/bin/gcc
in a script, you would want to update this reference to:
/common/software/install/migrated/gcc/8.2.0/bin/gcc
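As a minimal sketch of the mapping in the table above, the following shell function translates a single old install path to its new location. The function name migrate_path is hypothetical and is not an MSI-provided command; it only covers the four install prefixes listed above, not the modulefiles directories.

```shell
# Translate one old software install path to its new location.
# migrate_path is a hypothetical helper name used for illustration only.
# Each pattern is anchored at the start of the path so that only the
# prefixes from the table above are rewritten.
migrate_path() {
    printf '%s\n' "$1" | sed \
        -e 's|^/panfs/roc/soft/el6/|/common/software/install/migrated.softel6/|' \
        -e 's|^/panfs/roc/intel/|/common/software/install/migrated.intel/|' \
        -e 's|^/panfs/roc/msisoft/|/common/software/install/migrated/|' \
        -e 's|^/soft/|/common/software/install/migrated/|'
}
```

For instance, calling migrate_path with /panfs/roc/msisoft/gcc/8.2.0/bin/gcc prints the new gcc path shown above. Paths that do not start with one of the old prefixes are printed unchanged.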
The modulefiles directories are also changing, and a list of the old paths and the new equivalents is as follows:
Original Path | New Path |
/panfs/roc/soft/modulefiles.common | /common/software/modulefiles/migrated/common |
/panfs/roc/soft/modulefiles.hpc | /common/software/modulefiles/migrated/hpc |
/panfs/roc/soft/modulefiles.centos7 | /common/software/modulefiles/migrated/centos7 |
/panfs/roc/soft/modulefiles.mesabi | /common/software/modulefiles/migrated/mesabi |
/panfs/roc/soft/modulefiles.mangi | /common/software/modulefiles/migrated/mangi |
/panfs/roc/soft/modulefiles.k40 | /common/software/modulefiles/migrated/k40 |
/panfs/roc/soft/modulefiles.v100 | /common/software/modulefiles/migrated/v100 |
/panfs/roc/soft/modulefiles.legacy | /common/software/modulefiles/migrated/legacy |
/panfs/roc/intel/modulefiles | /common/software/modulefiles/migrated/intel |
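If you have added modulefile directories to your environment yourself (for example with 'module use' in a script), a quick check like the one below may help spot old entries. This is only a sketch: old_modulepath_entries is a hypothetical helper name, and it assumes your module search path is the usual colon-separated MODULEPATH variable.

```shell
# List the entries of a colon-separated path list that still start
# with /panfs/roc. Prints nothing if there are no old entries.
# old_modulepath_entries is a hypothetical helper name.
old_modulepath_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | grep '^/panfs/roc' || true
}

# Usage (not run here):
#   old_modulepath_entries "$MODULEPATH"
```

Any directory it prints should be replaced with the equivalent new path from the table above.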
In migrating the more than 5,000 software installations in the current software library to the new location, MSI has made a significant effort to patch and update the installations so they will work the same at the new location as they did at the old one. The main changes that were made fall under the following categories:
- Symlinks that refer to a '/panfs' directory have been updated to point to the equivalent location in '/common/software'
- Configuration files containing references to '/panfs' directories have been updated to reference the equivalent locations
- Executable files containing '/panfs' directories in their RPATH or RUNPATH have been patched to refer to the equivalent directories
- Modulefiles have been made more specific to ensure that dependencies are found correctly (e.g. updating PERL5LIB for perl modules to the new location)
- Modules that didn't respond to any of the above methods have been reinstalled
These changes should have caught the majority of the issues that would arise from moving the software library to a new location. However, since this was an operation involving parsing and patching millions of files, there are likely going to be issues that we didn't anticipate or couldn't have tested for. This page lists some of the common issues and strategies for resolving them.
Why make this change?
When did this happen?
Changes you might need to make
Check your bashrc for references to old paths
Many software packages and workflow customizations will modify your bashrc file, which is used to initialize settings for new shell sessions. You might define or modify environment variables, define functions and aliases, or load modules among other possible customizations. Some software packages like conda will automatically modify your bashrc file in order to enable special features, so you may have changed your bashrc file even if you've never opened it yourself.
The file is located at ~/.bashrc and is a plaintext file that you can open with your favorite text editor. Check this file for references to any of the old paths to software installs or modulefiles, and update them to the new paths or remove them if the modifications to your environment are no longer necessary.
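One possible way to do this check from the command line is sketched below. The helper name find_old_refs is hypothetical; it searches only for the '/panfs/roc' prefix, so references to the bare '/soft' path would still need to be checked by hand.

```shell
# Print the line number and text of every line that references the old
# /panfs/roc locations. Defaults to ~/.bashrc if no file is given.
# find_old_refs is a hypothetical helper name.
find_old_refs() {
    grep -n '/panfs/roc' "${1:-$HOME/.bashrc}" || true
}

# Usage (not run here):
#   find_old_refs              # checks ~/.bashrc
#   find_old_refs ~/myjob.sh   # checks another script
```

No output means no references to '/panfs/roc' were found in that file.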
As a common example of this, if you ever ran 'conda init' or an equivalent command, you will have a block in your ~/.bashrc file that looks like the following:
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/panfs/roc/msisoft/mamba/0.11.3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/panfs/roc/msisoft/mamba/0.11.3/etc/profile.d/conda.sh" ]; then
        . "/panfs/roc/msisoft/mamba/0.11.3/etc/profile.d/conda.sh"
    else
        export PATH="/panfs/roc/msisoft/mamba/0.11.3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
The specific paths referenced will differ depending on which conda installation you used to run 'conda init', but any references here to a location starting with '/panfs/roc' will need to be updated. Alternatively, you could delete this section of your ~/.bashrc file, load a new conda module, and run 'conda init' again to regenerate this section with updated path references.
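If you prefer to update the paths in place, a sketch like the following rewrites the '/panfs/roc/msisoft' prefix and keeps a backup copy of the original file. The function name update_msisoft_paths is hypothetical, and the example assumes your conda installation lived under '/panfs/roc/msisoft'; other old prefixes would need their own substitutions.

```shell
# Rewrite references to the old msisoft location in a file.
# A copy of the original is saved next to it with a .bak suffix.
# update_msisoft_paths is a hypothetical helper name.
update_msisoft_paths() {
    sed -i.bak 's|/panfs/roc/msisoft|/common/software/install/migrated|g' "$1"
}

# Usage (not run here):
#   update_msisoft_paths ~/.bashrc
```

Review the result before starting a new shell session, and restore the .bak copy if anything looks wrong.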
Load additional modules during your jobs
Some compiled software will hardcode hints to the location of dependencies when you build it. Later, when you run this software it will use these hints to find the location of library files and other dependencies that are not otherwise visible in your current environment. Unfortunately, hints of this type for software that was built before the software migration will no longer work. As a result you may start seeing 'missing library' errors for workflows that previously worked without issue.
Often you can resolve this by loading the module corresponding to the missing dependency. If you are unsure which module you should load, it will usually be one or more of the modules you loaded when you originally compiled the software. Common modules that might need to be loaded like this include gcc, cuda, and mkl.
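To see which libraries an executable cannot currently find, one option is to filter the output of ldd, as sketched below. The helper name missing_libs is hypothetical. Running it before and after loading a module gives a rough indication of whether that module supplies the missing dependency.

```shell
# Print the shared libraries that the dynamic loader cannot resolve for
# an executable in the current environment. Prints nothing if every
# library is found.
# missing_libs is a hypothetical helper name.
missing_libs() {
    ldd "$1" 2>/dev/null | grep 'not found' || true
}

# Usage (not run here; ./myprogram is a placeholder):
#   missing_libs ./myprogram
#   module load gcc
#   missing_libs ./myprogram
```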
Patch or rebuild software that hard-codes old paths
If an executable has old paths hard-coded in its RPATH or RUNPATH, you can inspect and update them with patchelf. For example, for an executable named 'cmake':
$ module load patchelf
# examine the executable to see the current RPATH or RUNPATH
$ patchelf --print-rpath ./cmake
/panfs/roc/msisoft/gcc/8.1.0/lib64:/panfs/roc/msisoft/isl/0.19_gcc8.1.0/lib:/panfs/roc/msisoft/mpc/1.1.0_gcc8.1.0/lib:/panfs/roc/msisoft/mpfr/4.0.1_gcc8.1.0/lib:/panfs/roc/msisoft/gmp/6.1.2_gcc8.1.0/lib
# update the executable with a new RPATH or RUNPATH
$ patchelf --set-rpath /common/software/install/migrated/gcc/8.1.0/lib64:/common/software/install/migrated/isl/0.19_gcc8.1.0/lib:/common/software/install/migrated/mpc/1.1.0_gcc8.1.0/lib:/common/software/install/migrated/mpfr/4.0.1_gcc8.1.0/lib:/common/software/install/migrated/gmp/6.1.2_gcc8.1.0/lib ./cmake
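Rather than typing the new RPATH by hand, you could derive it from the old one. The sketch below assumes every old entry lives under '/panfs/roc/msisoft'; translate_rpath is a hypothetical helper name, and the patchelf calls are shown only as comments.

```shell
# Rewrite every /panfs/roc/msisoft entry in a colon-separated RPATH
# string to the equivalent new location.
# translate_rpath is a hypothetical helper name.
translate_rpath() {
    printf '%s\n' "$1" | sed 's|/panfs/roc/msisoft|/common/software/install/migrated|g'
}

# Usage (not run here; assumes patchelf is loaded and ./cmake is the
# executable to patch):
#   new_rpath=$(translate_rpath "$(patchelf --print-rpath ./cmake)")
#   patchelf --set-rpath "$new_rpath" ./cmake
```

It is worth keeping a copy of the original executable before patching it, in case the new RPATH turns out to be incomplete.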
Re-link files that point to old paths
Another potential issue will show up as 'Host is down:' errors when trying to run an executable file. This happens when the executable in the environment you are using is actually a link to that executable in a '/panfs/roc' location. You can find broken links of this type by running:
find ~/.conda/envs -xtype l
This example will print out a list of the broken links in your conda environments, but you can target any directory where you suspect broken links by modifying this command as needed. Depending on how many broken links you have, you may be able to manually relink them to the same executable in a new location. For instance, you might see the following output for a broken python executable in a conda environment named 'myenv':
/home/users/3/dunn0404/.conda/envs/myenv/bin/python
You can find out where this link is pointing via:
ls -lha /users/3/dunn0404/.conda/envs/myenv/bin/python
lrwxrwxrwx. 1 dunn0404 msistaff 42 Oct 12 12:08 /users/3/dunn0404/.conda/envs/myenv/bin/python -> /panfs/roc/msisoft/mamba/0.11.3/bin/python
Then, to relink this to the equivalent file in the new software library location, you could run:
ln -nsf /common/software/install/migrated/mamba/0.11.3/bin/python /users/3/dunn0404/.conda/envs/myenv/bin/python
If you have many broken links like this, it will likely be easier to re-create the environment from scratch. If this isn't feasible and you need help from MSI to preserve the original environment, please reach out to [email protected].
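For a moderate number of broken links, a dry-run helper along these lines can generate the relink commands for you to review. This is a sketch under a few assumptions: suggest_relinks is a hypothetical name, only targets under '/panfs/roc/msisoft' are handled, GNU find is available (for '-xtype'), and none of the paths contain spaces.

```shell
# Print an 'ln -nsf' command for every broken link under a directory
# whose target is in the old msisoft location. Nothing is modified;
# review the output and run the commands you agree with.
# suggest_relinks is a hypothetical helper name.
suggest_relinks() {
    find "$1" -xtype l | while IFS= read -r link; do
        target=$(readlink "$link")
        case "$target" in
            /panfs/roc/msisoft/*)
                # Swap the old prefix for the new one, keep the rest.
                new="/common/software/install/migrated/${target#/panfs/roc/msisoft/}"
                printf 'ln -nsf %s %s\n' "$new" "$link"
                ;;
        esac
    done
}

# Usage (not run here):
#   suggest_relinks ~/.conda/envs/myenv
```

Printing the commands instead of running them keeps the helper safe to try, since you can check that each new target exists before relinking.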
Common issues you might see
Host is down errors
Since the old software library was located on a network storage appliance that has now been partially turned off, you might see errors of the type:
Host is down:
when trying to run software, even when you wouldn't expect the particular command you are using to need to access another host. This error is showing up because some part of the command, usually the location of an executable file, references one of the old software install locations. So far we've seen this most commonly with python, R, Rscript, and ruby commands that use a conda environment.
The resolution for this issue is usually to update broken links to the old software paths and remove references to old paths in your bashrc.
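To check whether a particular command is affected, you can look at where it resolves in your current environment. The helper below is a sketch with a hypothetical name (where_is); it shows the path found on your PATH and, if that path is a symbolic link, the location the link points to.

```shell
# Show where a command name resolves and, if that path is a symbolic
# link, what the link points to. Returns non-zero if the command is
# not found.
# where_is is a hypothetical helper name.
where_is() {
    path=$(command -v "$1") || return 1
    if [ -L "$path" ]; then
        printf '%s -> %s\n' "$path" "$(readlink "$path")"
    else
        printf '%s\n' "$path"
    fi
}

# Usage (not run here):
#   where_is python
```

If the output contains a '/panfs/roc' location, that link or PATH entry is a likely source of the error.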
Missing libraries
One of the more common issues you might see is a missing library file. These errors will look something like the following:
error while loading shared libraries: libsomething.so.16: cannot open shared object file: No such file or directory
This error indicates that the library 'libsomething.so.16' isn't available in your environment. The resolution for this issue is usually to load the module that provides this dependency or patch the impacted executables to reference the updated paths.
Conda environments not working
Due to the specifics of how they are installed, conda environments are especially prone to issues from the software migration. There are a variety of ways that a conda environment might fail after the migration, but you can address the majority of them by updating broken links to the old software paths and removing references to old paths in your bashrc.
One additional issue you may see is an error referencing problems with an SSL CA certificate that prevents you from creating new environments. You can fix this by manually specifying the location of the certificate file for the conda module you are using. For instance, if you are using the 'mamba' module you might do the following:
Find the root of the module install:
$ module show mamba
-------------------------------------------------------------------
/common/software/modulefiles/migrated/common/mamba/0.11.3:
prepend-path PATH /common/software/install/migrated/mamba/0.11.3/bin
-------------------------------------------------------------------
The root directory for this module will be the directory that contains 'bin'. So in this case, it would be:
/common/software/install/migrated/mamba/0.11.3
The SSL certificate for conda modules is located under '$root/ssl/cert.pem', which in this case would be:
/common/software/install/migrated/mamba/0.11.3/ssl/cert.pem
You can then indicate the location of this certificate to your conda config by running:
conda config --set ssl_verify /common/software/install/migrated/mamba/0.11.3/ssl/cert.pem
At this point you should be able to create new conda environments again without SSL errors.
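The steps above can be condensed into a small helper that derives the certificate path from the 'bin' directory reported by 'module show'. This is a sketch only: conda_cert_path is a hypothetical name, and it assumes the certificate sits at 'ssl/cert.pem' under the module root as described above.

```shell
# Given the bin directory of a conda module, print the expected
# location of its SSL certificate ($root/ssl/cert.pem).
# conda_cert_path is a hypothetical helper name.
conda_cert_path() {
    root=$(dirname "$1")
    printf '%s/ssl/cert.pem\n' "$root"
}

# Usage (not run here):
#   conda config --set ssl_verify \
#       "$(conda_cert_path /common/software/install/migrated/mamba/0.11.3/bin)"
```

Check that the printed file actually exists before passing it to 'conda config'.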
R libraries not working
Modules that simply stop working
Some of MSI's modules unexpectedly broke during the migration. While we did our best to patch all of the software installations to avoid this outcome, the wide variety in the design of software distribution means that this just isn't possible in some cases. If you find a module that is no longer working after the migration that doesn't match the descriptions of other common errors on this page, please report it to [email protected] so we can flag it for reinstallation.