Best practices for managing conda environments

Conda works best at MSI when used to manage distinct environments for different workflows, rather than installing packages to a single central location. Doing so reduces the possibility of issues from conflicting version requirements, makes it easier to create upgraded workflows without breaking existing ones, and makes it easier to share your workflows with other users (including your future self).

Use community-managed channels

Anaconda, the company that develops conda and provides several package repositories for it, changed its Terms of Service in Spring 2024 so that they no longer accommodate free use of its licensed components by academic researchers. Discussion of how to proceed under these ToS, both within the community and within Anaconda as well as between the two, is still ongoing as of Nov 2024.

In the meantime, avoid using the package channels defaults, main, anaconda, msys2, or r. You will want to make sure that these package channels do not appear in your ~/.condarc file, and that you do not reference them in your conda install commands.

Finally, use miniforge over other modules that provide conda, as it avoids using forbidden channels by default.
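As a rough way to check a configuration file for the channels listed above, a small shell helper can scan it. This is a minimal sketch: the function name `find_restricted_channels` and the variable `RESTRICTED_CHANNELS` are made up for illustration, and the check only recognizes channels written as plain, unquoted list entries (not full URLs).

```shell
# Channels named in this guide as ones to avoid.
RESTRICTED_CHANNELS='defaults|main|anaconda|msys2|r'

# Print (with line numbers) any line of a condarc-style file that lists a
# restricted channel as a plain "- name" entry.
# Exit status is 0 only if at least one such entry was found.
# Usage: find_restricted_channels [FILE]   (FILE defaults to ~/.condarc)
find_restricted_channels() {
    condarc=${1:-$HOME/.condarc}
    [ -f "$condarc" ] || return 1
    grep -nE "^[[:space:]]*-[[:space:]]*($RESTRICTED_CHANNELS)[[:space:]]*\$" "$condarc"
}
```

Running `find_restricted_channels` with no argument checks `~/.condarc`; no output means no matching entries were found. With the miniforge module loaded, `conda config --show channels` prints the channel list conda will actually use.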

Create independent environments

The recommended command for creating a conda environment would look like the following. For the sake of example, let's say we want to install the package pandas with this environment:

module load miniforge

conda create --copy -p /path/to/my/conda/environment pandas

There are a few important parts of the conda command to consider:

  • Note that we are using `conda create` rather than `conda install`. The latter would attempt to install packages into the currently active environment, rather than creating a new environment.

  • We are including the `--copy` flag, which makes a local copy of all of the libraries and dependencies installed by the environment. The default behavior without this flag is to link to any libraries that are available in the miniforge module, and we want our conda environments to be as self-contained as possible.

  • We are using the `-p` flag (also known as `--prefix`) to install to a particular location, `/path/to/my/conda/environment` in this case. This allows us to install to an appropriate directory for the conda environment we are creating, rather than simply defaulting to installing in the `~/.conda` directory. Some likely places you might want to install a conda environment could be:

    • Your group's shared directory (for large environments, or those that other group members might want to use)

    • A specific directory where you install all of your software, e.g. `~/software.install`

    • A temporary directory like `/tmp` or `/scratch.global` for cases where your environment is disposable or ephemeral (some development workflows benefit from this)

Install all packages at once

For more complex environments where you may be installing many packages at once, we recommend specifying all the packages you want conda to install in the original `conda create` command rather than installing them later. So if you wanted to install pandas, tensorflow, pillow, and scikit-learn, your command to create the environment might look like:

conda create --copy -p /path/to/my/conda/environment pandas tensorflow pillow scikit-learn

Including all the packages in the same conda command allows the environment solver to do a better job of creating a consistent environment.

If you need to install packages as a second step (e.g. via `pip`), you can usually do so safely as long as you run the `pip install` step immediately after the environment is created.
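One way to keep the two steps together is to wrap them in a single shell function. This is a sketch under several assumptions: the function name `create_env_with_pip` is hypothetical, `--yes` is added to skip conda's confirmation prompt, and `python` and `pip` are listed explicitly so the new environment has its own interpreter and pip.

```shell
# Create an environment and run the pip step right away.
# Usage: create_env_with_pip PREFIX PIP_PACKAGE [CONDA_PACKAGE...]
create_env_with_pip() {
    prefix=$1
    pip_package=$2
    shift 2
    # Include python and pip so the environment has its own interpreter and pip.
    conda create --yes --copy -p "$prefix" python pip "$@" || return 1
    # Call the environment's own python so pip installs into this prefix,
    # without needing to activate the environment first.
    "$prefix/bin/python" -m pip install "$pip_package"
}
```

For example, `create_env_with_pip /path/to/my/conda/environment my-pip-only-package pandas` would create the environment with pandas and then install the placeholder package `my-pip-only-package` with pip.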

Don't modify existing environments — create new ones instead

The guidance about installing all packages at once extends to when you might need to update or upgrade an environment weeks or months later, whether to include a new package or an updated version of one or more packages already in the environment. Rather than updating the existing environment, we recommend creating a new environment with the upgraded contents. This is for two main reasons:

  • If something goes wrong during the installation, the old environment will still be available in its original location

  • The status of packages and their relationships on conda's servers may change significantly over time, and can cause unintended errors and issues with in-place installs of older environments
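One possible convention for keeping old and new environments side by side is to include the creation date in the prefix. The sketch below assumes that convention; the function names `dated_env_prefix` and `create_fresh_env` are hypothetical, and `--yes` is added to skip conda's confirmation prompt.

```shell
# Build a prefix for a fresh environment by appending today's date to a name.
# Usage: dated_env_prefix BASE_DIR ENV_NAME
dated_env_prefix() {
    printf '%s/%s-%s\n' "$1" "$2" "$(date +%Y%m%d)"
}

# Create a new dated environment, refusing to touch a prefix that already
# exists so that older environments are left as they are.
# Usage: create_fresh_env BASE_DIR ENV_NAME PACKAGE...
create_fresh_env() {
    prefix=$(dated_env_prefix "$1" "$2")
    shift 2
    if [ -e "$prefix" ]; then
        echo "refusing to modify existing environment: $prefix" >&2
        return 1
    fi
    conda create --yes --copy -p "$prefix" "$@"
}
```

Calling `create_fresh_env ~/software.install analysis pandas scikit-learn` would then create a new environment in a directory named `analysis-` followed by the current date, leaving any earlier `analysis-*` environments in place.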

Take snapshots of your important environments

It is a good practice to record the contents of your environments after installing them, so that you are able to reproduce them exactly in the future. One way to do this is via the command:

conda env export --no-builds

This command is run while your environment is activated, and will print all of the packages and their versions installed by conda into your environment. If you capture this output into a .yml file, you can use it to recreate the exact environment in the future so long as the relevant package versions are still available on the remote server. This is explored via an example in our tutorial on software management at MSI.
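Capturing the output into a file can be wrapped in a small helper. This is a minimal sketch: the function name `snapshot_env` is hypothetical, and it uses the `-p` option of `conda env export` to select the environment by path instead of relying on it being activated.

```shell
# Write a snapshot of the environment at PREFIX to OUTFILE.
# Usage: snapshot_env PREFIX OUTFILE
snapshot_env() {
    # -p selects the environment by path, so it does not need to be activated.
    conda env export --no-builds -p "$1" > "$2"
}
```

For example, `snapshot_env /path/to/my/conda/environment environment.yml` writes the snapshot to `environment.yml` (an example file name). A new environment could later be created from it with `conda env create -p /path/to/new/environment -f environment.yml`.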

Alternate strategies for snapshotting an environment to share might include bundling the environment into a single file that can be backed up in another location or shared with colleagues. For this purpose, we recommend either the conda-pack tool (https://conda.github.io/conda-pack/) or bundling your environment into an Apptainer container. You can see some guidance on creating custom Apptainer containers in our tutorial on the subject.

Don't use conda activate

The default command for activating a conda environment doesn't work cleanly in an HPC environment. If you need a direct analogue, we recommend using `source activate`, which would look like the following for an environment installed to the prefix `/path/to/my/conda/environment`:

source activate /path/to/my/conda/environment

This generally behaves better than the default command, but will be deprecated in the not-too-distant future.

A more general approach can be to modify your PATH variable to include the environment's `bin` directory, preferably using a modulefile. This is explored via an example in our tutorial on software management at MSI.
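Outside of a modulefile, the same PATH change can be made directly in a shell session or job script. The sketch below assumes a hypothetical helper named `use_env`; it only changes PATH, so any activation scripts a package ships are not run.

```shell
# Prepend an environment's bin directory to PATH for the current shell.
# Usage: use_env /path/to/my/conda/environment
use_env() {
    if [ ! -d "$1/bin" ]; then
        echo "no bin directory under $1" >&2
        return 1
    fi
    PATH="$1/bin:$PATH"
    export PATH
}
```

After `use_env /path/to/my/conda/environment`, commands such as `python` resolve to the copies installed in that environment first.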

Addressing version conflicts

Sometimes you will run into issues while building or using an environment where the package versions that the solver installed are not compatible. When you run into these errors at build time, they will look something like:

Solving environment: failed

LibMambaUnsatisfiableError: Encountered problems while solving:

  - package scipy-1.15.2-py310h1d65ade_0 requires python_abi 3.10.* *_cp310, but none of the providers can be installed

Here, a conflict between the package scipy and available versions of python_abi is being reported. Version conflicts of this type most commonly occur for environments with a large number of dependencies that are under active development.

Errors that show up at runtime will be more subtle and varied, and will likely not directly report an issue with package versions. But you might consider the possibility of a package version mismatch if you are running an example from the package developer on a freshly installed environment and it crashes with an error traceback that points to one or more python libraries installed in the environment.

In either case (build time or runtime), the recommended solutions are as follows:

Unpin package versions

If you are pinning package versions in your conda create command, try unpinning them. This allows the conda solver more flexibility in attempting to consistently solve the environment. For example, you would change:

conda create --copy -p /path/to/my/env python=2 numpy=1 scipy=1.15

into:

conda create --copy -p /path/to/my/env python numpy scipy

Pin an older version of key package(s)

If you aren't pinning the package versions and the issue is still occurring, try pinning one or more packages to the second-most recent version. You can browse releases for most packages on anaconda.org; for example, the scipy package page there lists all available versions.

Pinning a package to a previous release can help avoid issues that were just introduced with the latest release. In the example error above, scipy is the natural package to pin, and 1.15.1 is the second-most recent release at the time of writing. The command to try would then be:

conda create --copy -p /path/to/my/env python numpy scipy=1.15.1

Pin all packages to known working versions

If you have an environment snapshot or otherwise know the package versions for a working environment, try pinning the versions of all packages you are installing to those known-working versions.
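When the snapshot is a .yml file produced by `conda env export --no-builds`, the pinned specifications can be pulled out of it with a short helper. This is a sketch under that assumption: the function name `pinned_specs` is hypothetical, it expects entries of the form `name=version` under `dependencies:`, and it skips any nested `pip:` entries.

```shell
# Print the "name=version" entries from the dependencies section of an
# exported environment file, one per line.
# Usage: pinned_specs SNAPSHOT.yml
pinned_specs() {
    awk '
        /^dependencies:/ { in_deps = 1; next }   # start of the conda package list
        /^[^ ]/          { in_deps = 0 }         # any other top-level key ends it
        in_deps && /^  - [^ =:]+=[^=]+$/ { print $2 }
    ' "$1"
}
```

The output can then be passed to a create command, e.g. `conda create --copy -p /path/to/my/env $(pinned_specs environment.yml)`, where `environment.yml` is an example file name. You can then install a subset of those packages, or adjust individual pins, by editing that list.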

If you don't have a reference to work from, you can try to manually determine the correct combination of package versions to use via trial and error. However, this is not practical for larger environments with dozens or hundreds of packages.

Wait until the issue is fixed

Wait a few days and try again. Often these issues arise when a package has many dependencies and one of them is updated before the developers of the other dependencies have updated their packages.

You might also consider reporting the issue to the developers of the package you are having issues with, or to the developers maintaining the package repository you are using. For example, if you are working with conda-forge, you can search for existing issues and report new ones in their GitHub repository.