Using Spack Pipeline Feature without Containers

Spack provides a feature to generate a GitLab pipeline that builds all the specs of a given environment. This feature is documented by Spack and was originally introduced to run on cloud resources.

We intend to illustrate the use of Spack pipelines on an HPC cluster, not for deployment but for testing. This typically means that we are not using containers and that our runners have access to the shared file system.

Cloning and sharing a Spack instance

First of all, we don’t have Spack already on the system. Instead, we want to pinpoint exactly the version of Spack to be used in the CI pipeline.

Also, we don’t want to clone Spack in each job. So we will set up a Spack instance shared by all the CI jobs (remember we are not using containers here).

Requirements

  1. We want to use a single instance of Spack across all the CI jobs of a given pipeline, for performance reasons.

  2. We don’t want two independent pipelines to use the same instance of Spack, to avoid locks and conflicts on the Spack version used.

  3. We don’t want to download Spack history again if we already have it.

Implementation

In terms of implementation, we start with a script to get Spack.


A script to get Spack
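# Requires bash, with SPACK_PATH, SPACK_REPO, SPACK_REF and CI_PIPELINE_ID set.
if [[ ! -d "${SPACK_PATH}" ]]
then
  # Make sure the parent directory exists before cloning.
  mkdir -p "$(dirname "${SPACK_PATH}")"
  # A shallow clone is enough, and much faster.
  git clone "${SPACK_REPO}" --depth 1 --branch "${SPACK_REF}" "${SPACK_PATH}"
  # We tag the commit so we can retrieve which one was used by a given pipeline.
  # The tag must be created inside the clone, not in the current directory.
  git -C "${SPACK_PATH}" tag "${CI_PIPELINE_ID}"
else
  cd "${SPACK_PATH}"
  # Move to a temporary branch so the local copy of SPACK_REF can be replaced.
  git checkout -b temp
  git branch -D "${SPACK_REF}"
  # Fetch only the tip of SPACK_REF to avoid downloading the history again.
  git fetch --depth 1 "${SPACK_REPO}" "${SPACK_REF}:${SPACK_REF}"
  git checkout "${SPACK_REF}"
  git tag "${CI_PIPELINE_ID}"
  git branch -D temp
  cd -
fi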

This script is called in CI by a dedicated job.

Note

We define the script outside of the CI configuration. This is good practice, since the script can then be tested and shared outside of CI.
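Since the script tags the Spack commit with the pipeline ID, the commit used by a given pipeline can be recovered later. A minimal sketch, assuming the tag is a lightweight tag named after the pipeline ID; the function name `spack_commit_for_pipeline` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Spack commit that was tagged for a pipeline.
# Usage: spack_commit_for_pipeline <spack-path> <pipeline-id>
spack_commit_for_pipeline() {
  local spack_path="$1" pipeline_id="$2"
  # "<tag>^{commit}" resolves the tag to the commit it points at.
  # --verify --quiet makes the call fail silently if the tag does not exist.
  git -C "${spack_path}" rev-parse --verify --quiet \
    "refs/tags/${pipeline_id}^{commit}"
}

# Example use (not run here):
#   spack_commit_for_pipeline "${SPACK_PATH}" "${CI_PIPELINE_ID}"
```

The function returns a non-zero status when no tag exists for the requested pipeline, which makes it easy to use in a conditional.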


Then we share the location of Spack with the child pipeline.

We first create a global variable for the path, made “unique” by using the commit ID:

variables:
  SPACK_PARENT_DIR: ${CI_BUILDS_DIR}/../../llnl-stack-${CI_COMMIT_SHORT_SHA}
  SPACK_PATH: ${CI_BUILDS_DIR}/../../llnl-stack-${CI_COMMIT_SHORT_SHA}/spack

Then we propagate it to the child pipelines in the trigger job:

build-on-quartz:
  extends: [.env_defined]
  stage: build
  variables:
    CHILD_SPACK_PATH: ${SPACK_PATH}
  trigger:
    include:
      - artifact: "jobs_scratch_dir/pipeline.yml"
        job: generate-on-quartz
    strategy: depend
  needs: [generate-on-quartz]

Note

We need to change the variable name to pass it to the child pipeline. This has been reported to GitLab (https://gitlab.com/gitlab-org/gitlab/-/issues/213729).
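On the child side, a job can then pick up the shared instance from the renamed variable. A minimal sketch, assuming the child jobs run a shell on the shared file system; the function name `resolve_spack_path` is hypothetical, while `share/spack/setup-env.sh` is the standard Spack shell setup script:

```shell
#!/usr/bin/env bash
# Hypothetical helper: find the shared Spack instance from inside a child job.
# Prefers CHILD_SPACK_PATH (set by the trigger job), falls back to SPACK_PATH.
resolve_spack_path() {
  local path="${CHILD_SPACK_PATH:-${SPACK_PATH:-}}"
  if [ -z "${path}" ]; then
    echo "error: neither CHILD_SPACK_PATH nor SPACK_PATH is set" >&2
    return 1
  fi
  echo "${path}"
}

# Example use in a child job (not run here):
#   spack_root="$(resolve_spack_path)" || exit 1
#   . "${spack_root}/share/spack/setup-env.sh"
```

Failing early when neither variable is set avoids a child job silently cloning or using a different Spack instance.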

Managing the GPG key

A blocking point while attempting to share a single Spack instance has to do with GPG key management.

In the general case, each build job registers the mirror key, which results in deadlocks in our case. We can instead register the key once, ahead of time.

We produce a GPG key using an instance of Spack and copy it to a backup location, so that we can use it for any Spack instance we create in the CI context. This procedure is detailed in the Spack mirror/build cache tutorial.

$ spack gpg create "My Name" "<my.email@my.domain.com>"
gpg: key 070F30B4CDF253A9 marked as ultimately trusted
gpg: directory '/home/spack/spack/opt/spack/gpg/openpgp-revocs.d' created
gpg: revocation certificate stored as '/home/spack/spack/opt/spack/gpg/openpgp-revocs.d/ABC74A4103675F76561A607E070F30B4CDF253A9.rev'
$ mkdir ~/private_gpg_backup
$ cp ~/spack/opt/spack/gpg/*.gpg ~/private_gpg_backup
$ cp ~/spack/opt/spack/gpg/pubring.* ~/mirror

Then we simply need to point the variable SPACK_GNUPGHOME to the location where the key is stored. Spack won’t try to register the key if this variable is already set.
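In a job, this can be done before any Spack command runs. A minimal sketch, assuming the backed-up keyring sits in a directory readable by the runner; the function name `use_shared_gpg_home` and the sanity check on keyring files are our own additions, not part of Spack:

```shell
#!/usr/bin/env bash
# Hypothetical helper: point Spack at a pre-populated GPG home directory.
# Usage: use_shared_gpg_home <directory-holding-the-backed-up-keyring>
use_shared_gpg_home() {
  local gpg_home="$1" f has_keyring=0
  # Assumption: a usable directory holds at least one *.gpg or pubring.* file.
  for f in "${gpg_home}"/*.gpg "${gpg_home}"/pubring.*; do
    if [ -e "${f}" ]; then
      has_keyring=1
      break
    fi
  done
  if [ "${has_keyring}" -eq 0 ]; then
    echo "error: no GPG keyring found in ${gpg_home}" >&2
    return 1
  fi
  export SPACK_GNUPGHOME="${gpg_home}"
}

# Example use (the path is a placeholder):
#   use_shared_gpg_home "${HOME}/private_gpg_backup" || exit 1
```

Refusing to export the variable when the directory is empty avoids pointing Spack at a location without any key.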


More about this issue.
The Spack pipeline feature was designed to work with jobs in containers, isolated from one another. We are breaking this assumption by using a single Spack instance on the shared file system.

Each build job will attempt to register a GPG key concurrently, but using the same Spack instance. One of the jobs will create a lock that will cause all the others to fail.

Only generate one pipeline per stage

Because we are using a single instance of Spack, we want to avoid race conditions that could cause locks.

In practice, this forced us to place each job that generates a pipeline in a separate stage, because concretizing two specs at the same time can cause locks.


More about this issue.
We generate one pipeline per machine. The file system, where the unique instance of Spack lives, is shared among the machines. This is why we need to sequentially generate the pipelines. We may instead choose to have two Spack instances in the future.

stages:
  - setup
  - generate-quartz
  - generate-lassen
  - build
  - clean

Configuring the environment

There are several things to think of when configuring a Spack environment for CI on a cluster.

Isolate Spack configuration

We need to make sure the environment (in particular the configuration files) is isolated. In practice, this means working around the user configuration scope, since we want our pipeline to behave deterministically regardless of who triggers it.

The most robust way to do so is to make sure that the environment is the only place where the configuration is defined, and override everything else.

Note

We use include to keep things clear and split the config into separate files, but everything could be in a single spack.yaml file.

For example, we use the double colon override syntax in the packages section:

packages::

The same needs to be applied to the compilers section.
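One way to check that nothing leaks in from the user scope is to look at where each setting comes from: `spack config blame <section>` annotates the merged configuration with the source file of each line. The sketch below filters such output for entries coming from a user scope directory. The function name `find_user_scope_entries` is hypothetical, and the sketch assumes that the source file path appears on each annotated line and that the user scope lives in `~/.spack`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `spack config blame` output on stdin and print
# the lines whose source file lives under the given user scope directory.
# Exit status is 0 when such lines exist, 1 when none are found.
find_user_scope_entries() {
  local user_scope="${1:-${HOME}/.spack}"
  # -F treats the directory as a fixed string rather than a regular expression.
  grep -F "${user_scope}/"
}

# Example use from the environment directory (not run here):
#   if spack -e . config blame packages | find_user_scope_entries; then
#     echo "packages section is affected by the user scope" >&2
#   fi
```

With the double colon override in place, the filter is expected to print nothing for the overridden sections.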

We also make sure to move any cache or stage directory to the Spack root dir, making them specific to that instance by design:

  config:
    install_tree:
      root: /usr/workspace/radiuss/install/radiuss
      padded_length: 128
      projections:
        all: '{architecture}/{compiler.name}-{compiler.version}/{name}-{version}-{hash}'
    misc_cache: $spack/.misc_cache
    test_stage: $spack/.test_stage
    'build_stage:':
    - $spack/var/spack/stage
  bootstrap:
    root: /usr/workspace/radiuss/store/radiuss

Note

We do not use the :: syntax on the config section, because we assume that it will not be affected as much by the user scope. However, note that we use it on the build_stage subsection, since it is a list that would otherwise consist of the merge of all the scopes.

Future work

We are exploring several improvements:

  • At the moment, each pipeline starts with a clone of Spack. Even if we do a shallow clone, this takes between 2 and 8 minutes in our observations.

  • The working directory is in the workspace filesystem, which is slow. We do not need our temporary files to persist, so we could give up persistence and work in the node shared memory /dev/shm. Our first experiments suggest that this would greatly improve performance.
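A minimal sketch of what working in shared memory could look like, assuming `/dev/shm` is writable on the node and large enough for the temporary files; the function name `make_scratch_dir` and the `spack-ci` prefix are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a private scratch directory, preferring the
# node's shared memory and falling back to TMPDIR (or /tmp) otherwise.
make_scratch_dir() {
  local base="/dev/shm"
  if [ ! -d "${base}" ] || [ ! -w "${base}" ]; then
    base="${TMPDIR:-/tmp}"
  fi
  # mktemp replaces the X characters to produce a unique directory name.
  mktemp -d "${base}/spack-ci.XXXXXX"
}

# Example use in a job (not run here): remove the directory when the job
# ends, since nothing in it needs to persist.
#   scratch="$(make_scratch_dir)"
#   trap 'rm -rf "${scratch}"' EXIT
```

Content in `/dev/shm` counts against the node memory, so this only makes sense when the temporary files are small compared to the available RAM.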

Shared configuration files

We are planning to share the configuration files (compilers.yaml and packages.yaml for the most part) in another open-source repo.

This will help ensure consistency in our testing across LLNL open-source projects. This is already in use in RAJA, Umpire and CHAI. Projects could still add their own configurations.