diff --git a/docs/software_testing.md b/docs/software_testing.md index 3a0cde29d..1cf8c38bb 100644 --- a/docs/software_testing.md +++ b/docs/software_testing.md @@ -1,61 +1,3 @@ -#Software testing +# Software testing -**WARNING: development of the software test suite has only just started and is a work in progress. This page describes how the test suite _will_ be designed, but many things are not implemented yet and the design may still change.** - -##Description of the software test suite - -###Framework -The EESSI project uses the [ReFrame framework](https://reframe-hpc.readthedocs.io/en/stable/index.html) for software testing. ReFrame is designed particularly for testing HPC software and thus has well integrated support for interacting with schedulers, as well as various launchers for MPI programs. - -###Test variants -The EESSI software stack can be used in various ways, e.g. by using the [container](../pilot/#accessing-the-eessi-pilot-repository-through-singularity) or when the CVMFS software stack is mounted natively. This means the commands that need to be run to test an application are different in both cases. Similarly, systems may have different hardware (CPUs v.s. GPUs, system size, etc). Thus, tests - e.g. a GROMACS test - may have different variants: one designed to run on CPUs, one on GPUs, one designed to run through the container, etc. - -The main goal of the EESSI test suite is to test the software stack on systems that have the EESSI CVMFS mounted natively. Some tests may also have variants that can run the same test through the container, but note that this setup is technically much more difficult. Thus, the main focus is on tests that run with a native CVMFS mount of the EESSI stack. - -By default, ReFrame runs all test variants it find. Thus, in our test suite, we prespecify a number of tags that can be used to select an appropriate subset of tests for your system. We recognize the following tags: - -- container: tests that use the EESSI container to run the software. E.g. one variant of our GROMACS test uses `singularity exec` to launch the EESSI container, load the GROMACS module, and run the GROMACS test. -- `native`: tests that rely on the EESSI software stack being available through the modules system. E.g. one variant of the GROMACS test loads the GROMACS module and runs the GROMACS test. -- `singlecore`: tests designed to run on a single core -- `singlenode`: tests designed to run on a single (multicore) node (note: may still use MPI for multiprocessing) -- `small`: tests designed to run on 2-8 nodes. -- `large`: tests designed to run on >9 nodes. -- `cpu`: test designed to run on CPU. -- `gpu`, gpu_nvidia, gpu_amd: test designed to run on GPUs / nvidia GPUs / AMD GPUs. - -##How to run the test suite - -### General requirements - -- A copy of the `tests` directory from [software repository](https://github.com/EESSI/software-layer) - -### Requirements for container-based tests -Specifically for container-based tests, there are some requirements on the host system: - -- An installation of ReFrame -- An MPI installation (to launch MPI tests) or PMIx-based launcher (e.g. SLURM compiled with PMIx support) -- Singularity - -The container based tests will use a so-called shared alien CVMFS cache to store temporary data. In addition, they use a local CVMFS cache for speed. For this reason, the container tests need to be pointed to one directory that is shared between nodes on your system, and one directory that is node-specific (preferably a local disk). 
The `shared_alien_cache_minimal.sh` script that is part of the test suite defines these, and sets up the correct CVMFS configuration. You will have to adapt the `SHAREDSPACE` and `LOCALSPACE` variables in that script for your system, and point them to a shared and node-local directory.
-
-### Setting up a ReFrame configuration file
-Once the prerequisites have been met, you'll need to create a ReFrame configuration file that matches your system (see the [ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/configure.html)). If you want to use the container-based tests, you *have* to define a partition programming environment called `container` and make sure it loads any modules needed to provide the MPI installation and singularity command. For an example configuration file, check the `tests/reframe/config/settings.py` in the [software-layer repository](https://github.com/EESSI/software-layer). Other than (potential) adaptations to the `container` environment, you should only really need to change the `systems` part.
-
-### Adapting the tests to your system
-For now, you will have to adapt the number of tasks specified in full-node tests to match the number of cores your machine has in a single node (in the future, you should be able to do this through the reframe configuration file). To do so, change all `self.num_tasks_per_node` you find in the various tests to that core count (unless they are 1, in which case the test specifically intended for only 1 process per node).
-
-
-### An example run
-In this example, we assume your current directory is the `tests/reframe` folder. To list e.g. all single node, cpu-based application tests on a system that has the EESSI software environment available natively, you execute:
-```
-reframe --config-file=config/settings.py --checkpath eessi-checks/applications/ -l -t native -t single -t cpu
-```
-(assuming you adapted the config file in `config/settings.py` for your system). This should list the tests that are selected based on the provided tags. To run the tests, change the `-l` argument into a `-r`:
-```
-reframe --config-file=config/settings.py --checkpath eessi-checks/applications/ -l -t native -t single -t cpu --performance-report
-```
-To run the same tests with using the EESSI container, run:
-```
-reframe --config-file=config/settings.py --checkpath eessi-checks/applications/ -l -t container -t single -t cpu --performance-report
-```
-Note that not all tests necessarily have implementations to run using the EESSI container: the primary focus of the test suite is for HPC sites to check the performance of their software suite. Such sites should have CVMFS mounted natively for optimal performance anyway.
+**This page has been replaced with [test-suite](test-suite/index.md); please update your bookmarks!**
diff --git a/docs/test-suite/index.md b/docs/test-suite/index.md
new file mode 100644
index 000000000..be67c25d8
--- /dev/null
+++ b/docs/test-suite/index.md
@@ -0,0 +1,12 @@
+# EESSI test suite
+
+The [EESSI test suite](https://github.com/EESSI/test-suite) is a collection of tests that are run using
+[ReFrame](https://reframe-hpc.readthedocs.io/).
+It is used to check whether the software installations included in the [EESSI software layer](../software_layer)
+are working and performing as expected.
+
+To get started, you should look into the [installation and configuration guidelines](installation-configuration.md) first.
+
+For more information on using the EESSI test suite, see [here](usage.md).
+
+See also [release notes for the EESSI test suite](release-notes.md).
diff --git a/docs/test-suite/installation-configuration.md b/docs/test-suite/installation-configuration.md
new file mode 100644
index 000000000..d07480ffe
--- /dev/null
+++ b/docs/test-suite/installation-configuration.md
@@ -0,0 +1,497 @@
+# Installing and configuring the EESSI test suite
+
+This page covers the installation and configuration of the [EESSI test suite](https://github.com/EESSI/test-suite).
+
+For information on *using* the test suite, see [here](usage.md).
+
+
+## Installation { #installation }
+
+### Requirements { #requirements }
+
+The EESSI test suite requires Python >= 3.6 and [ReFrame](https://reframe-hpc.readthedocs.io) v4.3.3 (or newer).
+
+??? note "(for more details on the ReFrame version requirement, click here)"
+
+    Two important bugs were resolved in ReFrame's CPU autodetect functionality [in version 4.3.3](https://github.com/reframe-hpc/reframe/pull/2978).
+
+    _We strongly recommend you use `ReFrame >= 4.3.3`_.
+
+    If you are using an older version of ReFrame, you may encounter some issues:
+
+    * ReFrame will try to use the parallel launcher command configured for each partition (e.g. `mpirun`) when doing
+      the remote autodetect. If there is no system version of `mpirun` available, that will fail
+      (see [ReFrame issue #2926](https://github.com/reframe-hpc/reframe/issues/2926)).
+    * CPU autodetection only worked when using a clone of the ReFrame repository, _not_ when it was installed
+      with `pip` or `EasyBuild` (as is also the case for the ReFrame shipped with EESSI)
+      (see [ReFrame issue #2914](https://github.com/reframe-hpc/reframe/issues/2914)).
+
+
+### Installing ReFrame (incl. test library)
+
+You need to make sure that [ReFrame](https://reframe-hpc.readthedocs.io) is available - that is, the `reframe` command should work:
+
+```bash
+reframe --version
+```
+
+General instructions for installing ReFrame are available in the [ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/started.html).
+
+#### ReFrame test library (`hpctestlib`)
+
+The EESSI test suite requires that the [ReFrame test library (`hpctestlib`)](https://reframe-hpc.readthedocs.io/en/stable/hpctestlib.html)
+is available, which is currently not included in a standard installation of ReFrame.
+
+We recommend installing ReFrame using [EasyBuild](https://easybuild.io/) (version 4.8.1, or newer),
+or using a ReFrame installation that is available in EESSI (pilot version 2023.06, or newer),
+since both of these include the `hpctestlib` test library.
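+
+For example, using EasyBuild (a minimal sketch; this assumes EasyBuild is already set up, and the
+easyconfig file name and resulting module version shown here are illustrative and may differ on your system):
+
+```bash
+# install ReFrame with EasyBuild; recent ReFrame easyconfigs also install hpctestlib
+eb ReFrame-4.3.3.eb
+
+# make the resulting module available and load it
+# (assumes the default EasyBuild installation prefix, $HOME/.local/easybuild)
+module use $HOME/.local/easybuild/modules/all
+module load ReFrame/4.3.3
+```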
+
+For example (using EESSI):
+
+```bash
+source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash
+module load ReFrame/4.2.0
+```
+
+To check whether the ReFrame test library is available, try importing a submodule of the `hpctestlib` Python package:
+
+```bash
+python3 -c 'import hpctestlib.sciapps.gromacs'
+```
+
+### Installing the EESSI test suite
+
+To install the EESSI test suite, you can either use `pip` or clone the GitHub repository directly:
+
+#### Using `pip` { #pip-install }
+
+```bash
+pip install git+https://github.com/EESSI/test-suite.git
+```
+
+#### Cloning the repository
+
+```bash
+git clone https://github.com/EESSI/test-suite $HOME/EESSI-test-suite
+cd $HOME/EESSI-test-suite
+export PYTHONPATH=$PWD:$PYTHONPATH
+```
+
+#### Verify installation
+
+To check whether the EESSI test suite installed correctly,
+try importing the `eessi.testsuite` Python package:
+
+```bash
+python3 -c 'import eessi.testsuite'
+```
+
+
+## Configuration
+
+Before you can run the EESSI test suite, you need to create a configuration file for ReFrame that is specific to the system on which the tests will be run.
+
+Example configuration files are available in the [`config` subdirectory of the `EESSI/test-suite` GitHub repository](https://github.com/EESSI/test-suite/tree/main/config),
+which you can use as a template to create your own.
+
+### Configuring ReFrame environment variables
+
+We recommend setting a couple of `$RFM_*` environment variables to configure ReFrame, to avoid having to pass particular options to the `reframe` command over and over again.
+
+#### ReFrame configuration file (`$RFM_CONFIG_FILES`) { #RFM_CONFIG_FILES }
+
+*(see also [`RFM_CONFIG_FILES` in ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_CONFIG_FILES))*
+
+Define the `$RFM_CONFIG_FILES` environment variable to instruct ReFrame which configuration file to use, for example:
+
+```bash
+export RFM_CONFIG_FILES=$HOME/EESSI-test-suite/config/example.py
+```
+
+Alternatively, you can use the `--config-file` (or `-C`) `reframe` option.
+
+See the [section on the ReFrame configuration file](#reframe-config-file) below for more information.
+
+#### Search path for tests (`$RFM_CHECK_SEARCH_PATH`)
+
+*(see also [`RFM_CHECK_SEARCH_PATH` in ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_CHECK_SEARCH_PATH))*
+
+Define the `$RFM_CHECK_SEARCH_PATH` environment variable to tell ReFrame which directory to search for tests.
+
+In addition, define `$RFM_CHECK_SEARCH_RECURSIVE` to ensure that ReFrame searches `$RFM_CHECK_SEARCH_PATH` recursively
+(i.e. so that tests in subdirectories are also found).
+
+For example:
+
+```bash
+export RFM_CHECK_SEARCH_PATH=$HOME/EESSI-test-suite/eessi/testsuite/tests
+export RFM_CHECK_SEARCH_RECURSIVE=1
+```
+
+Alternatively, you can use the `--checkpath` (or `-c`) and `--recursive` (or `-R`) `reframe` options.
+
+#### ReFrame prefix (`$RFM_PREFIX`) { #RFM_PREFIX }
+
+*(see also [`RFM_PREFIX` in ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_PREFIX))*
+
+Define the `$RFM_PREFIX` environment variable to tell ReFrame where to store the files it produces. For example:
+
+```bash
+export RFM_PREFIX=$HOME/reframe_runs
+```
+
+These files include:
+
+* test output directories (which contain e.g.
+  the job script, stderr and stdout for each of the test jobs);
+* staging directories (unless otherwise specified by `stagedir`, see below);
+* performance logs.
+
+Note that the default is for ReFrame to use the current directory as prefix.
+We recommend setting a prefix so that these files are not scattered around, and so that logs are neatly appended for each run.
+
+If our [common logging configuration](#logging) is used, the regular ReFrame log file will
+also end up in the location specified by `$RFM_PREFIX`.
+
+!!! warning
+
+    Using the `--prefix` option in your `reframe` command is *not* equivalent to setting `$RFM_PREFIX`,
+    since our [common logging configuration](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/common_config.py)
+    only picks up on the `$RFM_PREFIX` environment variable to determine the location for the ReFrame log file.
+
+### ReFrame configuration file { #reframe-config-file }
+
+In order for ReFrame to run tests on your system, it needs to know some properties of your system.
+For example, it needs to know what kind of job scheduler you have, which partitions the system has,
+how to submit to those partitions, etc.
+All of this has to be described in a *ReFrame configuration file* (see also the [section on `$RFM_CONFIG_FILES` above](#RFM_CONFIG_FILES)).
+
+The [official ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/configure.html) provides the full
+description of how to configure ReFrame for your site. However, there are some configuration settings that are specifically
+required for the EESSI test suite. Also, there is a large number of configuration settings available in ReFrame,
+which makes the official documentation potentially a bit overwhelming.
+
+Here, we will describe how to create a configuration file that works with the EESSI test suite, starting from an
+[example configuration file `settings_example.py`](https://github.com/EESSI/test-suite/tree/main/config/settings_example.py),
+which defines the most common configuration settings.
+
+You can look at other example configurations in the [config directory](https://github.com/EESSI/test-suite/tree/main/config/) for more inspiration.
+
+#### Python imports
+
+The EESSI test suite standardizes a few string-based values as constants, as well as the logging format used by ReFrame.
+Every ReFrame configuration file used for running the EESSI test suite should therefore start with the following import statements:
+
+```python
+import os  # used by the configuration examples below (e.g. to access $USER via os.environ)
+
+from eessi.testsuite.common_config import common_logging_config
+from eessi.testsuite.constants import *
+```
+
+#### High-level system info (`systems`)
+
+First, we describe the system at its highest level through the
+[`systems`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#systems) keyword.
+
+You can define multiple systems in a single configuration file (`systems` is a Python list value).
+We recommend defining just a single system in each configuration file, as it makes the configuration file a bit easier to digest (for humans).
+
+An example of the `systems` section of the configuration file would be:
+
+```python
+site_configuration = {
+    'systems': [
+        # We could list multiple systems.
+        # Here, we just define one.
+        {
+            'name': 'example',
+            'descr': 'Example cluster',
+            'modules_system': 'lmod',
+            'hostnames': ['*'],
+            'stagedir': f'/some/shared/dir/{os.environ.get("USER")}/reframe_output/staging',
+            'partitions': [...],
+        }
+    ]
+}
+```
+
+The most common configuration items defined at this level are:
+
+- [`name`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.name):
+  The name of the system. Pick whatever makes sense for you.
+- [`descr`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.descr):
+  Description of the system. Again, pick whatever you like.
+- [`modules_system`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.modules_system):
+  The modules system used on your system. EESSI provides modules in `lmod` format. There is no need to change this,
+  unless you want to run tests from the EESSI test suite with non-EESSI modules.
+- [`hostnames`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.hostnames):
+  The names of the hosts on which you will run the ReFrame command, as regular expressions. Using these names,
+  ReFrame can automatically determine which of the listed configurations in the `systems` list to use, which is useful
+  if you're defining multiple systems in a single configuration file. If you follow our recommendation to limit
+  yourself to one system per configuration file, simply define `'hostnames': ['*']`.
+- [`prefix`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.prefix):
+  Prefix directory for a ReFrame run on this system. Any directories or files produced by ReFrame will use this prefix,
+  if not specified otherwise.
+  We recommend setting the `$RFM_PREFIX` environment variable rather than specifying `prefix` in
+  your configuration file, so our [common logging configuration](#logging) can pick up on it
+  (see also [`$RFM_PREFIX`](#RFM_PREFIX)).
+- [`stagedir`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.stagedir): A shared directory that is available on all nodes that will execute ReFrame tests. This is used for storing (temporary) files related to the test. Typically, you want to set this to a path on a (shared) scratch filesystem. Defining this is optional: the default is a '`stage`' directory inside the `prefix` directory.
+- [`partitions`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions): Details on the system partitions, see below.
+
+
+
+#### System partitions (`systems.partitions`) { #partitions }
+
+The next step is to add the system partitions to the configuration file;
+these are also specified as a Python list, since a system can have multiple partitions.
+
+The [`partitions`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions)
+section of the configuration for a system with two [Slurm](https://slurm.schedmd.com/) partitions (one CPU partition,
+and one GPU partition) could for example look something like this:
+
+```python
+site_configuration = {
+    'systems': [
+        {
+            ...
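+            # (the system-level settings shown in the previous example, such as 'name',
+            # 'descr' and 'hostnames', go here; '...' elides them for brevity)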
+            'partitions': [
+                {
+                    'name': 'cpu_partition',
+                    'descr': 'CPU partition',
+                    'scheduler': 'slurm',
+                    'prepare_cmds': ['source /cvmfs/pilot.eessi-hpc.org/latest/init/bash'],
+                    'launcher': 'mpirun',
+                    'access': ['-p cpu'],
+                    'environs': ['default'],
+                    'max_jobs': 4,
+                    'features': [FEATURES[CPU]],
+                },
+                {
+                    'name': 'gpu_partition',
+                    'descr': 'GPU partition',
+                    'scheduler': 'slurm',
+                    'prepare_cmds': ['source /cvmfs/pilot.eessi-hpc.org/latest/init/bash'],
+                    'launcher': 'mpirun',
+                    'access': ['-p gpu'],
+                    'environs': ['default'],
+                    'max_jobs': 4,
+                    'resources': [
+                        {
+                            'name': '_rfm_gpu',
+                            'options': ['--gpus-per-node={num_gpus_per_node}'],
+                        }
+                    ],
+                    'devices': [
+                        {
+                            'type': DEVICE_TYPES[GPU],
+                            'num_devices': 4,
+                        }
+                    ],
+                    'features': [
+                        FEATURES[CPU],
+                        FEATURES[GPU],
+                    ],
+                    'extras': {
+                        GPU_VENDOR: GPU_VENDORS[NVIDIA],
+                    },
+                },
+            ]
+        }
+    ]
+}
+```
+
+The most common configuration items defined at this level are:
+
+- [`name`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.name):
+  The name of the partition. Pick anything you like.
+- [`descr`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.descr):
+  Description of the partition. Again, pick whatever you like.
+- [`scheduler`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.scheduler):
+  The scheduler used to submit to this partition, for example `slurm`. All valid options can be found
+  [in the ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.scheduler).
+- [`launcher`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.launcher):
+  The parallel launcher used on this partition, for example `mpirun` or `srun`. All valid options can be found
+  [in the ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.launcher).
+- [`access`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.access):
+  A list of arguments that you would normally pass to the scheduler when submitting to this partition
+  (for example '`-p cpu`' for submitting to a Slurm partition called `cpu`).
+  If supported by your scheduler, we recommend _not_ exporting the submission environment
+  (for example by using '`--export=None`' with Slurm). This avoids test failures due to environment variables set
+  in the submission environment that are passed down to submitted jobs.
+- [`prepare_cmds`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.prepare_cmds):
+  Commands to execute at the start of every job that runs a test. If your batch scheduler does not export
+  the environment of the submit host, this is typically where you can initialize the EESSI environment.
+- [`environs`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.environs):
+  The names of the *programming environments* (to be defined later in the configuration file via [`environments`](#environments))
+  that may be used on this partition. A programming environment is required for tests that are compiled first,
+  before they can run. The EESSI test suite, however, only tests existing software installations, so no compilation
+  (or specific programming environment) is needed. Simply specify `'environs': ['default']`,
+  since ReFrame requires that *a* default environment is defined.
+- [`max_jobs`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.max_jobs):
+  The maximum number of jobs ReFrame is allowed to submit in parallel. Some batch systems limit how many jobs users
+  are allowed to have in the queue. You can use this to make sure ReFrame doesn't exceed that limit.
+- [`resources`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#custom-job-scheduler-resources):
+  This field defines how additional resources can be requested in a batch job. Specifically, on a GPU partition,
+  you have to define a resource with the name '`_rfm_gpu`'. The `options` field should then contain the argument to be
+  passed to the batch scheduler in order to request a certain number of GPUs _per node_, which could be different for
+  different batch schedulers. For example, when using Slurm you would specify:
+  ```python
+  'resources': [
+      {
+          'name': '_rfm_gpu',
+          'options': ['--gpus-per-node={num_gpus_per_node}'],
+      },
+  ],
+  ```
+- [`processor`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.processor):
+  We recommend *NOT* defining this field, unless [CPU autodetection](#cpu-auto-detection) is not working for you.
+  The EESSI test suite relies on information about your processor topology to run. Using CPU autodetection is the
+  easiest way to ensure that _all_ processor-related information needed by the EESSI test suite is defined.
+  Only if CPU autodetection is failing for you do we advise setting `processor` in the partition configuration
+  as an alternative. Although additional fields might be used by future EESSI tests, at this point you'll have to
+  specify _at least_ the following fields:
+  ```python
+  'processor': {
+      'num_cpus': 64,  # Total number of CPU cores in a node
+      'num_sockets': 2,  # Number of sockets in a node
+      'num_cpus_per_socket': 32,  # Number of CPU cores per socket
+      'num_cpus_per_core': 1,  # Number of hardware threads per CPU core
+  }
+  ```
+- [`features`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.features):
+  The `features` field is used by the EESSI test suite to run tests _only_ on a partition if it supports a certain
+  _feature_ (for example if GPUs are available). Feature names are standardized in the EESSI test suite in the
+  [`eessi.testsuite.constants.FEATURES`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/constants.py)
+  dictionary.
+  Typically, you want to define `features: [FEATURES[CPU]]` for CPU-based partitions, and `features: [FEATURES[GPU]]`
+  for GPU-based partitions. The first tells the EESSI test suite that this partition can only run CPU-based tests,
+  whereas the second indicates that this partition can only run GPU-based tests.
+  You _can_ define a single partition to have _both_ the CPU and GPU features (since `features` is a Python list).
+  However, since the CPU-based tests will not ask your batch scheduler for GPU resources, this _may_ fail on batch
+  systems that force you to ask for at least one GPU on GPU-based nodes. Also, running CPU-only code on a GPU node is
+  typically considered bad practice, so testing it there is usually not relevant.
+- [`devices`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.devices): This field specifies information on devices (for example GPUs) present in the partition.
+  Device types are standardized in the EESSI test suite in the [`eessi.testsuite.constants.DEVICE_TYPES`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/constants.py) dictionary. This is used by the EESSI test suite to determine how many of these devices it can/should use per node.
+  Typically, there is no need to define `devices` for CPU partitions.
+  For GPU partitions, you want to define something like:
+  ```python
+  'devices': [
+      {
+          'type': DEVICE_TYPES[GPU],
+          'num_devices': 4,  # or however many GPUs you have per node
+      }
+  ],
+  ```
+- [`extras`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.extras): This field specifies extra information on the partition, such as the GPU vendor. Valid fields for `extras` are standardized as constants in [`eessi.testsuite.constants`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/constants.py) (for example `GPU_VENDOR`). This is used by the EESSI test suite to decide if a partition can run a test that _specifically_ requires a certain brand of GPU.
+  Typically, there is no need to define `extras` for CPU partitions.
+  For GPU partitions, you typically want to specify the GPU vendor, for example:
+  ```python
+  'extras': {
+      GPU_VENDOR: GPU_VENDORS[NVIDIA]
+  }
+  ```
+
+Note that as more tests are added to the EESSI test suite, the use of `features`, `devices` and `extras` by the EESSI test suite may be extended, which may require an update of your configuration file to define newly recognized fields.
+
+!!! note
+
+    Keep in mind that ReFrame partitions are _virtual_ entities: they may or may not correspond to a partition as it is
+    configured in your batch system. One might for example have a single partition in the batch system, but configure
+    it as two separate partitions in the ReFrame configuration file based on additional constraints that are passed to
+    the scheduler, see for example the [AWS CitC example configuration](https://github.com/EESSI/test-suite/blob/main/config/aws_citc.py).
+
+    The EESSI test suite (and more generally, ReFrame) assumes that the hardware _within_ a partition defined in the ReFrame configuration file is _homogeneous_.
+
+#### Environments { #environments }
+
+ReFrame needs a programming environment to be defined in its configuration file for tests that need to be compiled
+before they are run. While we don't have such tests in the EESSI test suite, ReFrame requires _some_ programming
+environment to be defined:
+
+```python
+site_configuration = {
+    ...
+    'environments': [
+        {
+            'name': 'default',  # Note: needs to match whatever we set for 'environs' in the partition
+            'cc': 'cc',
+            'cxx': '',
+            'ftn': '',
+        }
+    ]
+}
+```
+
+!!! note
+
+    The `name` here needs to match whatever we specified for [the `environs` property of the partitions](#partitions).
+
+#### Logging
+
+ReFrame allows a large degree of control over what gets logged, and where. For convenience, we have created a common logging
+configuration in [`eessi.testsuite.common_config`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/common_config.py)
+that provides a reasonable default. It can be used by importing `common_logging_config` and calling it as a function
+to define the `logging` setting:
+```python
+from eessi.testsuite.common_config import common_logging_config
+
+site_configuration = {
+    ...
+    'logging': common_logging_config(),
+}
+```
+When combined with setting the [`$RFM_PREFIX` environment variable](#RFM_PREFIX), the output, performance log, and
+regular ReFrame logs will all end up in the directory specified by `$RFM_PREFIX`, which we recommend doing.
+
+Alternatively, a prefix can be passed as an argument like `common_logging_config(prefix)`, which will control where
+the regular ReFrame log ends up. Note that the performance logs do *not* respect this prefix: they will still end up
+in the standard ReFrame prefix (by default the current directory, unless otherwise set with `$RFM_PREFIX` or `--prefix`).
+
+#### Auto-detection of processor information { #cpu-auto-detection }
+
+You can let ReFrame [auto-detect the processor information](https://reframe-hpc.readthedocs.io/en/stable/configure.html#proc-autodetection) for your system.
+
+ReFrame will automatically use auto-detection when two conditions are met:
+
+1. The [`partitions` section of your configuration file](#partitions) does *not* specify `processor` information for a
+   particular partition (as per our recommendation [in the previous section](#partitions));
+2. The `remote_detect` option is enabled in the `general` part of the configuration, as follows:
+   ```python
+   site_configuration = {
+       'systems': ...,
+       'logging': ...,
+       'general': [
+           {
+               'remote_detect': True,
+           }
+       ]
+   }
+   ```
+
+To trigger the auto-detection of processor information, it is sufficient to
+let ReFrame list the available tests:
+
+```
+reframe --list
+```
+
+ReFrame will store the processor information for your system in `~/.reframe/topology/<system>-<partition>/processor.json`.
+
+### Verifying your ReFrame configuration
+
+To verify the ReFrame configuration, you can [query the configuration using `--show-config`](https://reframe-hpc.readthedocs.io/en/stable/configure.html#querying-configuration-options).
+
+To see the full configuration, use:
+
+```bash
+reframe --show-config
+```
+
+To only show the configuration of a particular system partition, you can use the [`--system` option](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-system).
+To query a specific setting, you can pass an argument to `--show-config`.
+
+For example, to show the configuration of the `gpu` partition of the `example` system:
+
+```bash
+reframe --system example:gpu --show-config systems/0/partitions
+```
+
+You can drill down further to only show the value of a particular configuration setting.
+
+For example, to only show the `launcher` value for the `gpu` partition of the `example` system:
+
+```bash
+reframe --system example:gpu --show-config systems/0/partitions/@gpu/launcher
+```
diff --git a/docs/test-suite/release-notes.md b/docs/test-suite/release-notes.md
new file mode 100644
index 000000000..dd2254164
--- /dev/null
+++ b/docs/test-suite/release-notes.md
@@ -0,0 +1,21 @@
+# Release notes for EESSI test suite
+
+## 0.1.0
+
+Version 0.1.0 is the first release of the EESSI test suite.
+
+It includes:
+
+* A well-structured `eessi.testsuite` Python package that provides [constants](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/constants.py),
+  [utilities](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/utils.py),
+  [hooks](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/hooks.py),
+  and [tests](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/tests/),
+  which can be [installed with "`pip install`"](installation-configuration.md#pip-install).
+* Tests for [GROMACS](usage.md#gromacs) and [TensorFlow](usage.md#tensorflow) in [`eessi.testsuite.tests.apps`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/tests/apps)
+  that leverage the functionality provided by `eessi.testsuite.*`.
+* Examples of [ReFrame configuration files](installation-configuration.md#reframe-config-file) for various systems in
+  the [`config` subdirectory](https://github.com/EESSI/test-suite/tree/main/config).
+* A [`common_logging_config()`](installation-configuration.md#logging) function to facilitate the ReFrame logging configuration.
+* A set of standard *device types* and *features* that can be used in the [`partitions` section of the ReFrame configuration file](installation-configuration.md#partitions).
+* A set of [*tags* (`CI` + `scale`) that can be used to filter checks](usage.md#filter-tag).
+* [Scripts](https://github.com/EESSI/test-suite/tree/main/scripts) that show how to run the test suite.
diff --git a/docs/test-suite/usage.md b/docs/test-suite/usage.md
new file mode 100644
index 000000000..aeda63bb2
--- /dev/null
+++ b/docs/test-suite/usage.md
@@ -0,0 +1,307 @@
+# Using the EESSI test suite
+
+This page covers the usage of the [EESSI test suite](https://github.com/EESSI/test-suite).
+
+We assume you have already [installed and configured](installation-configuration.md) the EESSI test suite on your
+system.
+
+## Listing available tests
+
+To list the tests that are available in the EESSI test suite,
+use `reframe --list` (or `reframe -l` for short).
+
+If you have properly [configured ReFrame](installation-configuration.md#configuration), you should
+see a (potentially long) list of checks in the output:
+
+```
+$ reframe --list
+...
+[List of matched checks]
+- ...
+Found 123 check(s)
+```
+
+!!! note
+    When using `--list`, checks are only generated based on modules that are available in the system where the `reframe` command is invoked.
+
+    The system partitions specified in your ReFrame configuration file are *not* taken into account when using `--list`.
+
+    So, if `--list` produces an overview of 50 checks, and you have 4 system partitions in your configuration file,
+    actually running the test suite may result in (up to) 200 checks being executed.
+
+## Performing a dry run { #dry-run }
+
+To perform a dry run of the EESSI test suite, use `reframe --dry-run`:
+
+```
+$ reframe --dry-run
+...
+[==========] Running 1234 check(s)
+
+[----------] start processing checks
+[ DRY      ] GROMACS_EESSI ...
+...
+[----------] all spawned checks have finished
+
+[  PASSED  ] Ran 1234/1234 test case(s) from 1234 check(s) (0 failure(s), 0 skipped, 0 aborted)
+```
+
+!!! note
+
+    When using `--dry-run`, the system partitions listed in your ReFrame configuration file are also taken into
+    account when generating checks, in addition to available modules and test parameters, which is *not* the case when using `--list`.
+
+## Running the (full) test suite
+
+To actually run the (full) EESSI test suite and let ReFrame
+produce a performance report, use `reframe --run --performance-report`.
+
+We strongly recommend filtering the checks that will be run by using additional options
+like `--system`, `--name`, `--tag` (see the ['Filtering tests' section](#filtering-tests) below),
+and doing a [dry run](#dry-run) first to make sure that the generated checks correspond to what you have in mind.
+
+## ReFrame output and log files
+
+ReFrame will generate various output and log files:
+
+* a general ReFrame log file with debug logging on the ReFrame run (incl.
+  selection of tests, generating checks, test results, etc.);
+* stage directories for each generated check, in which the checks are run;
+* output directories for each generated check, which include the test output;
+* performance log files for each test, which include performance results for the test runs.
+
+We strongly recommend controlling where these files go by using the [common logging configuration that
+is provided by the EESSI test suite in your ReFrame configuration file](installation-configuration.md#logging)
+and setting [`$RFM_PREFIX`](installation-configuration.md#RFM_PREFIX) (avoid using the command line option `--prefix`).
+
+If you do, and if you use [ReFrame v4.3.3 or newer](installation-configuration.md#requirements),
+you should find the output and log files at:
+
+* general ReFrame log file at `$RFM_PREFIX/logs/reframe_<date>_<time>.log`;
+* stage directories in `$RFM_PREFIX/stage/<system>/<partition>/<environment>/<testname>`;
+* output directories in `$RFM_PREFIX/output/<system>/<partition>/<environment>/<testname>`;
+* performance log files in `$RFM_PREFIX/perflogs/<system>/<partition>/<environment>/<testname>`.
+
+In the stage and output directories, there will be a subdirectory for each check that was run,
+each tagged with a unique hash (like `d3adb33f`) that is determined based on the specific parameters for that check
+(see the [ReFrame documentation for more details on the test naming scheme](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#test-naming-scheme)).
+
+## Filtering tests { #filtering-tests }
+
+By default, ReFrame will automatically generate checks for each system partition,
+based on the tests available in the EESSI test suite, available software modules,
+and tags defined in the EESSI test suite.
+
+To avoid being overwhelmed by checks, it is recommended to [apply filters](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#test-filtering)
+so ReFrame only generates the checks you are interested in.
+
+### Filtering by test name { #filter-name }
+
+You can filter checks based on the full test name using the [`--name` option (or `-n`)](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-n),
+which includes the values of all test parameters.
+
+Here's an example of a full test name:
+
+```
+GROMACS_EESSI %benchmark_info=HECBioSim/Crambin %nb_impl=cpu %scale=1_node %module_name=GROMACS/2023.1-foss-2022a /d3adb33f @example:gpu+default
+```
+
+To let ReFrame only generate checks for GROMACS, you can use:
+
+```bash
+reframe --name GROMACS
+```
+
+To only run GROMACS checks with a particular version of GROMACS, you can use `--name` to only retain specific `GROMACS`
+modules:
+
+```bash
+reframe --name %module_name=GROMACS/2023.1
+```
+
+Likewise, you can filter on any part of the test name.
+
+You can also select one specific check using the corresponding [test hash](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#test-naming-scheme),
+which is also part of the full test name (see `/d3adb33f` in the example above). For example:
+
+```bash
+reframe --name /d3adb33f
+```
+
+The argument passed to `--name` is interpreted as a Python regular expression, so you can use wildcards like `.*`,
+character ranges like `[0-9]`, use `^` to specify that the pattern should match from the start of the test name, etc.
+
+Use `--list` or `--dry-run` to check the impact of using the `--name` option.
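+
+For example, to list only the GROMACS checks that use the `HECBioSim/Crambin` benchmark at single-node scale,
+you can combine several test parameters in one regular expression (an illustrative sketch; the parameter order
+matches the full test name shown above):
+
+```bash
+# '.*' skips over the test parameters in between (%nb_impl, in this case)
+reframe --list --name 'GROMACS.*%benchmark_info=HECBioSim/Crambin.*%scale=1_node'
+```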
+
+### Filtering by system (partition) { #filter-system-partition }
+
+By default, ReFrame will generate checks for each system partition that is listed in your configuration file.
+
+To only let ReFrame generate checks for a particular system or system partition,
+you can use the [`--system` option](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-system).
+
+For example:
+
+* To let ReFrame only generate checks for the system named `example`, use:
+  ```
+  reframe --system example ...
+  ```
+* To let ReFrame only generate checks for the `gpu` partition of the system named `example`, use:
+  ```
+  reframe --system example:gpu ...
+  ```
+
+Use `--dry-run` to check the impact of using the `--system` option.
+
+
+### Filtering by tags { #filter-tag }
+
+To filter tests using one or more tags, you can use the [`--tag` option](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-t).
+
+Using `--list-tags` you can get a list of known tags.
+
+To check the impact of this on generated checks by ReFrame, use `--list` or `--dry-run`.
+
+#### `CI` tag
+
+For each software package that is included in the EESSI test suite,
+a small test is tagged with `CI` to indicate it can be used in a Continuous Integration (CI) environment.
+
+Hence, you can use this tag to let ReFrame only generate checks for small test cases:
+
+```
+reframe --tag CI
+```
+
+For example:
+
+```
+$ reframe --name GROMACS --tag CI
+...
+```
+
+#### `scale` tags
+
+The EESSI test suite defines a set of custom tags that control the *scale* of checks,
+which specify how many cores/nodes should be used for running a check.
+
+| tag name | description |
+|:--------:|-------------|
+| `1_core` | using 1 CPU core and 1 GPU (if running a GPU test) |
+| `2_cores` | using 2 CPU cores and 1 GPU (if running a GPU test) |
+| `4_cores` | using 4 CPU cores and 1 GPU (if running a GPU test) |
+| `1_8_node` | using 1/8th of a node (12.5% of available cores/GPUs, 1 at minimum) |
+| `1_4_node` | using a quarter of a node (25% of available cores/GPUs, 1 at minimum) |
+| `1_2_node` | using half of a node (50% of available cores/GPUs, 1 at minimum) |
+| `1_node` | using a full node (all available cores/GPUs) |
+| `2_nodes` | using 2 full nodes |
+| `4_nodes` | using 4 full nodes |
+| `8_nodes` | using 8 full nodes |
+| `16_nodes` | using 16 full nodes |
+
+#### Using multiple tags
+
+To filter tests using multiple tags, you can:
+
+* use `|` as separator to indicate that *one* of the specified tags must match (logical OR, for example `--tag='1_core|2_cores'`);
+* use the `--tag` option multiple times to indicate that *all* specified tags must match (logical AND, for example `--tag CI --tag 1_core`).
+
+## Overriding test parameters *(advanced)*
+
+You can override test parameters using the [`--setvar` option (or `-S`)](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-S).
+
+This can be done either globally (for all tests), or only for specific tests (which is recommended when using `--setvar`).
+
+For example, to run all GROMACS checks with a specific GROMACS module, you can use:
+
+```
+reframe --setvar GROMACS_EESSI.modules=GROMACS/2023.1-foss-2022a ...
+```
+
+!!! warning
+
+    We do not recommend using `--setvar`, since it is quite easy to make unintended changes to test parameters
+    this way that can result in broken checks.
+
+    You should try filtering tests using the [`--name`](#filter-name) or [`--tag`](#filter-tag) options instead.
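+
+If you do use `--setvar`, it is a good idea to verify its effect with a [dry run](#dry-run) first;
+for example (an illustrative sketch, reusing the module name from the example above):
+
+```bash
+# check which GROMACS checks get generated with the overridden module, without running them
+reframe --name GROMACS --setvar GROMACS_EESSI.modules=GROMACS/2023.1-foss-2022a --dry-run
+```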
+
+
+## Example commands
+
+### Running all GROMACS tests on 4 cores on the `cpu` partition
+
+```
+reframe --run --system example:cpu --name GROMACS --tag 4_cores --performance-report
+```
+
+### List all checks for TensorFlow 2.11 using a single node
+
+```
+reframe --list --name %module_name=TensorFlow/2.11 --tag 1_node
+```
+
+### Dry run of TensorFlow CI checks on a quarter (1/4) of a node (on all system partitions)
+
+```
+reframe --dry-run --name 'TensorFlow.*CUDA' --tag 1_4_node --tag CI
+```
+
+## Available tests { #available-tests }
+
+The EESSI test suite currently includes tests for:
+
+* [GROMACS](#gromacs)
+* [TensorFlow](#tensorflow)
+
+For a complete overview of all available tests in the EESSI test suite, see the
+[`eessi/testsuite/tests` subdirectory in the `EESSI/test-suite` GitHub repository](https://github.com/EESSI/test-suite/tree/main/eessi/testsuite/tests).
+
+### GROMACS { #gromacs }
+
+Several tests for [GROMACS](https://www.gromacs.org), a software package to perform molecular dynamics simulations,
+are included, which use the input systems included in the [HECBioSim benchmark suite](https://www.hecbiosim.ac.uk/access-hpc/benchmarks):
+
+* `Crambin` (20K atom system)
+* `Glutamine-Binding-Protein` (61K atom system)
+* `hEGFRDimer` (465K atom system)
+* `hEGFRDimerSmallerPL` (465K atom system, only 10k steps)
+* `hEGFRDimerPair` (1.4M atom system)
+* `hEGFRtetramerPair` (3M atom system)
+
+The test is implemented in [`tests/apps/gromacs.py`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/tests/apps/gromacs.py),
+on top of the GROMACS test that is included in the [ReFrame test library `hpctestlib`](https://reframe-hpc.readthedocs.io/en/stable/hpctestlib.html).
+
+To run this GROMACS test with all HECBioSim systems, use:
+
+```bash
+reframe --run --name GROMACS
+```
+
+To run this GROMACS test only for a specific HECBioSim system, use for example:
+
+```bash
+reframe --run --name 'GROMACS.*HECBioSim/hEGFRDimerPair'
+```
+
+To run this GROMACS test with the smallest HECBioSim system (`Crambin`), you can use the `CI` tag:
+
+```bash
+reframe --run --name GROMACS --tag CI
+```
+
+### TensorFlow { #tensorflow }
+
+A test for [TensorFlow](https://www.tensorflow.org), a machine learning framework, is included,
+which is based on the ["Multi-worker training with Keras" TensorFlow tutorial](https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras).
+
+It is implemented in [`tests/apps/tensorflow/`](https://github.com/EESSI/test-suite/tree/main/eessi/testsuite/tests/apps/tensorflow).
+
+!!! warning
+    This test requires TensorFlow v2.11 or newer; using an older TensorFlow version will not work!
+ +To run this TensorFlow test, use: + +```bash +reframe --run --name TensorFlow +``` diff --git a/mkdocs.yml b/mkdocs.yml index d1d94fe17..e72d7e339 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -26,6 +26,11 @@ nav: - software_layer/cpu_targets.md - software_layer/build_nodes.md - software_layer/adding_software.md + - Test suite: + - Overview: test-suite/index.md + - Installation & configuration: test-suite/installation-configuration.md + - Usage: test-suite/usage.md + - Release notes: test-suite/release-notes.md - Build-test-deploy bot: bot.md - Pilot repository: pilot.md - Getting access to EESSI: diff --git a/talks/20210119_EESSI_behind_the_scenes/README.md b/talks/20210119_EESSI_behind_the_scenes/README.md index 9300d35ed..afac14678 100644 --- a/talks/20210119_EESSI_behind_the_scenes/README.md +++ b/talks/20210119_EESSI_behind_the_scenes/README.md @@ -77,7 +77,7 @@ prepared with the help from Terje Kvernes (@terjekv). We should ask the CVMFS developers about this too (see also https://cvmfs.readthedocs.io/en/stable/apx-security.html). -* Q: LTS for Gentoo? Lifetime? Major upgrade -> EasyBuild complete rebuild? How long can we re-use the previous "trees"? +* Q: LTS for Gentoo? Lifetime? Major upgrade -> EasyBuild complete rebuild? How long can we reuse the previous "trees"? * A: (question answered on stream, see recording). Short answer: we haven't decided this yet. diff --git a/talks/20210202_CernVM_Workshop/README.md b/talks/20210202_CernVM_Workshop/README.md index a703b2c07..e04071e79 100644 --- a/talks/20210202_CernVM_Workshop/README.md +++ b/talks/20210202_CernVM_Workshop/README.md @@ -12,7 +12,7 @@ https://indico.cern.ch/event/885212/overview * A: Jülich and CSCS are examples for large centers which are part of EESSI (not sure if they are part of PRACE or EuroHPC). * A: LUMI has shown signs of interest. -* Q (Valentin Volkl): Key4HEP is already using Gitlab CI for publising to CVMFS and would be interested in a GitHub PR based workflow as envisaged by EESSI, interested in collaboration; perhaps the Github action runner can help. +* Q (Valentin Volkl): Key4HEP is already using Gitlab CI for publishing to CVMFS and would be interested in a GitHub PR based workflow as envisaged by EESSI, interested in collaboration; perhaps the Github action runner can help. * A: We would be very interested to discuss this more. We also expect that the CVMFS ephemeral publish container would help. * Q (Dave Dykstra): Rocky Linux already uses building based on github PRs.