From 430c713a9cfafed1059213b215f7c4a554509d81 Mon Sep 17 00:00:00 2001
From: "Alec Thomson (S&A, Kensington WA)"
Date: Wed, 3 Apr 2024 20:57:55 +0800
Subject: [PATCH] Update docs

---
 docs/source/parallel.rst | 97 ++++++++++++++++++++++++++++------------
 1 file changed, 69 insertions(+), 28 deletions(-)

diff --git a/docs/source/parallel.rst b/docs/source/parallel.rst
index 6956c92d..32c238e1 100644
--- a/docs/source/parallel.rst
+++ b/docs/source/parallel.rst
@@ -1,35 +1,76 @@
 Parallelisation
 ---------------
-The pipeline uses Dask for parallelisation, and optionally for job submission. Dask be run either using either `dask-jobque `_ or `dask-mpi `_ for parallelisation. The latter requires a working version of the `mpi4py `_ package. The pipeline currently contains configurations for the CSIRO `petrichor` supercomputer and for the `galaxy` and `magnus` supercomputers at the `Pawsey Centre `_, which all use the Slurm job manager.
+The pipeline uses `Dask `_ for parallelisation and job submission, and `Prefect `_ for pipeline orchestration. Specifically, a Prefect `DaskTaskRunner `_ is created from a supplied configuration file.
 
-.. tip ::
-    Note that mpi4py needs to point to the same MPI compiler used by the MPI executable. This can be tricky to find on some systems. If in doubt, get in touch with your local sysadmin.
+Any Dask cluster is supported, including `dask-jobqueue `_ for HPC schedulers such as Slurm. This allows the pipeline to be run on virtually any system. For example, to use the `SLURMCluster` from `dask-jobqueue`, set the following in your configuration file:
 
-Configuration is specicfied by a configuration file (written in YAML). These are stored in :file:`arrakis/configs/`. Add your own configuration by adding and editing a configuration, and point the pipeline to the file (see the dask-jobqueue `docs `_).
+.. code-block:: yaml
+
+    cluster_class: "dask_jobqueue.SLURMCluster"
+
+Configuration is specified by a file written in `YAML `_. These are stored in :file:`arrakis/configs/`. Add your own by creating and editing a configuration file, and point the pipeline to it (see the dask-jobqueue `docs `_). Note that cluster configuration options should be specified under the `cluster_kwargs` section, and adaptive scaling options under the `adapt_kwargs` section (see the examples below). For further reading on Dask's adaptive scaling, see `here `_.
+
+*Arrakis* accepts two configurations for the `spice_process` pipeline (see :ref:`Running the pipeline`), supplied via the `--dask_config` and `--imager_dask_config` arguments. The former is used by the cutout pipeline, and the latter by the imager pipeline. Imaging typically requires more memory and more CPUs per task, whereas the cutout pipeline requires a larger overall number of tasks. We provide two example configurations for the CSIRO `petrichor` HPC cluster.
+
+For the imaging pipeline, an example configuration file is:
+
+.. code-block:: yaml
+
+    # petrichor.yaml
+    # Set up for Petrichor
+    cluster_class: "dask_jobqueue.SLURMCluster"
+    cluster_kwargs:
+        cores: 16
+        processes: 1
+        name: 'spice-worker'
+        memory: "128GiB"
+        account: 'OD-217087'
+        walltime: '1-00:00:00'
+        job_extra_directives: ['--qos express']
+        # interface for the workers
+        interface: "ib0"
+        log_directory: 'spice_logs'
+        job_script_prologue: [
+            'module load singularity',
+            'unset SINGULARITY_BINDPATH'
+        ]
+        local_directory: $LOCALDIR
+        silence_logs: 'info'
+    adapt_kwargs:
+        minimum_jobs: 1
+        maximum_jobs: 36
+        wait_count: 20
+        target_duration: "300s"
+        interval: "30s"
+
+For the cutout pipeline, an example configuration file is:
 
 .. code-block:: yaml
 
-    # Set up for Magnus
-    cores: 24
-    processes: 12
-    name: 'spice-worker'
-    memory: "60GB"
-    project: 'ja3'
-    queue: 'workq'
-    walltime: '6:00:00'
-    job_extra: ['-M magnus']
-    # interface for the workers
-    interface: "ipogif0"
-    log_directory: 'spice_logs'
-    env_extra: [
-        'export OMP_NUM_THREADS=1',
-        'source /home/$(whoami)/.bashrc',
-        'conda activate spice'
-    ]
-    python: 'srun -n 1 -c 24 python'
-    extra: [
-        "--lifetime", "11h",
-        "--lifetime-stagger", "5m",
-    ]
-    death_timeout: 300
-    local_directory: '/dev/shm'
+    # rm_petrichor.yaml
+    # Set up for Petrichor
+    cluster_class: "dask_jobqueue.SLURMCluster"
+    cluster_kwargs:
+        cores: 4
+        processes: 4
+        name: 'spice-worker'
+        memory: "256GiB"
+        account: 'OD-217087'
+        walltime: '0-01:00:00'
+        job_extra_directives: ['--qos express']
+        # interface for the workers
+        interface: "ib0"
+        log_directory: 'spice_logs'
+        job_script_prologue: [
+            'module load singularity',
+            'unset SINGULARITY_BINDPATH',
+            'export OMP_NUM_THREADS=1'
+        ]
+        local_directory: $LOCALDIR
+        silence_logs: 'info'
+    adapt_kwargs:
+        minimum: 108
+        maximum: 512
+        wait_count: 20
+        target_duration: "5s"
+        interval: "10s"
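+
+The key names in these files (`cluster_class`, `cluster_kwargs`, `adapt_kwargs`) mirror the arguments of Prefect's `DaskTaskRunner`. As a rough, illustrative sketch only (the helper name `make_task_runner` is hypothetical, and the pipeline's own loading code may differ), a configuration file could be turned into a task runner like this:
+
+.. code-block:: python
+
+    # Illustrative sketch: how a YAML configuration like those above maps onto
+    # a Prefect DaskTaskRunner. This is not the pipeline's actual implementation.
+    import yaml  # PyYAML
+    from prefect_dask import DaskTaskRunner
+
+
+    def make_task_runner(config_path: str) -> DaskTaskRunner:
+        """Build a DaskTaskRunner from a YAML configuration file (hypothetical helper)."""
+        with open(config_path) as stream:
+            config = yaml.safe_load(stream)
+        # Note: values such as $LOCALDIR are read as plain strings here;
+        # real code would also need to expand environment variables.
+        return DaskTaskRunner(
+            cluster_class=config["cluster_class"],
+            cluster_kwargs=config.get("cluster_kwargs", {}),
+            adapt_kwargs=config.get("adapt_kwargs", {}),
+        )
+
+
+    # Example usage (path is illustrative):
+    # task_runner = make_task_runner("arrakis/configs/petrichor.yaml")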