Update docs
AlecThomson committed Apr 3, 2024
1 parent 0eeae66 commit 430c713
Showing 1 changed file with 69 additions and 28 deletions: docs/source/parallel.rst
Parallelisation
---------------
The pipeline uses `Dask <https://www.dask.org/>`_ for parallelisation and job submission, and `Prefect <https://docs.prefect.io/latest/>`_ for pipeline orchestration. Specifically, a Prefect `DaskTaskRunner <https://prefecthq.github.io/prefect-dask/>`_ is created based on a supplied configuration file.

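As an illustrative sketch only (not the pipeline's actual code; the configuration file name, flow, and task here are hypothetical), a `DaskTaskRunner` can be constructed from such a configuration along these lines:

.. code-block:: python

    import yaml
    from prefect import flow, task
    from prefect_dask import DaskTaskRunner

    # Load a YAML configuration (hypothetical file name)
    with open("my_dask_config.yaml") as f:
        config = yaml.safe_load(f)

    # DaskTaskRunner accepts a cluster class (as a dotted import path or callable),
    # keyword arguments for that class, and optional adaptive-scaling keyword arguments
    task_runner = DaskTaskRunner(
        cluster_class=config["cluster_class"],
        cluster_kwargs=config.get("cluster_kwargs", {}),
        adapt_kwargs=config.get("adapt_kwargs", {}),
    )

    @task
    def double(x: int) -> int:
        return x * 2

    @flow(task_runner=task_runner)
    def demo_flow():
        # Each mapped task is executed on the Dask cluster
        return double.map(list(range(10)))

    if __name__ == "__main__":
        demo_flow()
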
Any Dask cluster is supported, including `dask-jobqueue <https://jobqueue.dask.org/en/latest/>`_ for HPC schedulers such as Slurm. This allows the pipeline to be run on virtually any system. For example, to use the `SLURMCluster` from `dask-jobqueue`, set the following in your configuration file:

.. code-block:: yaml

    cluster_class: "dask_jobqueue.SLURMCluster"

Configuration is specified by a file written in `YAML <https://yaml.org/>`_. Example configurations are stored in :file:`arrakis/configs/`. Add your own by copying and editing one of these files, then point the pipeline to it (see the dask-jobqueue `docs <https://jobqueue.dask.org/en/latest/configuration.html#configuration>`_). Note that cluster configuration options should be specified under the `cluster_kwargs` section, and adaptive scaling options under the `adapt_kwargs` section (see the examples below). For further reading on Dask's adaptive scaling, see `here <https://docs.dask.org/en/latest/adaptive.html>`_.
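
Under the hood, the `cluster_kwargs` are passed to the chosen cluster class and the `adapt_kwargs` to its `adapt` method. As a minimal sketch in plain Dask terms (illustrative values only, assuming `dask-jobqueue` is installed):

.. code-block:: python

    from dask_jobqueue import SLURMCluster

    # cluster_kwargs: keyword arguments for the cluster class
    cluster = SLURMCluster(
        cores=4,
        processes=4,
        memory="256GiB",
        walltime="0-01:00:00",
    )

    # adapt_kwargs: keyword arguments for adaptive scaling
    cluster.adapt(minimum=1, maximum=36)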

*Arrakis* allows two configurations to be supplied to the `spice_process` pipeline (see :ref:`Running the pipeline`) via the `--dask_config` and `--imager_dask_config` arguments. The former is used by the cutout pipeline, and the latter by the imager pipeline. Imaging typically requires more memory and more CPUs per task, whereas the cutout pipeline requires a higher overall number of tasks. We provide two example configurations for the CSIRO `petrichor` HPC cluster.

For the imaging pipeline, an example configuration file is:

.. code-block:: yaml

    # petrichor.yaml
    # Set up for Petrichor
    cluster_class: "dask_jobqueue.SLURMCluster"
    cluster_kwargs:
      cores: 16
      processes: 1
      name: 'spice-worker'
      memory: "128GiB"
      account: 'OD-217087'
      walltime: '1-00:00:00'
      job_extra_directives: ['--qos express']
      # interface for the workers
      interface: "ib0"
      log_directory: 'spice_logs'
      job_script_prologue: [
        'module load singularity',
        'unset SINGULARITY_BINDPATH'
      ]
      local_directory: $LOCALDIR
      silence_logs: 'info'
    adapt_kwargs:
      minimum_jobs: 1
      maximum_jobs: 36
      wait_count: 20
      target_duration: "300s"
      interval: "30s"

For the cutout pipeline, an example configuration file is:

.. code-block:: yaml

    # rm_petrichor.yaml
    # Set up for Petrichor
    cluster_class: "dask_jobqueue.SLURMCluster"
    cluster_kwargs:
      cores: 4
      processes: 4
      name: 'spice-worker'
      memory: "256GiB"
      account: 'OD-217087'
      walltime: '0-01:00:00'
      job_extra_directives: ['--qos express']
      # interface for the workers
      interface: "ib0"
      log_directory: 'spice_logs'
      job_script_prologue: [
        'module load singularity',
        'unset SINGULARITY_BINDPATH',
        'export OMP_NUM_THREADS=1'
      ]
      local_directory: $LOCALDIR
      silence_logs: 'info'
    adapt_kwargs:
      minimum: 108
      maximum: 512
      wait_count: 20
      target_duration: "5s"
      interval: "10s"
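
Note the contrast between the two: the imaging configuration asks for a small number of large workers (16 cores and 128 GiB per single-process worker, up to 36 jobs), while the cutout configuration asks for many smaller workers (4 processes per 4-core job, scaling between 108 and 512). If you write your own configuration, it can be useful to check the batch script Dask will generate before launching the full pipeline. A minimal sketch of one way to do this (hypothetical file name; assuming a Slurm-style configuration like the ones above, nothing is submitted until the cluster is scaled):

.. code-block:: python

    import importlib

    import yaml

    # Load your own configuration file (hypothetical name)
    with open("my_dask_config.yaml") as f:
        config = yaml.safe_load(f)

    # Resolve the cluster class from its dotted path, e.g. "dask_jobqueue.SLURMCluster"
    module_name, class_name = config["cluster_class"].rsplit(".", 1)
    cluster_class = getattr(importlib.import_module(module_name), class_name)

    # Instantiate the cluster and inspect the generated batch script
    cluster = cluster_class(**config.get("cluster_kwargs", {}))
    print(cluster.job_script())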
