diff --git a/docs/3dgan_doc.rst b/docs/3dgan_doc.rst
index 1b8added..ff3bf2d9 100644
--- a/docs/3dgan_doc.rst
+++ b/docs/3dgan_doc.rst
@@ -38,22 +38,22 @@ dataloader.py
    :language: python
 
 
-cern-pipeline.yaml
-++++++++++++++++++
+config.yaml
++++++++++++
 
 This YAML file defines the pipeline configuration for the CERN use case.
 
-.. literalinclude:: ../use-cases/3dgan/cern-pipeline.yaml
+.. literalinclude:: ../use-cases/3dgan/config.yaml
    :language: yaml
 
 
-inference-pipeline.yaml
-+++++++++++++++++++++++
+create_inference_sample.py
++++++++++++++++++++++++++++
 
-This YAML file defines the pipeline configuration for the CERN use case inference.
+This file defines a pipeline configuration for the CERN use case inference.
 
-.. literalinclude:: ../use-cases/3dgan/inference-pipeline.yaml
-   :language: yaml
+.. literalinclude:: ../use-cases/3dgan/create_inference_sample.py
+   :language: python
 
 
 Dockerfile
@@ -63,13 +63,11 @@ Dockerfile
    :language: bash
 
 
-pipeline.yaml
-+++++++++++++
+startscript
++++++++++++
 
-This YAML file defines the pipeline configuration for the CERN use case. It includes settings for the model, training, and evaluation.
-
-.. literalinclude:: ../use-cases/3dgan/pipeline.yaml
-   :language: yaml
+.. literalinclude:: ../use-cases/3dgan/startscript
+   :language: bash
 
 
 
@@ -90,12 +88,18 @@ interLink x 3DGAN
 
 
 3dgan-inference.yaml
-++++++++++++++++++++++++
+++++++++++++++++++++
 
 .. literalinclude:: ../use-cases/3dgan/interLink/3dgan-inference.yaml
    :language: yaml
 
 
+3dgan-train.yaml
+++++++++++++++++
+
+.. literalinclude:: ../use-cases/3dgan/interLink/3dgan-train.yaml
+   :language: yaml
+
 
 
 .. .. automodule:: 3dgan.model
diff --git a/docs/hpc_setup.rst b/docs/hpc_setup.rst
deleted file mode 100644
index 607c876c..00000000
--- a/docs/hpc_setup.rst
+++ /dev/null
@@ -1,69 +0,0 @@
-.. 🌐 HPC systems
-.. ---------------
-How to use torch `DistributedDataParallel` (DDP), Horovod and DeepSpeed from the same client code.
-Note that the environment is tested on the HDFML system at JSC. For other systems, the module versions might need change accordingly.
-
-
-.. toctree::
-   :maxdepth: 5
-
-
-Environments
-++++++++++++
-
-Install PyTorch env (GPU support) on Juelich Super Computer (tested on HDFML system)
-
-.. code-block:: bash
-
-    torch-gpu-jsc: env-files/torch/createEnvJSC.sh
-    sh env-files/torch/createEnvJSC.sh
-
-
-Install Tensorflow env (GPU support) on Juelich Super Computer (tested on HDFML system)
-
-.. code-block:: bash
-
-    tf-gpu-jsc: env-files/tensorflow/createEnvJSCTF.sh
-    sh env-files/tensorflow/createEnvJSCTF.sh
-
-
-
-Setup
-+++++
-
-First, from the root of this `repository `_, build the environment containing pytorch, horovod and deepspeed. You can try with:
-
-.. code-block:: bash
-
-    # Creates a Python venv called envAI_hdfml
-    make torch-gpu-jsc
-
-
-Distributed training
-++++++++++++++++++++
-
-Each distributed strategy has its own SLURM job script, which should be used to run it:
-
-If you want to distribute the code in `train.py` with **torch DDP**, run from terminal:
-
-.. code-block:: bash
-
-    sbatch ddp_slurm.sh
-
-If you want to distribute the code in `train.py` with **DeepSpeed**, run from terminal:
-
-.. code-block:: bash
-
-    sbatch deepspeed_slurm.sh
-
-If you want to distribute the code in `train.py` with **Horovod**, run from terminal:
-
-.. code-block:: bash
-
-    sbatch hvd_slurm.sh
-
-You can run all of them with:
-
-.. code-block:: bash
-
-    bash runall.sh
\ No newline at end of file
diff --git a/docs/index.rst b/docs/index.rst
index 82b192f8..215aedb6 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -37,23 +37,23 @@ In ``itwinai`` platform, we focus mainly on the MLOps step, simulating or oversi
 .. toctree::
    :maxdepth: 2
    :hidden:
-   :caption: 🪄 itwinai Modules
+   :caption: 📚 Integrated Use-cases
 
-   modules
+   use_cases
 
 .. toctree::
    :maxdepth: 2
    :hidden:
-   :caption: 📚 Integrated Use-cases
+   :caption: 🚀 Tutorials
 
-   use_cases
+   tutorials
 
 .. toctree::
    :maxdepth: 2
    :hidden:
-   :caption: 🚀 Tutorials
+   :caption: 🪄 Python API reference
 
-   tutorials
+   modules
 
 
 `interTwin Demo: itwinai integration with other DTE modules `_
diff --git a/docs/itwinai.cluster.rst b/docs/itwinai.cluster.rst
deleted file mode 100644
index 7360c981..00000000
--- a/docs/itwinai.cluster.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-itwinai.cluster
-===============
-
-.. automodule:: itwinai.cluster
-   :members:
-   :undoc-members:
-   :show-inheritance:
diff --git a/docs/itwinai.distributed.rst b/docs/itwinai.distributed.rst
new file mode 100644
index 00000000..26810535
--- /dev/null
+++ b/docs/itwinai.distributed.rst
@@ -0,0 +1,7 @@
+itwinai.distributed
+===================
+
+.. automodule:: itwinai.distributed
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/itwinai.tf.modules.rst b/docs/itwinai.tf.modules.rst
index 8b923bea..b0347395 100644
--- a/docs/itwinai.tf.modules.rst
+++ b/docs/itwinai.tf.modules.rst
@@ -14,6 +14,13 @@ utils.py
    :language: python
 
 
+distributed.py
++++++++++++++++
+
+.. literalinclude:: ../src/itwinai/tensorflow/distributed.py
+   :language: python
+
+
 .. .. automodule:: itwinai.tensorflow.trainer
 .. :members:
 .. :undoc-members:
diff --git a/docs/itwinai.torch.modules.rst b/docs/itwinai.torch.modules.rst
index d551af40..38b0878a 100644
--- a/docs/itwinai.torch.modules.rst
+++ b/docs/itwinai.torch.modules.rst
@@ -1,10 +1,10 @@
 itwinai PyTorch Modules
 =======================
 
-cluster.py
-++++++++++
+distributed.py
++++++++++++++++
 
-.. literalinclude:: ../src/itwinai/torch/cluster.py
+.. literalinclude:: ../src/itwinai/torch/distributed.py
    :language: python
 
 inference.py
@@ -31,10 +31,10 @@ types.py
 .. literalinclude:: ../src/itwinai/torch/types.py
    :language: python
 
-utils.py
-++++++++
+reproducibility.py
++++++++++++++++++++
 
-.. literalinclude:: ../src/itwinai/torch/utils.py
+.. literalinclude:: ../src/itwinai/torch/reproducibility.py
    :language: python
 
 
diff --git a/docs/local_setup.rst b/docs/local_setup.rst
deleted file mode 100644
index 72b5f377..00000000
--- a/docs/local_setup.rst
+++ /dev/null
@@ -1,210 +0,0 @@
-.. 💻 Local systems
-.. -----------------
-
-**Requirements**
-
-* Linux environment.
-* Windows and macOS were never tested.
-
-
-.. toctree::
-   :maxdepth: 5
-
-
-Micromamba installation
-+++++++++++++++++++++++
-
-To manage Conda environments we use micromamba, a light weight version of conda.
-
-It is suggested to refer to the `Manual installation guide `_.
-
-Consider that Micromamba can eat a lot of space when building environments because packages are cached on
-the local filesystem after being downloaded. To clear cache you can use `micromamba clean -a`.
-Micromamba data are kept under the `$HOME` location. However, in some systems, `$HOME` has a limited storage
-space and it would be cleverer to install Micromamba in another location with more storage space.
-Thus by changing the `$MAMBA_ROOT_PREFIX` variable. See a complete installation example for Linux below, where the
-default `$MAMBA_ROOT_PREFIX` is overridden:
-
-
-.. code-block:: bash
-
-    cd $HOME
-
-    # Download micromamba (This command is for Linux Intel (x86_64) systems. Find the right one for your system!)
-    curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
-
-    # Install micromamba in a custom directory
-    MAMBA_ROOT_PREFIX='my-mamba-root'
-    ./bin/micromamba shell init $MAMBA_ROOT_PREFIX
-
-    # To invoke micromamba from Makefile, you need to add explicitly to $PATH
-    echo 'PATH="$(dirname $MAMBA_EXE):$PATH"' >> ~/.bashrc
-
-**Reference**: `Micromamba installation guide `_.
-
-
-Environment setup
-+++++++++++++++++
-
-**Requirements:**
-
-* Linux environment. Windows and macOS were never tested.
-* Micromamba: see the installation instructions above.
-* VS Code, for development.
-
-Tensorflow
-++++++++++
-
-Installation:
-
-.. code-block:: bash
-
-    # Install TensorFlow 2.13
-    make tf-2.13
-
-    # Activate env
-    micromamba activate ./.venv-tf
-
-Other TF versions are available, using the following targets `tf-2.10`, and `tf-2.11`.
-
-
-PyTorch (+ Lightning)
-+++++++++++++++++++++
-
-Installation:
-
-.. code-block:: bash
-
-    # Install PyTorch + lightning
-    make torch-gpu
-
-    # Activate env
-    micromamba activate ./.venv-pytorch
-
-Other also CPU-only version is available at the target `torch-cpu`.
-
-
-Development environment
-+++++++++++++++++++++++
-
-This is for developers only. To have it, update the installed `itwinai` package adding the `dev` extra:
-
-.. code-block:: bash
-
-    pip install -e .[dev]
-
-
-**Test with `pytest`**
-To run tests on itwinai package:
-
-.. code-block:: bash
-
-    # Activate env
-    micromamba activate ./.venv-pytorch # or ./.venv-tf
-
-    pytest -v -m "not slurm" tests/
-
-
-However, some tests are intended to be executed only on an HPC system, where SLURM is available. They are marked with "slurm" tag. To run also those tests, use the dedicated job script:
-
-.. code-block:: bash
-
-    sbatch tests/slurm_tests_startscript
-
-    # Upon completion, check the output:
-    cat job.err
-    cat job.out
-
-
-
-
-.. Workflow orchestrator
-.. +++++++++++++++++++++
-
-.. Install the (custom) orchestrator virtual environment.
-
-.. .. code-block:: bash
-
-.. source ~/.bashrc
-.. # Create local env
-.. make
-
-.. # Activate env
-.. micromamba activate ./.venv
-
-.. To run tests on workflows use:
-
-.. .. code-block:: bash
-
-.. # Activate env
-.. micromamba activate ./.venv
-
-.. pytest tests/
-
-
-.. Development env setup
-.. ---------------------
-
-.. Requirements:
-
-.. * Linux, macOS environment. Windows was never tested.
-.. * Micromamba: see the installation instructions above.
-.. * VS Code, for development.
-
-.. Installation:
-
-.. .. code-block:: bash
-
-.. make dev-env
-
-.. # Activate env
-.. micromamba activate ./.venv-dev
-
-.. To run tests on itwinai package:
-
-.. .. code-block:: bash
-
-.. # Activate env
-.. micromamba activate ./.venv-dev
-
-.. pytest tests/ai/
-
-
-.. AI environment setup
-.. --------------------
-
-.. Requirements:
-
-.. * Linux, macOS environment. Windows was never tested.
-.. * Micromamba: see the installation instructions above.
-.. * VS Code, for development.
-
-.. **NOTE**: this environment gets automatically setup when a workflow is executed!
-
-.. However, you can also set it up explicitly with:
-
-.. .. code-block:: bash
-
-.. make ai-env
-
-.. # Activate env
-.. micromamba activate ./ai/.venv-pytorch
-
-.. Updating the environment files
-.. ++++++++++++++++++++++++++++++
-
-.. The files under `ai/env-files/` are of two categories:
-
-.. * Simple environment definition, such as `pytorch-env.yml` and `pytorch-env-gpu.yml`
-.. * Lockfiles, such as `pytorch-lock.yml` and `pytorch-gpu-lock.yml`, generated by `conda-lock `_.
-
-.. **When you install the ai environment, install it from the lock file!**
-
-.. When the "simple" environment file (e.g., `pytorch-env.yml`) changes, lock it with `conda-lock `_:
-
-.. .. code-block:: bash
-
-.. micromamba activate ./.venv
-
-.. make lock-ai
-
diff --git a/docs/mnist_doc.rst b/docs/mnist_doc.rst
index fd26c4e7..71e2610d 100644
--- a/docs/mnist_doc.rst
+++ b/docs/mnist_doc.rst
@@ -22,12 +22,12 @@ The `dataloader.py` script is responsible for loading the MNIST dataset and prep
 .. :undoc-members:
 .. :show-inheritance:
 
-pipeline.yaml
-+++++++++++++
+config.yaml
++++++++++++
 
 This YAML file defines the pipeline configuration for the MNIST use case. It includes settings for the model, training, and evaluation.
 
-.. literalinclude:: ../use-cases/mnist/torch-lightning/pipeline.yaml
+.. literalinclude:: ../use-cases/mnist/torch-lightning/config.yaml
    :language: yaml
 
 startscript
@@ -38,31 +38,6 @@ The `startscript` is a shell script to initiate the training process. It sets up
 .. literalinclude:: ../use-cases/mnist/torch-lightning/startscript
    :language: bash
 
-train.py
-++++++++
-
-This script contains the training loop and is where the model is trained using the data prepared by `dataloader.py`.
-
-.. literalinclude:: ../use-cases/mnist/torch-lightning/train.py
-   :language: python
-
-.. .. automodule:: torch-lightning.train
-.. :members:
-.. :undoc-members:
-.. :show-inheritance:
-
-trainer.py
-++++++++++
-
-The `trainer.py` file defines the `Trainer` class which sets up the training parameters and the training process.
-
-.. literalinclude:: ../use-cases/mnist/torch-lightning/trainer.py
-   :language: python
-
-.. .. automodule:: torch-lightning.trainer
-.. :members:
-.. :undoc-members:
-.. :show-inheritance:
 
 utils.py
 ++++++++
@@ -102,13 +77,13 @@ Dockerfile
    :language: bash
 
 
-inference-pipeline.yaml
-+++++++++++++++++++++++
+create_inference_sample.py
++++++++++++++++++++++++++++
 
-This YAML file defines the pipeline configuration for the MNIST use case inference.
+This file defines a pipeline configuration for the MNIST use case inference.
 
-.. literalinclude:: ../use-cases/mnist/torch/inference-pipeline.yaml
-   :language: yaml
+.. literalinclude:: ../use-cases/mnist/torch/create_inference_sample.py
+   :language: python
 
 model.py
 ++++++++
@@ -118,12 +93,12 @@ The `model.py` script is responsible for loading a simple model.
 .. literalinclude:: ../use-cases/mnist/torch/model.py
    :language: python
 
-pipeline.yaml
-+++++++++++++
+config.yaml
++++++++++++
 
 This YAML file defines the pipeline configuration for the MNIST use case. It includes settings for the model, training, and evaluation.
 
-.. literalinclude:: ../use-cases/mnist/torch/pipeline.yaml
+.. literalinclude:: ../use-cases/mnist/torch/config.yaml
    :language: yaml
 
 startscript
@@ -134,13 +109,6 @@ The `startscript` is a shell script to initiate the training process. It sets up
 .. literalinclude:: ../use-cases/mnist/torch/startscript
    :language: bash
 
-train.py
-++++++++
-
-This script contains the training loop and is where the model is trained using the data prepared by `dataloader.py`.
-
-.. literalinclude:: ../use-cases/mnist/torch/train.py
-   :language: python
 
 saver.py
 ++++++++
@@ -149,3 +117,64 @@ saver.py
 
 .. literalinclude:: ../use-cases/mnist/torch/saver.py
    :language: python
+
+runall.sh
+++++++++++
+
+.. literalinclude:: ../use-cases/mnist/torch/runall.sh
+   :language: bash
+
+
+slurm.sh
++++++++++
+
+.. literalinclude:: ../use-cases/mnist/torch/slurm.sh
+   :language: bash
+
+
+
+This section covers the MNIST use case, which utilizes the `tensorflow` framework for training and evaluation. The following files are integral to this use case:
+
+tensorflow
+----------
+
+.. toctree::
+   :maxdepth: 5
+
+dataloader.py
++++++++++++++
+
+The `dataloader.py` script is responsible for loading the MNIST dataset and preparing it for training.
+
+.. literalinclude:: ../use-cases/mnist/tensorflow/dataloader.py
+   :language: python
+
+
+pipeline.yaml
++++++++++++++
+
+This YAML file defines the pipeline configuration for the MNIST use case. It includes settings for the model, training, and evaluation.
+
+.. literalinclude:: ../use-cases/mnist/tensorflow/pipeline.yaml
+   :language: yaml
+
+
+startscript
++++++++++++
+
+The `startscript` is a shell script to initiate the training process. It sets up the environment and starts the training using the `train.py` script.
+
+.. literalinclude:: ../use-cases/mnist/tensorflow/startscript
+   :language: bash
+
+
+trainer.py
+++++++++++
+
+The `trainer.py` script is responsible for configuring the training process.
+
+.. literalinclude:: ../use-cases/mnist/tensorflow/trainer.py
+   :language: python
+
+
+
\ No newline at end of file
diff --git a/docs/modules.rst b/docs/modules.rst
index a9e96c97..06b018b6 100644
--- a/docs/modules.rst
+++ b/docs/modules.rst
@@ -1,12 +1,12 @@
-itwinai
-=======
+`itwinai `_
+==============================================
 
 .. toctree::
    :maxdepth: 4
 
    itwinai.cli
-   itwinai.cluster
    itwinai.components
+   itwinai.distributed
    itwinai.loggers
    itwinai.parser
    itwinai.pipeline