Usecase eurac (#219)
* Backend (#59)

* WIP: Tensorflow MNIST use-case

* UPDATE: Tensorflow MNIST version

* ADD: Backend

* ADD: Use-case init

* FIX: Paths and downloading of the data

* FIX: Paths and downloading of the data

* ADD: Setup, Config update

* ADD: Setup, Config update

* UPDATE: File movement into itwinai

* FIX: Move utils from tensorflow to global folder

* FIX: Add setup into torch Executable

* ADD: MNIST Torch Use-case

* FIX: Formatting

* ADD: Lib

* ADD: Lib

* ADD: Tests, Fix Loggers

* Update README.md

* ADD: Tests

* ADD: MLCC

* ADD: Cyclones, Cyclones-pipe

* ADD: TensorflowTrainer

* UPDATE: Move TensorflowTrainer into Backend

* FIX: Dependencies

* ADD: Number of devices

* ADD: initial version of TorchTrainer

* update

* update

* ADD: distributed torch Trainer and decorator

* ADD: New version of torch distributed trainer and tests

* ADD: load torch dist trainer from config file

* ADD: multi-gpu pytorch trainer

* ADD: download on login node

* FIX: dataloaders in Trainer

* FIX: add dataloaders into trainer

* FIX: clear load and save state

* ADD: Loggers

* FIX: Log in a distributed environment

* TensorFlow backend (#63)

* UPDATE: Remove experimental distribution

* ADD: Mnist distributed

* ADD: Optional strategy

* UPDATE: Conditional distribution

* FIX: Dataloader for mnist

* FIX: Model cloning lambda function for distributed scope

* ADD: CycleGAN

* UPDATE: Types

* UPDATE: Types

* ADD: Local distr

* FIX: learning rates

* ADD: CycleGAN distributed

* FIX: Reduction

* FIX: Distribution

* ADD: tmp.py

* FIX: Distribution

* FIX: Distribution

* FIX: Distribution

* FIX: Distribution

* FIX: Distribution

* FIX: Distribution

* FIX: Distribution

* FIX: Distribution

* UPDATE: Executors

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* ADD: Ray

* ADD: Ray

* ADD: Ray

* ADD: Ray

* ADD: Ray

* ADD: Ray

* ADD: Initial VIRGO

* UPDATE: Optional distribution, tensorflow-gpu

* UPDATE: tensorflow-gpu dependency

* ADD: Unify branches

---------

Co-authored-by: User3574 <[email protected]>

* Refactor entire code base

* ADD: workflows folder

* FIX: refactor

* FIX: linting

* ADD: how to run use case doc

* ADD: workflows doc

* FIX: MD linter

* Pipe MNIST lightning (#86)

* ADD: lightning distributed + pipeline

* UPDATE: jscpd threshold

* UPDATE: super linter ignore use cases

* ADD: jscpd ignore loggers

* Functional tests for MNIST (#87)

* ADD: use case tests

* FIX: move use case models out of itwinai

* FIX: rearrange modules

* ADD: ConsoleLogger and LoggersCollection

* FIX: loggers filter

* FIX: add TF env creation

* UPDATE: test flag

* ADD: early pytest on slurm

* FIX: duplicated code in TF Trainer

* Sqaaas code (#88)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

---------

Co-authored-by: orviz <[email protected]>

* Sqaaas code (#89)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

* ADD: adaptive branch discovery for SQAaaS action

* Update sqaaas.yml

---------

Co-authored-by: orviz <[email protected]>

* 3dgan use case (#94)

* committing integration of 3dgan scripts

* ADD: Download dataset

* FIX: DDP distributed training with manual optimization

* ADD: log with MLFlow

* Sqaaas code (#88)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

---------

Co-authored-by: orviz <[email protected]>

* Sqaaas code (#89)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

* ADD: adaptive branch discovery for SQAaaS action

* Update sqaaas.yml

---------

Co-authored-by: orviz <[email protected]>

* ADD: draft predictor and saver

* ADD: stub for inference pipeline

* ADD: small docs

* UPDATE: inference pipeline components

* UPDATE: reorg

* ADD: image generation for inference

* update tag

* ADD: threshold

* ADD: draft inference

* ADD: draft inference wf

* ADD: working inference workflow

* ADD: 3D scatter plots

* ADD: Dockerfile + refactor

* ADD: .dockerignore

* Update .dockerignore

* REMOVE: keras dependency

* ADD: skip download option

---------

Co-authored-by: Kalliopi Tsolaki <[email protected]>
Co-authored-by: orviz <[email protected]>

* Sqaaas code (#96)

* ADD: adaptive branch discovery for SQAaaS action

* Update sqaaas.yml

* Update sqaaas.yml

* ADD: adaptive branch discovery for SQAaaS action

* Trigger only on main and dev branches

* ADD: double quote

* Trigger pytest only on main and dev PRs

* Torch mnist inference (#95)

* ADD: draft predictor and saver

* ADD: stub for inference pipeline

* ADD: small docs

* UPDATE: inference pipeline components

* UPDATE: reorg

* ADD: image generation for inference

* update tag

* ADD: threshold

* Remove keras dependency

* 3dgan integration (#97)

* committing integration of 3dgan scripts

* ADD: Download dataset

* FIX: DDP distributed training with manual optimization

* ADD: log with MLFlow

* Sqaaas code (#88)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

---------

Co-authored-by: orviz <[email protected]>

* Sqaaas code (#89)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

* ADD: adaptive branch discovery for SQAaaS action

* Update sqaaas.yml

---------

Co-authored-by: orviz <[email protected]>

* ADD: draft predictor and saver

* ADD: stub for inference pipeline

* ADD: small docs

* UPDATE: inference pipeline components

* UPDATE: reorg

* ADD: image generation for inference

* update tag

* ADD: threshold

* ADD: draft inference

* ADD: draft inference wf

* ADD: working inference workflow

* ADD: 3D scatter plots

* ADD: Dockerfile + refactor

* ADD: .dockerignore

* Update .dockerignore

* REMOVE: keras dependency

* ADD: skip download option

* ADD: cern pipeline.yaml

* UPDATE: dataset loading function

* UPDATE: dataset loading function

* UPDATE conf

* UPDATE refactor

* UPDATE refactor

* UPDATE training docs

---------

Co-authored-by: Kalliopi Tsolaki <[email protected]>
Co-authored-by: orviz <[email protected]>

* Add SQAaaS dynamic badge for dev branch (#104)

* Add SQAaaS dynamic badge

* Upgrade to sqaaas-assessment-action@v2

* 3dgan integration (#98)

* committing integration of 3dgan scripts

* ADD: Download dataset

* FIX: DDP distributed training with manual optimization

* ADD: log with MLFlow

* Sqaaas code (#88)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

---------

Co-authored-by: orviz <[email protected]>

* Sqaaas code (#89)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

* ADD: adaptive branch discovery for SQAaaS action

* Update sqaaas.yml

---------

Co-authored-by: orviz <[email protected]>

* ADD: draft predictor and saver

* ADD: stub for inference pipeline

* ADD: small docs

* UPDATE: inference pipeline components

* UPDATE: reorg

* ADD: image generation for inference

* update tag

* ADD: threshold

* ADD: draft inference

* ADD: draft inference wf

* ADD: working inference workflow

* ADD: 3D scatter plots

* ADD: Dockerfile + refactor

* ADD: .dockerignore

* Update .dockerignore

* REMOVE: keras dependency

* ADD: skip download option

* ADD: cern pipeline.yaml

* UPDATE: dataset loading function

* UPDATE: dataset loading function

* UPDATE conf

* UPDATE refactor

* UPDATE refactor

* UPDATE training docs

* Update readme

* update README

* FIX typo

* Update README

* Update mkdir

* UPDATE data paths

* UPDATE Dockerfile

* UPDATE Dockerfiles

* UPDATE for Singularity execution

* FIX version mismatch

* UPDATE Singularity docs

* Named steps pipe (#100)

* ADD: dict steps pipe

* Relax dependency constraint

* UPDATE Singularity exec command

* UPDATE: Image version

* UPDATE: load components from pipeline

* ADD: docs

* Simplify 3DGAN model config

* ADD: mlflow autologging support for PL trainer

* UPDATE container info

* Refactor

* UPDATE dependencies

* FIX linter problem

* Simplified workflow configuration (#108)

* Add SQAaaS dynamic badge for dev branch (#104)

* Add SQAaaS dynamic badge

* Upgrade to sqaaas-assessment-action@v2

* Add draft example

* UPDATE credits field

* ADD docs

* REFACTOR components and pipeline code

* UPDATE docstring

* UPDATE mnist torch uc

* ADD config file parser draft

* ADD itwinaiCLI and ConfigParser

* ADD docs

* ADD pipeline parser and serializer plus tests

* UPDATE docs

* ADD adapter component and tests (incl parser)

* ADD splitter component, improve pipeline, tests

* UPDATE test

* REMOVE todos

* ADD component tests

* ADD serializer tests

* FIX linter

* ADD basic workflow tutorial

* ADD basic intermediate tutorial

* ADD advanced tutorial

* UPDATE advanced tutorial

* UPDATE use cases

* UPDATE save parameters

* FIX linter

* FIX cyclones use case workflow

---------

Co-authored-by: orviz <[email protected]>

* Simplified workflow configuration (#109)

* Add SQAaaS dynamic badge for dev branch (#104)

* Add SQAaaS dynamic badge

* Upgrade to sqaaas-assessment-action@v2

* Add draft example

* UPDATE credits field

* ADD docs

* REFACTOR components and pipeline code

* UPDATE docstring

* UPDATE mnist torch uc

* ADD config file parser draft

* ADD itwinaiCLI and ConfigParser

* ADD docs

* ADD pipeline parser and serializer plus tests

* UPDATE docs

* ADD adapter component and tests (incl parser)

* ADD splitter component, improve pipeline, tests

* UPDATE test

* REMOVE todos

* ADD component tests

* ADD serializer tests

* FIX linter

* ADD basic workflow tutorial

* ADD basic intermediate tutorial

* ADD advanced tutorial

* UPDATE advanced tutorial

* UPDATE use cases

* UPDATE save parameters

* FIX linter

* FIX cyclones use case workflow

* ADD slurm jobscript

* FIX merge error

* FIX components template

---------

Co-authored-by: orviz <[email protected]>

* ADD integration tests

* FIX test

* FIX 3dgan inference test

---------

Co-authored-by: Kalliopi Tsolaki <[email protected]>
Co-authored-by: orviz <[email protected]>

* fixed distributed trainer in cyclones use case

* 3dgan integration (#118)

* fixed distributed trainer in cyclones use case

* committing integration of 3dgan scripts

* ADD: Download dataset

* FIX: DDP distributed training with manual optimization

* ADD: log with MLFlow

* Sqaaas code (#88)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

---------

Co-authored-by: orviz <[email protected]>

* Sqaaas code (#89)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

* ADD: adaptive branch discovery for SQAaaS action

* Update sqaaas.yml

---------

Co-authored-by: orviz <[email protected]>

* ADD: draft predictor and saver

* ADD: stub for inference pipeline

* ADD: small docs

* UPDATE: inference pipeline components

* UPDATE: reorg

* ADD: image generation for inference

* update tag

* ADD: threshold

* ADD: draft inference

* ADD: draft inference wf

* ADD: working inference workflow

* ADD: 3D scatter plots

* ADD: Dockerfile + refactor

* ADD: .dockerignore

* Update .dockerignore

* ADD: skip download option

* ADD: cern pipeline.yaml

* UPDATE: dataset loading function

* UPDATE: dataset loading function

* UPDATE conf

* UPDATE refactor

* UPDATE refactor

* UPDATE training docs

* Update readme

* update README

* FIX typo

* Update README

* Update mkdir

* UPDATE data paths

* UPDATE Dockerfile

* UPDATE Dockerfiles

* UPDATE for Singularity execution

* FIX version mismatch

* UPDATE Singularity docs

* Named steps pipe (#100)

* ADD: dict steps pipe

* Relax dependency constraint

* UPDATE Singularity exec command

* UPDATE: Image version

* UPDATE: load components from pipeline

* ADD: docs

* Simplify 3DGAN model config

* ADD: mlflow autologging support for PL trainer

* UPDATE container info

* Refactor

* UPDATE dependencies

* FIX linter problem

* Simplified workflow configuration (#108)

* Add SQAaaS dynamic badge for dev branch (#104)

* Add SQAaaS dynamic badge

* Upgrade to sqaaas-assessment-action@v2

* Add draft example

* UPDATE credits field

* ADD docs

* REFACTOR components and pipeline code

* UPDATE docstring

* UPDATE mnist torch uc

* ADD config file parser draft

* ADD itwinaiCLI and ConfigParser

* ADD docs

* ADD pipeline parser and serializer plus tests

* UPDATE docs

* ADD adapter component and tests (incl parser)

* ADD splitter component, improve pipeline, tests

* UPDATE test

* REMOVE todos

* ADD component tests

* ADD serializer tests

* FIX linter

* ADD basic workflow tutorial

* ADD basic intermediate tutorial

* ADD advanced tutorial

* UPDATE advanced tutorial

* UPDATE use cases

* UPDATE save parameters

* FIX linter

* FIX cyclones use case workflow

---------

Co-authored-by: orviz <[email protected]>

* Simplified workflow configuration (#109)

* Add SQAaaS dynamic badge for dev branch (#104)

* Add SQAaaS dynamic badge

* Upgrade to sqaaas-assessment-action@v2

* Add draft example

* UPDATE credits field

* ADD docs

* REFACTOR components and pipeline code

* UPDATE docstring

* UPDATE mnist torch uc

* ADD config file parser draft

* ADD itwinaiCLI and ConfigParser

* ADD docs

* ADD pipeline parser and serializer plus tests

* UPDATE docs

* ADD adapter component and tests (incl parser)

* ADD splitter component, improve pipeline, tests

* UPDATE test

* REMOVE todos

* ADD component tests

* ADD serializer tests

* FIX linter

* ADD basic workflow tutorial

* ADD basic intermediate tutorial

* ADD advanced tutorial

* UPDATE advanced tutorial

* UPDATE use cases

* UPDATE save parameters

* FIX linter

* FIX cyclones use case workflow

* ADD slurm jobscript

* FIX merge error

* FIX components template

---------

Co-authored-by: orviz <[email protected]>

* ADD integration tests

* FIX test

* FIX 3dgan inference test

* ADD GPU support and update tag

* FIX linter

* ADD override example

* UPDATE 3DGAN inference

* UPDATE inference execution tutorials

* UPDATE README

* UPDATE saver saving sparse tensors

* ADD interlink pods

* UPDATE pod name

* UPDATE annotations

* FIX README

* CLEANUP

* Merge

* update

* ADD tf cpu env

* Update Makefile

* FIX 3DGAN tests

* FIX data folder path

---------

Co-authored-by: zoechbauer1 <[email protected]>
Co-authored-by: Kalliopi Tsolaki <[email protected]>
Co-authored-by: orviz <[email protected]>

* Unit test 4 dev (#113)

* Define a step for pytest execution

* Fix: use v1 of step action

* Print result of step composition

* Rename step

* Use step previous definition in the assessment

* Rename input: workflow -> steps

* Avoid caching by using 1.0.0

* Set container image

* Bump to v1

* Bump to sqaaas-assessment-action@v2

* Remove 'id' property

* Adapt inputs to v2

* Remove current branch

* Disable test_cyclones_train_tf

* ADD marker

* ADD skip memory heavy

* Disable for PRs

---------

Co-authored-by: Matteo Bunino <[email protected]>

* Distributed strategy launcher (#117)

* ADD: distrib launcher mockup

* REFACTOR: cluster env, strategy and launcher

* ADD: Torch Elastic Launcher

* ADD: info on env vars

* ADD: distributed tooling and examples

* new folder

* UPDATE: distributed strategy setup

* generalized for DDP and DS

* add config file

* UPDATE: kwargs

* Update general_trainer.py

* Update general_startscript

* Update general_trainer.py

* UPDATE .gitignore

* Update distrib strategy

* UPDATE torch distributed strategy classes

* Updated docstrings

* Small fixes

* UPDATE docstrings

* ADD deepspeed config loader

* ADD first deepspeed tutorial draft

* UPDATE DDP Dp distrib strategy

* UPDATE horovod strategy

* UPDATE tutorial on torch distributed strategies

* UPDATE torch strategies tutorial

* Update createEnvJSC.sh

* Update hvd_slurm.sh

* Update README.md

* UPDATE distributed tutorial

* Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0

* Fixes to deepspeed startscript

* Update distributed.py

* Update trainer.py

* UPDATE tutorial

* ADD draft MNIST tutorial

* UPDATE DDP tutorial for MNIST

* FIX small details

* Update distributed.py

* Added TF tutorials

* Fixes to tutorials

* Add files via upload

* Update Makefile

* Update README.md

* UPDATE tutorials

* UPDATE documentation and improve explainability

* UPDATE SLURM scripts

* FIX local rank mismatch

* fixed distributed trainer in cyclones use case

* UPDATE launcher

* UPDATE linter

* UPDATE format

* FIX linter

* FIX linter

* Update workflow

* UPDATE workflow

* update

* Update workflow

* UPDATE super linter to v6

* UPDATE super linter to v6.3.0

* UPDATE super linter to slim

* Cleanup

* Update tfmirrored_slurm.sh

* Update tfmirrored_slurm.sh

* REMOVE workflows legacy

* DELETE cyclegan use case

* UPDATE dist training tutorials torch

* RENAME folders with torch

* DRAFT torch imagenet tutorial

* UPDATE configuration

* UPDATE imagenet tutorial

* DRAFT scaling test

* ADD scaling analysis report

* FIX deepspeed micro batchsize

* UPDATE data path

* UPDATE checkpoint to avoid race conditions

* UPDATE scalability report

* UPDATE dataset path

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSCTF.sh

* Update README.md

* Update README.md

* JUBE benchmarks

* Update createEnvJSC.sh

* Update createEnvJSCTF.sh

* ADD logy scale option

* Extract JUBE tutorial

* CLEANUP baselines

* Log epoch time in real-time

* FIX deepspeed dataloader for potential performance improvement

* UPDATE SC bash severity

* FIX deepspeed and horovod trainers

* FIX some code checks

* Unify redundant SLURM job scripts and configuration files

* CLEANUP unused configuration

* Reorg configurations

* Refactor configurations and add documentation

* Update README

* ADD report image

* Improve plot resolution

* UPDATE scaling test

* UPDATE launcher scripts

* FIX linter

* REMOVE jube tutorial

---------

Co-authored-by: Mario Rüttgers <[email protected]>
Co-authored-by: r-sarma <[email protected]>
Co-authored-by: r-sarma <[email protected]>
Co-authored-by: zoechbauer1 <[email protected]>

* Distributed strategy launcher (#127)

Update ParseConfig

* Distributed strategy launcher (#128)

Remove experimental files

* Docs dev (#132)

* committing docs functionality for testing deployment

* adding documentation deployment relevant files

* updating readthedocs.yaml

* changing directory of requirements.txt

* updating reqs file

* committing changes and adding pages for tutorials

* fixed distributed trainer in cyclones use case

* adding installation instructions in docs

* adding latest changes to docs

* adding new pages for itwinai modules and other modifications

* modified src/itwinai/torch directory name to solve namespace conflict

* fixing tutorial sections

* fixes in pages appearance

* fixing rendering bugs

* fixing pages appearance bugs

* adding latest modifications

* Deleted duplicate folder after renaming src/itwinai/torch

* adding documentation.yml file for automatic updating on github pages

* modifying documentation.yml file

* updating reqs file to solve bug in deployment

* committing docs functionality for testing deployment

* adding documentation deployment relevant files

* updating readthedocs.yaml

* changing directory of requirements.txt

* updating reqs file

* committing changes and adding pages for tutorials

* adding installation instructions in docs

* adding latest changes to docs

* adding new pages for itwinai modules and other modifications

* modified src/itwinai/torch directory name to solve namespace conflict

* fixing tutorial sections

* fixes in pages appearance

* fixing rendering bugs

* fixing pages appearance bugs

* adding latest modifications

* Deleted duplicate folder after renaming src/itwinai/torch

* adding documentation.yml file for automatic updating on github pages

* modifying documentation.yml file

* updating reqs file to solve bug in deployment

* testing automated docs update

* updating getting started page

* fixing pages and adding new content

* bug fixes

* fixing content rendering

* latest fixes in rendering

* Add version feature to docs

* Update .readthedocs.yaml

* fixing display structure in getting started page

* new fixes similar to previous commit

* Update index.rst

* Update index.rst

Text re-edit index

* Update index.rst

change 1 word

* Update .readthedocs.yaml

* Update .readthedocs.yaml

* fixing getting started page

* Text review getting_started_with_itwinai.rst

* Update 3dgan_doc.rst

* Update getting_started_with_itwinai.rst

punctuation

* Fix torch naming problem

---------

Co-authored-by: KalliopiTsolaki <[email protected]>
Co-authored-by: zoechbauer1 <[email protected]>
Co-authored-by: VerderK <[email protected]>

* Distributed strategy launcher (#131)

* ADD: distrib launcher mockup

* REFACTOR: cluster env, strategy and launcher

* ADD: Torch Elastic Launcher

* ADD: info on env vars

* ADD: distributed tooling and examples

* new folder

* UPDATE: distributed strategy setup

* generalized for DDP and DS

* add config file

* UPDATE: kwargs

* Update general_trainer.py

* Update general_startscript

* Update general_trainer.py

* UPDATE .gitignore

* Update distrib strategy

* UPDATE torch distributed strategy classes

* Updated docstrings

* Small fixes

* UPDATE docstrings

* ADD deepspeed config loader

* ADD first deepspeed tutorial draft

* UPDATE DDP Dp distrib strategy

* UPDATE horovod strategy

* UPDATE tutorial on torch distributed strategies

* UPDATE torch strategies tutorial

* Update createEnvJSC.sh

* Update hvd_slurm.sh

* Update README.md

* UPDATE distributed tutorial

* Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0

* Fixes to deepspeed startscript

* Update distributed.py

* Update trainer.py

* UPDATE tutorial

* ADD draft MNIST tutorial

* UPDATE DDP tutorial for MNIST

* FIX small details

* Update distributed.py

* Added TF tutorials

* Fixes to tutorials

* Add files via upload

* Update Makefile

* Update README.md

* UPDATE tutorials

* UPDATE documentation and improve explainability

* UPDATE SLURM scripts

* FIX local rank mismatch

* fixed distributed trainer in cyclones use case

* UPDATE launcher

* UPDATE linter

* UPDATE format

* FIX linter

* FIX linter

* Update workflow

* UPDATE workflow

* update

* Update workflow

* UPDATE super linter to v6

* UPDATE super linter to v6.3.0

* UPDATE super linter to slim

* Cleanup

* Update tfmirrored_slurm.sh

* Update tfmirrored_slurm.sh

* REMOVE workflows legacy

* DELETE cyclegan use case

* UPDATE dist training tutorials torch

* RENAME folders with torch

* DRAFT torch imagenet tutorial

* UPDATE configuration

* UPDATE imagenet tutorial

* DRAFT scaling test

* ADD scaling analysis report

* FIX deepspeed micro batchsize

* UPDATE data path

* UPDATE checkpoint to avoid race conditions

* UPDATE scalability report

* UPDATE dataset path

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSCTF.sh

* Update README.md

* Update README.md

* JUBE benchmarks

* Update createEnvJSC.sh

* Update createEnvJSCTF.sh

* ADD logy scale option

* Extract JUBE tutorial

* CLEANUP baselines

* Log epoch time in real-time

* FIX deepspeed dataloader for potential performance improvement

* UPDATE SC bash severity

* FIX deepspeed and horovod trainers

* FIX some code checks

* Unify redundant SLURM job scripts and configuration files

* CLEANUP unused configuration

* Reorg configurations

* Refactor configurations and add documentation

* Update README

* ADD report image

* Improve plot resolution

* UPDATE scaling test

* UPDATE launcher scripts

* FIX linter

* REMOVE jube tutorial

* Restore ConfigParser

* FIX type hinting

* ADD dev dependencies

* REMOVE experimental scripts

* UPDATE scaling report

* Add SLURM logs

* Refactor log scale

* Update scalability report

* Unify SLURM logs per job

* Update README.md

* Update README.md

* Update README.md

* ADD itwinai installation

* UPDATE torch distributed tutorial 0

* UPDATE torch distributed tutorials

* REMOVE imagenet tutorial

* ADD NonDistributedStrategy and create_dataloader method

* CLEANUP older classes

* Rename strategies

* Simplify structure

* ADD draft new torch trainer class

* UPDATED torch trainer draft

* UPDATE MNIST use case

* Integrate new trainer into MNIST use case

* UPDATE structure: remove unused files and refactor tests

* Tmp disable unused tests

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* FIX failing inference

* Functional tests (#133)

* UPDATE tests

* FIX errors

* CLEANUP

* Remove unused workflow

---------

Co-authored-by: Mario Rüttgers <[email protected]>
Co-authored-by: r-sarma <[email protected]>
Co-authored-by: r-sarma <[email protected]>
Co-authored-by: zoechbauer1 <[email protected]>

* 3dgan integration (#134)

* fixed distributed trainer in cyclones use case

* committing integration of 3dgan scripts

* ADD: Download dataset

* FIX: DDP distributed training with manual optimization

* ADD: log with MLFlow

* Sqaaas code (#88)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

---------

Co-authored-by: orviz <[email protected]>

* Sqaaas code (#89)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

* ADD: adaptive branch discovery for SQAaaS action

* Update sqaaas.yml

---------

Co-authored-by: orviz <[email protected]>

* ADD: draft predictor and saver

* ADD: stub for inference pipeline

* ADD: small docs

* UPDATE: inference pipeline components

* UPDATE: reorg

* ADD: image generation for inference

* update tag

* ADD: threshold

* ADD: draft inference

* ADD: draft inference wf

* ADD: working inference workflow

* ADD: 3D scatter plots

* ADD: Dockerfile + refactor

* ADD: .dockerignore

* Update .dockerignore

* ADD: skip download option

* ADD: cern pipeline.yaml

* UPDATE: dataset loading function

* UPDATE: dataset loading function

* UPDATE conf

* UPDATE refactor

* UPDATE refactor

* UPDATE training docs

* Update readme

* update README

* FIX typo

* Update README

* Update mkdir

* UPDATE data paths

* UPDATE Dockerfile

* UPDATE Dockerfiles

* UPDATE for Singularity execution

* FIX version mismatch

* UPDATE Singularity docs

* Named steps pipe (#100)

* ADD: dict steps pipe

* Relax dependency constraint

* UPDATE Singularity exec command

* UPDATE: Image version

* UPDATE: load components from pipeline

* ADD: docs

* Simplify 3DGAN model config

* ADD: mlflow autologging support for PL trainer

* UPDATE container info

* Refactor

* UPDATE dependencies

* FIX linter problem

* Simplified workflow configuration (#108)

* Add SQAaaS dynamic badge for dev branch (#104)

* Add SQAaaS dynamic badge

* Upgrade to sqaaas-assessment-action@v2

* Add draft example

* UPDATE credits field

* ADD docs

* REFACTOR components and pipeline code

* UPDATE docstring

* UPDATE mnist torch uc

* ADD config file parser draft

* ADD itwinaiCLI and ConfigParser

* ADD docs

* ADD pipeline parser and serializer plus tests

* UPDATE docs

* ADD adapter component and tests (incl parser)

* ADD splitter component, improve pipeline, tests

* UPDATE test

* REMOVE todos

* ADD component tests

* ADD serializer tests

* FIX linter

* ADD basic workflow tutorial

* ADD basic intermediate tutorial

* ADD advanced tutorial

* UPDATE advanced tutorial

* UPDATE use cases

* UPDATE save parameters

* FIX linter

* FIX cyclones use case workflow

---------

Co-authored-by: orviz <[email protected]>

* Simplified workflow configuration (#109)

* Add SQAaaS dynamic badge for dev branch (#104)

* Add SQAaaS dynamic badge

* Upgrade to sqaaas-assessment-action@v2

* Add draft example

* UPDATE credits field

* ADD docs

* REFACTOR components and pipeline code

* UPDATE docstring

* UPDATE mnist torch uc

* ADD config file parser draft

* ADD itwinaiCLI and ConfigParser

* ADD docs

* ADD pipeline parser and serializer plus tests

* UPDATE docs

* ADD adapter component and tests (incl parser)

* ADD splitter component, improve pipeline, tests

* UPDATE test

* REMOVE todos

* ADD component tests

* ADD serializer tests

* FIX linter

* ADD basic workflow tutorial

* ADD basic intermediate tutorial

* ADD advanced tutorial

* UPDATE advanced tutorial

* UPDATE use cases

* UPDATE save parameters

* FIX linter

* FIX cyclones use case workflow

* ADD slurm jobscript

* FIX merge error

* FIX components template

---------

Co-authored-by: orviz <[email protected]>

* ADD integration tests

* FIX test

* FIX 3dgan inference test

* ADD GPU support and update tag

* FIX linter

* ADD override example

* UPDATE 3DGAN inference

* UPDATE inference execution tutorials

* UPDATE README

* UPDATE saver saving sparse tensors

* ADD interlink pods

* UPDATE pod name

* UPDATE annotations

* FIX README

* CLEANUP

* Merge

* update

* ADD tf cpu env

* Update Makefile

* FIX 3DGAN tests

* FIX data folder path

* ADD offloading of 3DGAN training

* ADAPT 3DGAN training for singularity execution

* UPDATE test and fix linter

---------

Co-authored-by: zoechbauer1 <[email protected]>
Co-authored-by: Kalliopi Tsolaki <[email protected]>
Co-authored-by: orviz <[email protected]>

* Docs dev (#135)

* committing docs functionality for testing deployment

* adding documentation deployment relevant files

* updating readthedocs.yaml

* changing directory of requirements.txt

* updating reqs file

* committing changes and adding pages for tutorials

* fixed distributed trainer in cyclones use case

* adding installation instructions in docs

* adding latest changes to docs

* adding new pages for itwinai modules and other modifications

* modified src/itwinai/torch directory name to solve namespace conflict

* fixing tutorial sections

* fixes in pages appearance

* fixing rendering bugs

* fixing pages appearance bugs

* adding latest modifications

* Deleted duplicate folder after renaming src/itwinai/torch

* adding documentation.yml file for automatic updating on github pages

* modifying documentation.yml file

* updating reqs file to solve bug in deployment

* committing docs functionality for testing deployment

* adding documentation deployment relevant files

* updating readthedocs.yaml

* changing directory of requirements.txt

* updating reqs file

* committing changes and adding pages for tutorials

* adding installation instructions in docs

* adding latest changes to docs

* adding new pages for itwinai modules and other modifications

* modified src/itwinai/torch directory name to solve namespace conflict

* fixing tutorial sections

* fixes in pages appearance

* fixing rendering bugs

* fixing pages appearance bugs

* adding latest modifications

* Deleted duplicate folder after renaming src/itwinai/torch

* adding documentation.yml file for automatic updating on github pages

* modifying documentation.yml file

* updating reqs file to solve bug in deployment

* testing automated docs update

* updating getting started page

* fixing pages and adding new content

* bug fixes

* fixing content rendering

* latest fixes in rendering

* Add version feature to docs

* Update .readthedocs.yaml

* fixing display structure in getting started page

* new fixes similar to previous commit

* Update index.rst

* Update index.rst

Text re-edit index

* Update index.rst

change 1 word

* Update .readthedocs.yaml

* Update .readthedocs.yaml

* fixing getting started page

* Text review getting_started_with_itwinai.rst

* Update 3dgan_doc.rst

* Update getting_started_with_itwinai.rst

punctuation

* Fix torch naming problem

* UPDATE requirements

---------

Co-authored-by: KalliopiTsolaki <[email protected]>
Co-authored-by: zoechbauer1 <[email protected]>
Co-authored-by: VerderK <[email protected]>

* Distributed strategy launcher (#137)

* ADD: distrib launcher mockup

* REFACTOR: cluster env, strategy and launcher

* ADD: Torch Elastic Launcher

* ADD: info on env vars

* ADD: distributed tooling and examples

* new folder

* UPDATE: distributed strategy setup

* generalized for DDP and DS

* add config file

* UPDATE: kwargs

* Update general_trainer.py

* Update general_startscript

* Update general_trainer.py

* UPDATE .gitignore

* Update distrib strategy

* UPDATE torch distributed strategy classes

* Updated docstrings

* Small fixes

* UPDATE docstrings

* ADD deepspeed config loader

* ADD first deepspeed tutorial draft

* UPDATE DDP Dp distrib strategy

* UPDATE horovod strategy

* UPDATE tutorial on torch distributed strategies

* UPDATE torch strategies tutorial

* Update createEnvJSC.sh

* Update hvd_slurm.sh

* Update README.md

* UPDATE distributed tutorial

* Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0

* Fixes to deepspeed startscript

* Update distributed.py

* Update trainer.py

* UPDATE tutorial

* ADD draft MNIST tutorial

* UPDATE DDP tutorial for MNIST

* FIX small details

* Update distributed.py

* Added TF tutorials

* Fixes to tutorials

* Add files via upload

* Update Makefile

* Update README.md

* UPDATE tutorials

* UPDATE documentation and improve explainability

* UPDATE SLURM scripts

* FIX local rank mismatch

* fixed distributed trainer in cyclones use case

* UPDATE launcher

* UPDATE linter

* UPDATE format

* FIX linter

* FIX linter

* Update workflow

* UPDATE workflow

* update

* Update workflow

* UPDATE super linter to v6

* UPDATE super linter to v6.3.0

* UPDATE super linter to slim

* Cleanup

* Update tfmirrored_slurm.sh

* Update tfmirrored_slurm.sh

* REMOVE workflows legacy

* DELETE cyclegan use case

* UPDATE dist training tutorials torch

* RENAME folders with torch

* DRAFT torch imagenet tutorial

* UPDATE configuration

* UPDATE imagenet tutorial

* DRAFT scaling test

* ADD scaling analysis report

* FIX deepspeed micro batchsize

* UPDATE data path

* UPDATE checkpoint to avoid race conditions

* UPDATE scalability report

* UPDATE dataset path

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSCTF.sh

* Update README.md

* Update README.md

* JUBE benchmarks

* Update createEnvJSC.sh

* Update createEnvJSCTF.sh

* ADD logy scale option

* Extract JUBE tutorial

* CLEANUP baselines

* Log epoch time in real-time

* FIX deepspeed dataloader for potential performance improvement

* UPDATE SC bash severity

* FIX deepspeed and horovod trainers

* FIX some code checks

* Unify redundant SLURM job scripts and configuration files

* CLEANUP unused configuration

* Reorg configurations

* Refactor configurations and add documentation

* Update README

* ADD report image

* Improve plot resolution

* UPDATE scaling test

* UPDATE launcher scripts

* FIX linter

* REMOVE jube tutorial

* Restore ConfigParser

* FIX type hinting

* ADD dev dependencies

* REMOVE experimental scripts

* UPDATE scaling report

* Add SLURM logs

* Refactor log scale

* Update scalability report

* Unify SLURM logs per job

* Update README.md

* Update README.md

* Update README.md

* ADD itwinai installation

* UPDATE torch distributed tutorial 0

* UPDATE torch distributed tutorials

* REMOVE imagenet tutorial

* ADD NonDistributedStrategy and create_dataloader method

* CLEANUP older classes

* Rename strategies

* Simplify structure

* ADD draft new torch trainer class

* UPDATED torch trainer draft

* UPDATE MNIST use case

* Integrate new trainer into MNIST use case

* UPDATE structure: remove unused files and refactor tests

* Tmp disable unused tests

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* FIX failing inference

* Functional tests (#133)

* UPDATE tests

* FIX errors

* CLEANUP

* Remove unused workflow

* Fixes to TF new version errors

* Fixes to TF new version errors

* Fixes to TF new version errors

* Fixes to TF new version errors

* Update distributed.py

* Update tfmirrored_slurm.sh

* Update train.py

* TF updates

* Add README

* Python venv (#136)

* Move to python venv

* Update Makefile

* Add Horovod installation

* Update env

* FIX openmpi install

* Add TF explicit version

* UPDATE env creation

* REMOVE constraint on torch 2.0.*

* UPDATE installation

* FIX test

* REMOVE strict dependency on micromamba

* FIX docs and debugging states

* FIX cpu only installation

* FIX deepspeed cpu installation

* FIX tf env creation

* FIX makefile

* ADD pypi deployment

* DISABLE push debug

* UPDATE pypi

* UPDATE classifiers

* Update pyproject.toml

---------

Co-authored-by: Mario Rüttgers <[email protected]>
Co-authored-by: r-sarma <[email protected]>
Co-authored-by: r-sarma <[email protected]>
Co-authored-by: zoechbauer1 <[email protected]>

* Update README.md

* Distributed strategy launcher (#141)

* ADD: distrib launcher mockup

* REFACTOR: cluster env, strategy and launcher

* ADD: Torch Elastic Launcher

* ADD: info on env vars

* ADD: distributed tooling and examples

* new folder

* UPDATE: distributed strategy setup

* generalized for DDP and DS

* add config file

* UPDATE: kwargs

* Update general_trainer.py

* Update general_startscript

* Update general_trainer.py

* UPDATE .gitignore

* Update distrib strategy

* UPDATE torch distributed strategy classes

* Updated docstrings

* Small fixes

* UPDATE docstrings

* ADD deepspeed config loader

* ADD first deepspeed tutorial draft

* UPDATE DDP Dp distrib strategy

* UPDATE horovod strategy

* UPDATE tutorial on torch distributed strategies

* UPDATE torch strategies tutorial

* Update createEnvJSC.sh

* Update hvd_slurm.sh

* Update README.md

* UPDATE distributed tutorial

* Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0

* Fixes to deepspeed startscript

* Update distributed.py

* Update trainer.py

* UPDATE tutorial

* ADD draft MNIST tutorial

* UPDATE DDP tutorial for MNIST

* FIX small details

* Update distributed.py

* Added TF tutorials

* Fixes to tutorials

* Add files via upload

* Update Makefile

* Update README.md

* UPDATE tutorials

* UPDATE documentation and improve explainability

* UPDATE SLURM scripts

* FIX local rank mismatch

* fixed distributed trainer in cyclones use case

* UPDATE launcher

* UPDATE linter

* UPDATE format

* FIX linter

* FIX linter

* Update workflow

* UPDATE workflow

* update

* Update workflow

* UPDATE super linter to v6

* UPDATE super linter to v6.3.0

* UPDATE super linter to slim

* Cleanup

* Update tfmirrored_slurm.sh

* Update tfmirrored_slurm.sh

* REMOVE workflows legacy

* DELETE cyclegan use case

* UPDATE dist training tutorials torch

* RENAME folders with torch

* DRAFT torch imagenet tutorial

* UPDATE configuration

* UPDATE imagenet tutorial

* DRAFT scaling test

* ADD scaling analysis report

* FIX deepspeed micro batchsize

* UPDATE data path

* UPDATE checkpoint to avoid race conditions

* UPDATE scalability report

* UPDATE dataset path

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSC.sh

* Update createEnvJSCTF.sh

* Update README.md

* Update README.md

* JUBE benchmarks

* Update createEnvJSC.sh

* Update createEnvJSCTF.sh

* ADD logy scale option

* Extract JUBE tutorial

* CLEANUP baselines

* Log epoch time in real-time

* FIX deepspeed dataloader for potential performance improvement

* UPDATE SC bash severity

* FIX deepspeed and horovod trainers

* FIX some code checks

* Unify redundant SLURM job scripts and configuration files

* CLEANUP unused configuration

* Reorg configurations

* Refactor configurations and add documentation

* Update README

* ADD report image

* Improve plot resolution

* UPDATE scaling test

* UPDATE launcher scripts

* FIX linter

* REMOVE jube tutorial

* Restore ConfigParser

* FIX type hinting

* ADD dev dependencies

* REMOVE experimental scripts

* UPDATE scaling report

* Add SLURM logs

* Refactor log scale

* Update scalability report

* Unify SLURM logs per job

* Update README.md

* Update README.md

* Update README.md

* ADD itwinai installation

* UPDATE torch distributed tutorial 0

* UPDATE torch distributed tutorials

* REMOVE imagenet tutorial

* ADD NonDistributedStrategy and create_dataloader method

* CLEANUP older classes

* Rename strategies

* Simplify structure

* ADD draft new torch trainer class

* UPDATED torch trainer draft

* UPDATE MNIST use case

* Integrate new trainer into MNIST use case

* UPDATE structure: remove unused files and refactor tests

* Tmp disable unused tests

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* Update action

* FIX failing inference

* Functional tests (#133)

* UPDATE tests

* FIX errors

* CLEANUP

* Remove unused workflow

* Fixes to TF new version errors

* Fixes to TF new version errors

* Fixes to TF new version errors

* Fixes to TF new version errors

* Update distributed.py

* Update tfmirrored_slurm.sh

* Update train.py

* TF updates

* Add README

* Python venv (#136)

* Move to python venv

* Update Makefile

* Add Horovod installation

* Update env

* FIX openmpi install

* Add TF explicit version

* UPDATE env creation

* REMOVE constraint on torch 2.0.*

* UPDATE installation

* FIX test

* REMOVE strict dependency on micromamba

* FIX docs and debugging states

* FIX cpu only installation

* FIX deepspeed cpu installation

* FIX tf env creation

* FIX makefile

* ADD pypi deployment

* DISABLE push debug

* UPDATE pypi

* UPDATE classifiers

* Update pyproject.toml

* Update README.md

* Cyclone tf dist (#130)

* get_stretegy

* UPDATE distributed strategy

* change req file

* cyclone tf dist

* small bugs

* fix bug in train.py

* REFACTOR cyclones use case

* Activate pytest

* NEW TensorFlow trainer

* ADD user information

---------

Co-authored-by: ruettgers1 <[email protected]>
Co-authored-by: Matteo Bunino <[email protected]>

* Interactive distrib ml (#139)

Add examples for distributed ml in interactive mode

* Interactive distrib ml (#140)

Update tutorial

* Disable documentation GH action

* Remove action

---------

Co-authored-by: Mario Rüttgers <[email protected]>
Co-authored-by: r-sarma <[email protected]>
Co-authored-by: r-sarma <[email protected]>
Co-authored-by: zoechbauer1 <[email protected]>
Co-authored-by: MarioRuettgers <[email protected]>

* Merge main (#142)

Bring changes on main into dev

* Virgo integration (#143)

* ADD Virgo data pipeline and some refactoring

* FIX typo

* UPDATE README

* ADD training

* ADD TrainingConfiguration

* ADD distributed training and refactor

* update readme

* UPDATE loggers and add tests

* Refactor

* FIX typo

* UPDATE use cases instructions

* ADD checkpointing and refactor.

* FIX linter

* FIX jscpd

* FIX jscpd

* Disable jscpd

* Refactor loggers

* ADD loggers to Virgo use case

* Update AUTHORS.md

* Update AUTHORS.md

* Docs dev (#144)

* committing docs functionality for testing deployment

* adding documentation deployment relevant files

* updating readthedocs.yaml

* changing directory of requirements.txt

* updating reqs file

* committing changes and adding pages for tutorials

* fixed distributed trainer in cyclones use case

* adding installation instructions in docs

* adding latest changes to docs

* adding new pages for itwinai modules and other modifications

* modified src/itwinai/torch directory name to solve namespace conflict

* fixing tutorial sections

* fixes in pages appearance

* fixing rendering bugs

* fixing pages appearance bugs

* adding latest modifications

* Deleted duplicate folder after renaming src/itwinai/torch

* adding documentation.yml file for automatic updating on github pages

* modifying documentation.yml file

* updating reqs file to solve bug in deployment

* committing docs functionality for testing deployment

* adding documentation deployment relevant files

* updating readthedocs.yaml

* changing directory of requirements.txt

* updating reqs file

* committing changes and adding pages for tutorials

* adding installation instructions in docs

* adding latest changes to docs

* adding new pages for itwinai modules and other modifications

* modified src/itwinai/torch directory name to solve namespace conflict

* fixing tutorial sections

* fixes in pages appearance

* fixing rendering bugs

* fixing pages appearance bugs

* adding latest modifications

* Deleted duplicate folder after renaming src/itwinai/torch

* adding documentation.yml file for automatic updating on github pages

* modifying documentation.yml file

* updating reqs file to solve bug in deployment

* testing automated docs update

* updating getting started page

* fixing pages and adding new content

* bug fixes

* fixing content rendering

* latest fixes in rendering

* Add version feature to docs

* Update .readthedocs.yaml

* fixing display structure in getting started page

* new fixes similar to previous commit

* Update index.rst

* Update index.rst

Text re-edit index

* Update index.rst

change 1 word

* Update .readthedocs.yaml

* Update .readthedocs.yaml

* fixing getting started page

* Text review getting_started_with_itwinai.rst

* Update 3dgan_doc.rst

* Update getting_started_with_itwinai.rst

punctuation

* Fix torch naming problem

* UPDATE requirements

* Remove unnecessary dependencies

* Add docstring

* adding latest changes from dev

* new content and changes

* Update index.rst

toctree revise

* adding pages for distributed ml tutorials

* new sphinx reqs to solve build failing

* Docs update:
- python code format fixed
- added brief explanation on ddp in new section

* requirements changed

* UPDATE requirements

* UPDATE requirements and itwinai.types

* ADD CMake and GCC installation

* UPDATE CMake and GCC installation

* UPDATE CMake and GCC installation

* ADD notebooks

* Disable notebooks section

* FIX TOC

* Saving local changes before pulling from remote

* saving updates before pull from origin

* Update itwinai.torch.modules.rst

* Update itwinai.torch.modules.rst

* Update itwinai.torch.modules.rst

* Update itwinai.torch.modules.rst

* adding cyclones and virgo use cases pages

* FIX build errors

* Update TOC

* Update TOC

---------

Co-authored-by: KalliopiTsolaki <[email protected]>
Co-authored-by: zoechbauer1 <[email protected]>
Co-authored-by: VerderK <[email protected]>
Co-authored-by: Killian Verder <[email protected]>

* Update dev (#152)

* Dev - itwinai 0.0.2 (#138)

* Backend (#59)

* WIP: Tensorflow MNIST use-case

* UPDATE: Tensorflow MNIST version

* ADD: Backend

* ADD: Use-case init

* FIX: Paths and downloading of the data

* FIX: Paths and downloading of the data

* ADD: Setup, Config update

* ADD: Setup, Config update

* UPDATE: File movement into itwinai

* FIX: Move utils from tensorflow to global folder

* FIX: Add setup into torch Executable

* ADD: MNIST Torch Use-case

* FIX: Formatting

* ADD: Lib

* ADD: Lib

* ADD: Tests, Fix Loggers

* Update README.md

* ADD: Tests

* ADD: MLCC

* ADD: Cyclones, Cyclones-pipe

* ADD: TensorflowTrainer

* UPDATE: Move TensorflowTrainer into Backend

* FIX: Dependencies

* ADD: Number of devices

* ADD: initial version of TorchTrainer

* update

* update

* ADD: distributed torch Trainer and decorator

* ADD: New version of torch distributed trainer and tests

* ADD: load torch dist trainer from config file

* ADD: multi-gpu pytorch trainer

* ADD: download on login node

* FIX: dataloaders in Trainer

* FIX: add dataloaders into trainer

* FIX: clear load and save state

* ADD: Loggers

* FIX: Log in a distributed environment

* TensorFlow backend (#63)

* UPDATE: Remove experimental distribution

* ADD: Mnist distributed

* ADD: Optional strategy

* UPDATE: Conditional distribution

* FIX: Dataloader for mnist

* FIX: Model cloning lambda function for distributed scope

* ADD: CycleGAN

* UPDATE: Types

* UPDATE: Types

* ADD: Local distr

* FIX: learning rates

* ADD: CycleGAN distributed

* FIX: Reduction

* FIX: Distribution

* ADD: tmp.py

* FIX: Distribution

* FIX: Distribution

* FIX: Distribution

* FIX: Distribution

* FIX: Distribution

* FIX: Distribution

* FIX: Distribution

* FIX: Distribution

* UPDATE: Executors

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* FIX: Distributed Dataset

* ADD: Ray

* ADD: Ray

* ADD: Ray

* ADD: Ray

* ADD: Ray

* ADD: Ray

* ADD: Initial VIRGO

* UPDATE: Optional distribution, tensorflow-gpu

* UPDATE: tensorflow-gpu dependency

* ADD: Unify branches

---------

Co-authored-by: User3574 <[email protected]>

* Refactor entire code base

* ADD: workflows folder

* FIX: refactor

* FIX: linting

* ADD: how to run use case doc

* ADD: workflows doc

* FIX: MD linter

* Pipe MNIST lightning (#86)

* ADD: lightning distributed + pipeline

* UPDATE: jscpd threshold

* UPDATE: super linter ignore use cases

* ADD: jscpd ignore loggers

* Functional tests for MNIST (#87)

* ADD: use case tests

* FIX: move use case models out of itwinai

* FIX: rearrange modules

* ADD: ConsoleLogger and LoggersCollection

* FIX: loggers filter

* FIX: add TF env creation

* UPDATE: test flag

* ADD: early pytest on slurm

* FIX: duplicated code in TF Trainer

* Sqaaas code (#88)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

---------

Co-authored-by: orviz <[email protected]>

* Sqaaas code (#89)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

* ADD: adaptive branch discovery for SQAaaS action

* Update sqaaas.yml

---------

Co-authored-by: orviz <[email protected]>

* 3dgan use case (#94)

* committing integration of 3dgan scripts

* ADD: Download dataset

* FIX: DDP distributed training with manual optimization

* ADD: log with MLFlow

* Sqaaas code (#88)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

---------

Co-authored-by: orviz <[email protected]>

* Sqaaas code (#89)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

* ADD: adaptive branch discovery for SQAaaS action

* Update sqaaas.yml

---------

Co-authored-by: orviz <[email protected]>

* ADD: draft predictor and saver

* ADD: stub for inference pipeline

* ADD: small docs

* UPDATE: inference pipeline components

* UPDATE: reorg

* ADD: image generation for inference

* update tag

* ADD: threshold

* ADD: draft inference

* ADD: draft inference wf

* ADD: working inference workflow

* ADD: 3D scatter plots

* ADD: Dockerfile + refactor

* ADD: .dockerignore

* Update .dockerignore

* REMOVE: keras dependency

* ADD: skip download option

---------

Co-authored-by: Kalliopi Tsolaki <[email protected]>
Co-authored-by: orviz <[email protected]>

* Sqaaas code (#96)

* ADD: adaptive branch discovery for SQAaaS action

* Update sqaaas.yml

* Update sqaaas.yml

* ADD: adaptive branch discovery for SQAaaS action

* Trigger only on main and dev branches

* ADD: double quote

* Trigger pytest only on main and dev PRs

* Torch mnist inference (#95)

* ADD: draft predictor and saver

* ADD: stub for inference pipeline

* ADD: small docs

* UPDATE: inference pipeline components

* UPDATE: reorg

* ADD: image generation for inference

* update tag

* ADD: threshold

* Remove keras dependency

* 3dgan integration (#97)

* committing integration of 3dgan scripts

* ADD: Download dataset

* FIX: DDP distributed training with manual optimization

* ADD: log with MLFlow

* Sqaaas code (#88)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

---------

Co-authored-by: orviz <[email protected]>

* Sqaaas code (#89)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

* ADD: adaptive branch discovery for SQAaaS action

* Update sqaaas.yml

---------

Co-authored-by: orviz <[email protected]>

* ADD: draft predictor and saver

* ADD: stub for inference pipeline

* ADD: small docs

* UPDATE: inference pipeline components

* UPDATE: reorg

* ADD: image generation for inference

* update tag

* ADD: threshold

* ADD: draft inference

* ADD: draft inference wf

* ADD: working inference workflow

* ADD: 3D scatter plots

* ADD: Dockerfile + refactor

* ADD: .dockerignore

* Update .dockerignore

* REMOVE: keras dependency

* ADD: skip download option

* ADD: cern pipeline.yaml

* UPDATE: dataset loading function

* UPDATE: dataset loading function

* UPDATE conf

* UPDATE refactor

* UPDATE refactor

* UPDATE training docs

---------

Co-authored-by: Kalliopi Tsolaki <[email protected]>
Co-authored-by: orviz <[email protected]>

* Add SQAaaS dynamic badge for dev branch (#104)

* Add SQAaaS dynamic badge

* Upgrade to sqaaas-assessment-action@v2

* 3dgan integration (#98)

* committing integration of 3dgan scripts

* ADD: Download dataset

* FIX: DDP distributed training with manual optimization

* ADD: log with MLFlow

* Sqaaas code (#88)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

---------

Co-authored-by: orviz <[email protected]>

* Sqaaas code (#89)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

* ADD: adaptive branch discovery for SQAaaS action

* Update sqaaas.yml

---------

Co-authored-by: orviz <[email protected]>

* ADD: draft predictor and saver

* ADD: stub for inference pipeline

* ADD: small docs

* UPDATE: inference pipeline components

* UPDATE: reorg

* ADD: image generation for inference

* update tag

* ADD: threshold

* ADD: draft inference

* ADD: draft inference wf

* ADD: working inference workflow

* ADD: 3D scatter plots

* ADD: Dockerfile + refactor

* ADD: .dockerignore

* Update .dockerignore

* REMOVE: keras dependency

* ADD: skip download option

* ADD: cern pipeline.yaml

* UPDATE: dataset loading function

* UPDATE: dataset loading function

* UPDATE conf

* UPDATE refactor

* UPDATE refactor

* UPDATE training docs

* Update readme

* update README

* FIX typo

* Update README

* Update mkdir

* UPDATE data paths

* UPDATE Dockerfile

* UPDATE Dockerfiles

* UPDATE for Singularity execution

* FIX version mismatch

* UPDATE Singularity docs

* Named steps pipe (#100)

* ADD: dict steps pipe

* Relax dependency constraint

* UPDATE Singularity exec command

* UPDATE: Image version

* UPDATE: load components from pipeline

* ADD: docs

* Simplify 3DGAN model config

* ADD: mlflow autologging support for PL trainer

* UPDATE container info

* Refactor

* UPDATE dependencies

* FIX linter problem

* Simplified workflow configuration (#108)

* Add SQAaaS dynamic badge for dev branch (#104)

* Add SQAaaS dynamic badge

* Upgrade to sqaaas-assessment-action@v2

* Add draft example

* UPDATE credits field

* ADD docs

* REFACTOR components and pipeline code

* UPDATE docstring

* UPDATE mnist torch uc

* ADD config file parser draft

* ADD itwinaiCLI and ConfigParser

* ADD docs

* ADD pipeline parser and serializer plus tests

* UPDATE docs

* ADD adapter component and tests (incl parser)

* ADD splitter component, improve pipeline, tests

* UPDATE test

* REMOVE todos

* ADD component tests

* ADD serializer tests

* FIX linter

* ADD basic workflow tutorial

* ADD basic intermediate tutorial

* ADD advanced tutorial

* UPDATE advanced tutorial

* UPDATE use cases

* UPDATE save parameters

* FIX linter

* FIX cyclones use case workflow

---------

Co-authored-by: orviz <[email protected]>

* Simplified workflow configuration (#109)

* Add SQAaaS dynamic badge for dev branch (#104)

* Add SQAaaS dynamic badge

* Upgrade to sqaaas-assessment-action@v2

* Add draft example

* UPDATE credits field

* ADD docs

* REFACTOR components and pipeline code

* UPDATE docstring

* UPDATE mnist torch uc

* ADD config file parser draft

* ADD itwinaiCLI and ConfigParser

* ADD docs

* ADD pipeline parser and serializer plus tests

* UPDATE docs

* ADD adapter component and tests (incl parser)

* ADD splitter component, improve pipeline, tests

* UPDATE test

* REMOVE todos

* ADD component tests

* ADD serializer tests

* FIX linter

* ADD basic workflow tutorial

* ADD basic intermediate tutorial

* ADD advanced tutorial

* UPDATE advanced tutorial

* UPDATE use cases

* UPDATE save parameters

* FIX linter

* FIX cyclones use case workflow

* ADD slurm jobscript

* FIX merge error

* FIX components template

---------

Co-authored-by: orviz <[email protected]>

* ADD integration tests

* FIX test

* FIX 3dgan inference test

---------

Co-authored-by: Kalliopi Tsolaki <[email protected]>
Co-authored-by: orviz <[email protected]>

* fixed distributed trainer in cyclones use case

* 3dgan integration (#118)

* fixed distributed trainer in cyclones use case

* committing integration of 3dgan scripts

* ADD: Download dataset

* FIX: DDP distributed training with manual optimization

* ADD: log with MLFlow

* Sqaaas code (#88)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

---------

Co-authored-by: orviz <[email protected]>

* Sqaaas code (#89)

* Create sqaaas.yml

* Update sqaaas.yml

* Update sqaaas.yml

* Point to the current repo

* Remove unnecessary checkout step

* Rename step

* ADD: adaptive branch discovery for SQAaaS action

* Update sqaaas.yml

---------

Co-authored-by: orviz <[email protected]>

* ADD: draft predictor and saver

* ADD: stub for inference pipeline

* ADD: small docs

* UPDATE: inference pipeline components

* UPDATE: reorg

* ADD: image generation for inference

* update tag

* ADD: threshold

* ADD: draft inference

* ADD: draft inference wf

* ADD: working inference workflow

* ADD: 3D scatter plots

* ADD: Dockerfile + refactor

* ADD: .dockerignore

* Update .dockerignore

* ADD: skip download option

* ADD: cern pipeline.yaml

* UPDATE: dataset loading function

* UPDATE: dataset loading function

* UPDATE conf

* UPDATE refactor

* UPDATE refactor

* UPDATE training docs

* Update readme

* update README

* FIX typo

* Update README

* Update mkdir

* UPDATE data paths

* UPDATE Dockerfile

* UPDATE Dockerfiles

* UPDATE for Singularity execution

* FIX version mismatch

* UPDATE Singularity docs

* Named steps pipe (#100)

* ADD: dict steps pipe

* Relax dependency constraint

* UPDATE Singularity exec command

* UPDATE: Image version

* UPDATE: load components from pipeline

* ADD: docs

* Simplify 3DGAN model config

* ADD: mlflow autologging support for PL trainer

* UPDATE container info

* Refactor

* UPDATE dependencies

* FIX linter problem

* Simplified workflow configuration (#108)

* Add SQAaaS dynamic badge for dev branch (#104)

* Add SQAaaS dynamic badge

* Upgrade to sqaaas-assessment-action@v2

* Add draft example

* UPDATE credits field

* ADD docs

* REFACTOR components and pipeline code

* UPDATE docstring

* UPDATE mnist torch uc

* ADD config file parser draft

* ADD itwinaiCLI and ConfigParser

* ADD docs

* ADD pipeline parser and serializer plus tests

* UPDATE docs

* ADD adapter component and tests (incl parser)

* ADD splitter component, improve pipeline, tests

* UPDATE test

* REMOVE todos

* ADD component tests

* ADD serializer tests

* FIX linter

* ADD basic workflow tutorial

* ADD basic intermediate tutorial

* ADD advanced tutorial

* UPDATE advanced tutorial

* UPDATE use cases

* UPDATE save parameters

* FIX linter

* FIX cyclones use case workflow

---------

Co-authored-by: orviz <[email protected]>

* Simplified workflow configuration (#109)

* Add SQAaaS dynamic badge for dev branch (#104)

* Add SQAaaS dynamic badge

* Upgrade to sqaaas-assessment-action@v2

* Add draft example

* UPDATE credits field

* ADD docs

* REFACTOR components and pipeline code

* UPDATE docstring

* UPDATE mnist torch uc

* ADD config file parser draft

* ADD itwinaiCLI and ConfigParser

* ADD docs

* ADD pipeline parser and serializer plus tests

* UPDATE docs

* ADD adapter component and tests (incl parser)

* ADD splitter component, improve pipeline, tests

* UPDATE test

* REMOVE todos

* ADD component tests

* ADD serializer tests

* FIX linter

* ADD basic workflow tutorial

* ADD basic intermediate tutorial

* ADD advanced tutorial

* UPDATE advanced tutorial

* UPDATE use cases

* UPDATE save parameters

* FIX linter

* FIX cyclones use case workflow

* ADD slurm jobscript

* FIX merge error

* FIX components template

---------

Co-authored-by: orviz <[email protected]>

* ADD integration tests

* FIX test

* FIX 3dgan inference test

* ADD GPU support and update tag

* FIX linter

* ADD override example

* UPDATE 3DGAN inference

* UPDATE inference execution tutorials

* UPDATE README

* UPDATE saver saving sparse tensors

* ADD interlink pods

* UPDATE pod name

* UPDATE annotations

* FIX README

* CLEANUP

* Merge

* update

* ADD tf cpu env

* Update Makefile

* FIX 3DGAN tests

* FIX data folder path

---------

Co-authored-by: zoechbauer1 <[email protected]>
Co-authored-by: Kalliopi Tsolaki <[email protected]>
Co-authored-by: orviz <[email protected]>

* Unit test 4 dev (#113)

* Define a step for pytest execution

* Fix: use v1 of step action

* Print result of step composition

* Rename step

* Use step previous definition in the assessment

* Rename input: workflow -> steps

* Avoid caching by using 1.0.0

* Set container image

* Bump to v1

* Bump to sqaaas-assessment-action@v2

* Remove 'id' property

* Adapt inputs to v2

* Remove current branch

* Disable test_cyclones_train_tf

* ADD marker

* ADD skip memory heavy

* Disable for PRs

---------

Co-authored-by: Matteo Bunino <[email protected]>

* Distributed strategy launcher (#117)

* ADD: distrib launcher mockup

* REFACTOR: cluster env, strategy and launcher

* ADD: Torch Elastic Launcher

* ADD: info on env vars

* ADD: distributed to…
26 people committed Oct 16, 2024
1 parent ead6bdf commit bb28f35
