1st AtmoRep core developers Meeting
iluise edited this page Aug 5, 2024 · 1 revision
# HClimRep core developer meeting, 2024-07-29
Participants: Christian, Ilaria, Kacper, Asma, Nishant, Enxhi, Ankit, Asma, Simon, Nikolay, Julius, Sindhu, Martin
- meeting schedule: bi-weekly
- consolidated and cleaned-up code to be shared today
- monthly roadmap meetings for feature development; maybe together with monthly HClimRep progress meeting
- minutes will be shared in the Wiki
- code development meetings should involve one responsible person (point of contact) from every partner/institution
- working branch: develop-branch of repo
- general guidelines
- all discussion should take place in issues on github
- individual partners are free to subscribe to issues
- workflow: first start an issue, then set-up branches, and finalize with pull request
- code guidelines
- all partners will be added as developers
- follow style of existing code (close to Google style-guide), e.g. two spaces for indentation
- Pylint and flake8 to enforce code-style in the near future
- installation of code with `setup.py`
- README describes set-up with virtual environment
- contact Ilaria & Christian for conda support
- each partner: try to run code within this week
- git guidelines
- two stable, protected versions/branches:
- main
- develop
- merging only via pull request -> Christian and Ilaria (Simon in the future) will approve them
- set up issues for new developments
- always set up two branches when you start developing
- naming convention for branches
- `<username>/<short description>/orig` -> the original branch against which you develop your fix or feature, i.e. the original checkout kept as a reference; don't modify it during development (used to ease merging)
- `<username>/<short description>/head` -> the branch where your actual development takes place (to be modified)
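The branch naming convention above can be sketched as a small helper. This is only an illustration: the function name `branch_names` and the slug rules (lowercase, hyphens instead of spaces) are assumptions, since the minutes only fix the `<username>/<short description>/orig` and `<username>/<short description>/head` layout.

```python
import re


def branch_names(username: str, description: str) -> tuple[str, str]:
    """Return the (orig, head) branch names for a new development.

    Hypothetical helper: the slug rules below are an assumption, not
    something the project prescribes.
    """
    # Collapse every run of characters that is not a letter or digit
    # into a single hyphen, then trim hyphens at both ends.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    if not username or not slug:
        raise ValueError("username and description must be non-empty")
    prefix = f"{username}/{slug}"
    return f"{prefix}/orig", f"{prefix}/head"


orig, head = branch_names("jdoe", "Fix zarr chunking")
print(orig)  # jdoe/fix-zarr-chunking/orig
print(head)  # jdoe/fix-zarr-chunking/head
```

Both names would then be created from the same checkout of `develop`, with only the `head` branch receiving commits.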
- issues
- use templates (bug, feature etc.)
- currently: two templates supported for bug and feature development
- issue descriptions should be descriptive
- cross-link issues and pull requests
- ⚠️ don't close issues by yourself! ⚠️
- issue reviews every 2-3 months
- use of github-projects in the future
- pull requests (PR)
- need approval by Ilaria, Christian (and Simon)
- should be as small as possible and relate to a single issue or a small number of issues
- use labels, but don't clutter the list for PR reviewer
- tag Christian for core model issues, Ilaria for analysis/evaluation
- everyone is encouraged to verify PRs!
- Local testing
- testing interface is currently being set up; Ilaria is working on it
- currently: validation with pytest
- future tests focus on data pipeline and training (non-trivial)
- CI/CD pipeline will be set up in the future
- will likely require GPU-resources
- current checks:
- time-step validation of BERT-samples against original data
- check underlying coordinates
- initial check on evaluation metrics (MSE-based)
- Simon will support the set-up of the testing pipeline
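As an illustration of what an MSE-based check could look like under pytest, here is a minimal sketch. It is not the project's actual test code: the function `mse` and both test names are hypothetical, and plain Python lists stand in for the real model output and reference data.

```python
import math


def mse(prediction, target):
    """Mean squared error between two equally long sequences of floats."""
    if len(prediction) != len(target):
        raise ValueError("prediction and target must have the same length")
    if not prediction:
        raise ValueError("inputs must be non-empty")
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


def test_mse_is_zero_for_identical_fields():
    # A field compared with itself must give an error of exactly zero.
    field = [280.0, 281.5, 279.25]
    assert mse(field, field) == 0.0


def test_mse_matches_hand_computed_value():
    # Errors are 1, 0 and -2, so MSE = (1 + 0 + 4) / 3.
    assert math.isclose(mse([1.0, 2.0, 1.0], [0.0, 2.0, 3.0]), 5.0 / 3.0)
```

pytest collects functions whose names start with `test_`, so saving this in a file such as `test_metrics.py` (file name assumed) and running `pytest` would execute both checks.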
- notes on AtmoRep model
- PyTorch DDP used for distributed training, no PyTorch Lightning
- full model training: 32-64 GPUs for 3-4 weeks training, smaller model configuration possible (quicker convergence)
- support of processing large-scale output -> current issue
- ideally with support of xarray
- should be based on dask to handle data exceeding available memory
- rigorous usage of classes/modularization for code development to be targeted?
- test-suite in the future to track training progress
- atmorep core model and analysis code should be kept separated from each other, but should follow the same rules
- expected input data format for AtmoRep
- all input data has to be processed into zarr-files
- zarr files accessible through:
- download via datapub: https://datapub.fz-juelich.de/atmorep/zarr_files/
- meteocloud: /p/data1/slmet/met_data/ecmwf/era5/zarr; to access the files through meteocloud, request access to the slmet data project through JuDoor
- current issue:
- chunk size of zarr files is not optimal for performance (inode restrictions at JSC)
- possibility: set-up virtual file system to avoid issues with inodes
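To make the link between chunk size and inode usage concrete, the following sketch counts the chunks of an array. It assumes a plain zarr directory store without sharding, where each chunk is written as its own file. The array shape and both chunk layouts are illustrative assumptions, not the layout of the actual AtmoRep files.

```python
import math


def chunk_file_count(shape, chunks):
    """Number of chunks (one file each in a directory store) for an array."""
    if len(shape) != len(chunks):
        raise ValueError("shape and chunks must have the same number of dimensions")
    # Along each dimension the last chunk may be partial, hence the ceil.
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


# Hypothetical field: one year of hourly data on a 721 x 1440 grid.
shape = (8760, 721, 1440)
small = chunk_file_count(shape, (1, 721, 1440))   # one time step per chunk
large = chunk_file_count(shape, (24, 721, 1440))  # one day per chunk
print(small, large)  # 8760 365
```

Larger chunks reduce the number of files but increase the amount of data read per access, so the choice is a trade-off between inode usage and read performance.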
- final fixes by Wednesday
- homework for all partners until next core developer meeting:
- all partners should provide their github-username to Christian so that he can add everyone to the github-project
- start utilising the `atmo-rep` project for running the code
- open issues to indicate what you will be working on
The AtmoRep Collaboration - last update: April 2024