Gpu monitoring #237

Merged
merged 22 commits into from
Nov 7, 2024

@jarlsondre jarlsondre commented Oct 28, 2024

Summary

Add a decorator and a CLI command for measuring GPU utilization. You can now add the decorator to the execute() function of any TorchTrainer class, and it will write its log files to a folder named utilization_logs. When training has finished, you can create a plot with the itwinai generate-gpu-energy-plot command.

An example of a resulting plot (real data from training on EURAC) can be seen here:
image
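
To illustrate the decorator pattern described in the summary, here is a minimal sketch. It is not the code from src/itwinai/torch/monitoring/monitoring.py. The names monitor_gpu, read_sample, ToyTrainer and the samples.csv file name are all hypothetical, and the sampler is a placeholder callable rather than a real GPU query. It only shows the general idea: sample a metric in a background thread while execute() runs, then write the samples to a log folder.

```python
import csv
import functools
import threading
import time
from pathlib import Path


def monitor_gpu(read_sample, log_dir="utilization_logs", interval=0.01):
    """Hypothetical decorator (not the itwinai API): call `read_sample()`
    periodically in a background thread while the wrapped method runs,
    then write the collected samples to a CSV file in `log_dir`."""

    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            samples = []
            stop = threading.Event()

            def poll():
                start = time.monotonic()
                while True:
                    samples.append((time.monotonic() - start, read_sample()))
                    # wait() returns True once stop is set, ending the loop
                    if stop.wait(interval):
                        break

            thread = threading.Thread(target=poll, daemon=True)
            thread.start()
            try:
                return method(self, *args, **kwargs)
            finally:
                # Stop sampling and flush the log even if training raised
                stop.set()
                thread.join()
                out_dir = Path(log_dir)
                out_dir.mkdir(parents=True, exist_ok=True)
                with open(out_dir / "samples.csv", "w", newline="") as f:
                    writer = csv.writer(f)
                    writer.writerow(["elapsed_s", "value"])
                    writer.writerows(samples)

        return wrapper

    return decorator


class ToyTrainer:
    """Stand-in for a TorchTrainer subclass; not the itwinai class."""

    # The lambda is a placeholder; a real sampler would query the GPU.
    @monitor_gpu(read_sample=lambda: 0.0, log_dir="utilization_logs")
    def execute(self):
        time.sleep(0.05)  # pretend to train
        return "done"
```

Calling ToyTrainer().execute() in this sketch creates utilization_logs/samples.csv with one row per sample. Writing the file in a finally block means the log is kept even when training fails partway through.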


Related issue:
#221

@jarlsondre jarlsondre added the enhancement New feature or request label Oct 28, 2024
@jarlsondre jarlsondre self-assigned this Oct 28, 2024
src/itwinai/torch/monitoring/monitoring.py Outdated
src/itwinai/torch/monitoring/monitoring.py Outdated
src/itwinai/torch/monitoring/monitoring.py Outdated
src/itwinai/torch/monitoring/monitoring.py Outdated
use-cases/eurac/trainer.py
@matbun matbun self-requested a review October 31, 2024 18:10
annaelisalappe previously approved these changes Nov 4, 2024
@matbun matbun self-requested a review November 4, 2024 14:33

matbun commented Nov 4, 2024

LGTM

@jarlsondre jarlsondre merged commit 06bf43b into main Nov 7, 2024
11 checks passed
@jarlsondre jarlsondre deleted the gpu-monitoring branch November 7, 2024 15:02
jarlsondre added a commit that referenced this pull request Nov 8, 2024
* add gpu utilization decorator and begin work on plots

* add decorator for gpu energy utilization

* Added config option to hpo script, styling (#235)

* Update README.md

* Update README.md

* Update createEnvVega.sh

* remove unused dist file

* run black and isort to fix linting errors

* remove redundant variable

* remove trailing whitespace

* fix issues from PR

* fix import in eurac trainer

* fix linting errors

* update logging directory and pattern

* update default pattern for gpu energy plots

* fix isort linting

* add support for none pattern and general cleanup

* fix linting errors with black and isort

* add configurable and dynamic wait and warmup times for the profiler

* remove old plot

* move horovod import

* fix linting errors

---------

Co-authored-by: Anna Lappe <[email protected]>
Co-authored-by: Matteo Bunino <[email protected]>
3 participants