-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scalability test wall clock #239
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I just realized that it would be nice to enable/disable all scalability analyses from a flag passed to the Trainer's constructor. This would give the possibility to the user to decide when profiling should actually be done and when to have less overhead |
The way I had it in mind was that the decorator was going to be used as a toggle for this. The obvious problem with doing that is of course that you have to actually change the source code every time you want to do it. However, if you add a flag to the constructor that overrides the decorator then you in a sense have two contradicting toggles, which to me sounds quite confusing. To counteract this, I would suggest then that the decorators are a permanent part of every trainer, somehow, so that we limit the number of places for configuration, as anything else quickly becomes convoluted. |
* add gpu utilization decorator and begin work on plots * add decorator for gpu energy utilization * Added config option to hpo script, styling (#235) * Update README.md * Update README.md * Update createEnvVega.sh * remove unused dist file * run black and isort to fix linting errors * temporary changes * remove redundant variable * add absolute time plot * remove trailing whitespace * remove redundant variable * remove trailing whitespace * begin implementation of backup * fix issues from PR * fix issues from PR * add backup to gpu monitoring * fix import in eurac trainer * cleanup backup mechanism slightly * fix linting errors * update logging directory and pattern * update default pattern for gpu energy plots * fix isort linting * add support for none pattern and general cleanup * fix linting errors with black and isort * fix import in eurac trainer * fix linting errors * update logging directory and pattern * update default pattern for gpu energy plots * fix isort linting * add support for none pattern and general cleanup * fix linting errors with black and isort * begin implementation of backup * add backup to gpu monitoring * add backup functionality to communication plot * rewrite epochtimetracker and refactor scalability plot code * cleanup scalability plot code * updating some epochtimetracker dependencies * add configurable and dynamic wait and warmup times for the profiler * temporary changes * add absolute time plot * begin implementation of backup * add backup to gpu monitoring * cleanup backup mechanism slightly * fix isort linting * add support for none pattern and general cleanup * fix linting errors with black and isort * begin implementation of backup * add backup functionality to communication plot * rewrite epochtimetracker and refactor scalability plot code * cleanup scalability plot code * updating some epochtimetracker dependencies * fix linting errors * fix more linting errors * add utilization percentage plot * run isort for linting * update default save path for metrics * add decorators to virgo and some cleanup * add contributions and cleanup * fix linting errors * change 'credits' to 'credit' * update communication plot style * update function names * update scalability function for a more streamlined approach * run isort * move horovod import * fix linting errors * add contributors --------- Co-authored-by: Anna Lappe <[email protected]> Co-authored-by: Matteo Bunino <[email protected]>
Summary
Update the implementation for the original scalability test to include an absolute plot. In addition, many changes have been made to streamline the different scalability tests.
Note: As I have been rebasing from the
gpu-monitoring
branch, many of the changes from that branch will appear in this PR until the other PR has been merged.Related issue :
#221