Mlflow benchmark profiler update #38
Open: anaprietonem wants to merge 41 commits into develop from mlflow_benchmark_profiler_update
Changes from 35 commits
6507424 fix: saving frequency bug for inference checkpoints (anaprietonem)
7d2d620 Merge branch 'develop' into 257-bug-inference-checkpoints-saving-freq… (anaprietonem)
0027046 chore: update CHANGELOG (anaprietonem)
8cf698b feat: add anemoi profiler with mlflow compatibility (anaprietonem)
d647bf9 fix: format error (anaprietonem)
352cd29 fix: removed atos path from noteook and fixed update_paths function (anaprietonem)
c7ab208 add hta functionality in documentation (anaprietonem)
ebe33bd updating docs for profiler (anaprietonem)
9c67f3e update profiler docs (anaprietonem)
2bcf957 update profiler docs (anaprietonem)
2e6a168 update profiler docs (anaprietonem)
29232ce update profiler docs (anaprietonem)
c646e38 update profiler docs (anaprietonem)
4d9610b update profiler docs (anaprietonem)
0a4070c update profiler docs (anaprietonem)
45e7a7b update profiler docs (anaprietonem)
3cea9d9 update profiler docs (anaprietonem)
3c2f2d9 update profiler docs (anaprietonem)
b8fcf99 update profiler docs (anaprietonem)
80e5522 Merge branch 'develop' into mlflow_benchmark_profiler_update (anaprietonem)
990aea9 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
5aeeca4 fixing pre-commits on docs (anaprietonem)
b85eac2 fix pre-commit docs (anaprietonem)
ef54ffb fix pre-commit docs (anaprietonem)
56e222f [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
4aa225a minor updates (anaprietonem)
81b57d8 Merge branch 'mlflow_benchmark_profiler_update' of github.com:ecmwf/a… (anaprietonem)
86e58ba added docs for anemoi profiler (anaprietonem)
e943782 add section about profiling in overview (anaprietonem)
e177bd6 add section about profiling in overview (anaprietonem)
328ca19 add comment to avoid confussion with profiler for troubleshooting (anaprietonem)
702287e added note about limit batches (anaprietonem)
36dc645 Merge branch 'develop' into mlflow_benchmark_profiler_update (anaprietonem)
a7280ab updated changelog (anaprietonem)
05289e4 making sure anemoi-training profiler commands works in interactive gp… (anaprietonem)
df76686 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
977c3e4 update docs (anaprietonem)
d71d7c1 Merge branch 'mlflow_benchmark_profiler_update' of github.com:ecmwf/a… (anaprietonem)
442dd9a removed comment based on refactor callbacks PR (anaprietonem)
60368ae adapted patchedProfile to not break HTA (anaprietonem)
9c50023 avoid code duplication in commands and fix copyright notice (anaprietonem)
@@ -0,0 +1,85 @@

    # (C) Copyright 2024 ECMWF.
    #
    # This software is licensed under the terms of the Apache Licence Version 2.0
    # which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
    # In applying this licence, ECMWF does not waive the privileges and immunities
    # granted to it by virtue of its status as an intergovernmental organisation
    # nor does it submit to any jurisdiction.

    from __future__ import annotations

    import logging
    import os
    import sys
    from pathlib import Path
    from typing import TYPE_CHECKING

    from anemoi.training.commands import Command

    if TYPE_CHECKING:
        import argparse

    LOGGER = logging.getLogger(__name__)


    class Profiler(Command):
        """Commands to profile Anemoi models."""

        accept_unknown_args = True

        @staticmethod
        def add_arguments(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
            return parser

        def run(self, args: argparse.Namespace, unknown_args: list[str] | None = None) -> None:
            # This will be picked up by the logger
            os.environ["ANEMOI_PROFILER_CMD"] = f"{sys.argv[0]} {args.command}"
            # Merge the known subcommands with a non-whitespace character for hydra
            new_sysargv = self._merge_sysargv(args)

            # Add the unknown arguments (belonging to hydra) to sys.argv
            if unknown_args is not None:
                sys.argv = [new_sysargv, *unknown_args]
            else:
                sys.argv = [new_sysargv]

            # Import and run the profiler command
            LOGGER.info("Running anemoi profiling command with overrides: %s", sys.argv[1:])
            main()

        def _merge_sysargv(self, args: argparse.Namespace) -> str:
            """Merge the sys.argv with the known subcommands to pass to hydra.

            Parameters
            ----------
            args : argparse.Namespace
                args from the command line

            Returns
            -------
            str
                Modified sys.argv as string
            """
            argv = Path(sys.argv[0])

            # this will turn "/env/bin/anemoi-training profiler" into "/env/bin/.anemoi-training-profiler"
            # the dot at the beginning is intentional to not interfere with autocomplete
            name = f".{argv.name}-{args.command}"

            if hasattr(args, "subcommand"):
                name += f"-{args.subcommand}"
            return str(argv.with_name(name))


    def main() -> None:
        # Use the environment variable to check if main is being called from the subcommand, not from the ddp entrypoint
        if not os.environ.get("ANEMOI_PROFILER_CMD"):
            error = "This entrypoint should not be called directly. Use `anemoi-training profiler` instead."
            raise RuntimeError(error)

        from anemoi.training.train.profiler import main as anemoi_profiler

        anemoi_profiler()


    command = Profiler
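To make the intended result of `_merge_sysargv` concrete, here is a minimal standalone sketch. The function name `merge_sysargv`, the example path and the `run` subcommand are hypothetical and only serve the illustration. It builds the new file name as a string before calling `with_name`, because a `Path` object does not support `+=` with a `str`.

```python
from pathlib import PurePosixPath
from typing import Optional


def merge_sysargv(program: str, command: str, subcommand: Optional[str] = None) -> str:
    """Fold the CLI command (and optional subcommand) into the program name."""
    argv = PurePosixPath(program)
    # The leading dot keeps the merged name out of shell autocomplete.
    name = f".{argv.name}-{command}"
    if subcommand is not None:
        name += f"-{subcommand}"
    # Only the last path component changes; the directory is preserved.
    return str(argv.with_name(name))


print(merge_sysargv("/env/bin/anemoi-training", "profiler"))
# /env/bin/.anemoi-training-profiler
print(merge_sysargv("/env/bin/anemoi-training", "profiler", "run"))
# /env/bin/.anemoi-training-profiler-run
```

The merged string contains no whitespace, which is what lets it replace `sys.argv[0]` before the remaining overrides are handed to hydra.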
Does this need to be a specific one for the profiler? I think we can just reuse the ANEMOI_TRAINING_CMD env var?
The "training" in that name doesn't need to refer to "train". It could just be "the command that anemoi-training was run with".
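For illustration, a single shared variable could back the "called directly" guard for every subcommand. This is only a sketch of the suggestion: the variable name comes from the comment above, while `record_command` and `require_cli_entrypoint` are hypothetical helpers, not functions from anemoi-training.

```python
import os

# Variable name taken from the review comment; the helper names are hypothetical.
ENV_VAR = "ANEMOI_TRAINING_CMD"


def record_command(program: str, command: str) -> None:
    """Store the command anemoi-training was run with, whatever the subcommand."""
    os.environ[ENV_VAR] = f"{program} {command}"


def require_cli_entrypoint() -> str:
    """Return the recorded command, or fail if the entrypoint was called directly."""
    cmd = os.environ.get(ENV_VAR)
    if not cmd:
        msg = "This entrypoint should not be called directly. Use the `anemoi-training` CLI instead."
        raise RuntimeError(msg)
    return cmd


record_command("anemoi-training", "profiler")
print(require_cli_entrypoint())  # anemoi-training profiler
```

With this shape the guard does not need to know which subcommand set the variable, only that one of them did.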
Yes, I wanted to check that! I first opted to have the two of them just to check that it was working fine, which it does. Right now there is also quite a bit of repeated code across the profiler and train commands, so I was thinking Profiler could inherit directly from Train to avoid repeating _merge_sysargv and other functions. Setting the command as an env variable could even go into a small function, so that if I inherit I don't need to code it again. What do you think? (I have not looked much into the details of the Command class, so I would like to check whether inheritance would be okay or is not advised in this case.)
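A rough sketch of the inheritance idea, under stated assumptions: the `Train` class below is a hypothetical stand-in, not the real anemoi-training command, and `entrypoint_name` and `_set_command_env` are invented for the example. The point is only that the env var logic and `run` live in one place and the subclass overrides the single thing that differs.

```python
import argparse
import os
import sys
from typing import List, Optional, Tuple


class Train:
    """Hypothetical stand-in for the existing train command."""

    entrypoint_name = "train"

    def _set_command_env(self, args: argparse.Namespace) -> None:
        # One shared helper, so subclasses do not repeat the env var logic.
        os.environ["ANEMOI_TRAINING_CMD"] = f"{sys.argv[0]} {args.command}"

    def run(
        self, args: argparse.Namespace, unknown_args: Optional[List[str]] = None
    ) -> Tuple[str, List[str]]:
        self._set_command_env(args)
        overrides = list(unknown_args or [])
        # The real command would rewrite sys.argv and call the entrypoint here;
        # the sketch only reports what it would run.
        return self.entrypoint_name, overrides


class Profiler(Train):
    """Only the entrypoint differs; everything else is inherited."""

    entrypoint_name = "profiler"


print(Profiler().run(argparse.Namespace(command="profiler"), ["key=value"]))
# ('profiler', ['key=value'])
```

Whether inheriting from Train or moving the shared pieces into the common Command base is preferable depends on how the Command class is meant to be extended, which the sketch does not settle.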