-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Config manager supports command line overrides #69
Conversation
Summary: PR implements following enhancements to config manager. 1. Command line args and toml file args are now unified. 2. Defaults can be loaded from either. 3. Defaults can be overridden through command line. Overrides will be applied irrespective of where the defaults were loaded from. Test Plan: ============================= test session starts ============================== platform linux -- Python 3.10.13, pytest-8.0.1, pluggy-1.4.0 -- /home/gnadathur/local/a/pytorch-env/bin/python cachedir: .pytest_cache rootdir: /data/users/gnadathur/a/torchtrain configfile: pyproject.toml plugins: cov-4.1.0 collecting ... collected 5 items test/test_job_config.py::TestJobConfig::test_command_line_args PASSED [ 20%] test/test_job_config.py::TestJobConfig::test_command_line_args_with_override PASSED [ 40%] test/test_job_config.py::TestJobConfig::test_job_config_file PASSED [ 60%] test/test_job_config.py::TestJobConfig::test_job_config_file_with_override PASSED [ 80%] test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist PASSED [100%] ---------- coverage: platform linux, python 3.10.13-final-0 ---------- Coverage XML written to file coverage.xml ============================= slowest 20 durations ============================= 0.01s call test/test_job_config.py::TestJobConfig::test_job_config_file_with_override 0.00s call test/test_job_config.py::TestJobConfig::test_job_config_file 0.00s call test/test_job_config.py::TestJobConfig::test_command_line_args 0.00s call test/test_job_config.py::TestJobConfig::test_command_line_args_with_override 0.00s call test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist 0.00s setup test/test_job_config.py::TestJobConfig::test_command_line_args 0.00s teardown test/test_job_config.py::TestJobConfig::test_command_line_args 0.00s setup test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist 0.00s setup test/test_job_config.py::TestJobConfig::test_command_line_args_with_override 0.00s teardown test/test_job_config.py::TestJobConfig::test_command_line_args_with_override 0.00s setup test/test_job_config.py::TestJobConfig::test_job_config_file_with_override 0.00s setup test/test_job_config.py::TestJobConfig::test_job_config_file 0.00s teardown test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist 0.00s teardown test/test_job_config.py::TestJobConfig::test_job_config_file 0.00s teardown test/test_job_config.py::TestJobConfig::test_job_config_file_with_override ============================== 5 passed in 0.10s =============================== Successful 10 iterations llama.sh run Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
Summary: PR implements following enhancements to config manager. 1. Command line args and toml file args are now unified. 2. Defaults can be loaded from either. 3. Defaults can be overridden through command line. Overrides will be applied irrespective of where the defaults were loaded from. Test Plan: ============================= test session starts ============================== platform linux -- Python 3.10.13, pytest-8.0.1, pluggy-1.4.0 -- /home/gnadathur/local/a/pytorch-env/bin/python cachedir: .pytest_cache rootdir: /data/users/gnadathur/a/torchtrain configfile: pyproject.toml plugins: cov-4.1.0 collecting ... collected 5 items test/test_job_config.py::TestJobConfig::test_command_line_args PASSED [ 20%] test/test_job_config.py::TestJobConfig::test_command_line_args_with_override PASSED [ 40%] test/test_job_config.py::TestJobConfig::test_job_config_file PASSED [ 60%] test/test_job_config.py::TestJobConfig::test_job_config_file_with_override PASSED [ 80%] test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist PASSED [100%] ---------- coverage: platform linux, python 3.10.13-final-0 ---------- Coverage XML written to file coverage.xml ============================= slowest 20 durations ============================= 0.01s call test/test_job_config.py::TestJobConfig::test_job_config_file_with_override 0.00s call test/test_job_config.py::TestJobConfig::test_job_config_file 0.00s call test/test_job_config.py::TestJobConfig::test_command_line_args 0.00s call test/test_job_config.py::TestJobConfig::test_command_line_args_with_override 0.00s call test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist 0.00s setup test/test_job_config.py::TestJobConfig::test_command_line_args 0.00s teardown test/test_job_config.py::TestJobConfig::test_command_line_args 0.00s setup test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist 0.00s setup test/test_job_config.py::TestJobConfig::test_command_line_args_with_override 0.00s teardown test/test_job_config.py::TestJobConfig::test_command_line_args_with_override 0.00s setup test/test_job_config.py::TestJobConfig::test_job_config_file_with_override 0.00s setup test/test_job_config.py::TestJobConfig::test_job_config_file 0.00s teardown test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist 0.00s teardown test/test_job_config.py::TestJobConfig::test_job_config_file 0.00s teardown test/test_job_config.py::TestJobConfig::test_job_config_file_with_override ============================== 5 passed in 0.10s =============================== Successful 10 iterations llama.sh run Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 9a0f588849f520652a431df438318be3ab0f9578 Pull Request resolved: #69
Summary: PR implements following enhancements to config manager. 1. Command line args and toml file args are now unified. 2. Defaults can be loaded from either. 3. Defaults can be overridden through command line. Overrides will be applied irrespective of where the defaults were loaded from. Test Plan: ============================= test session starts ============================== platform linux -- Python 3.10.13, pytest-8.0.1, pluggy-1.4.0 -- /home/gnadathur/local/a/pytorch-env/bin/python cachedir: .pytest_cache rootdir: /data/users/gnadathur/a/torchtrain configfile: pyproject.toml plugins: cov-4.1.0 collecting ... collected 5 items test/test_job_config.py::TestJobConfig::test_command_line_args PASSED [ 20%] test/test_job_config.py::TestJobConfig::test_command_line_args_with_override PASSED [ 40%] test/test_job_config.py::TestJobConfig::test_job_config_file PASSED [ 60%] test/test_job_config.py::TestJobConfig::test_job_config_file_with_override PASSED [ 80%] test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist PASSED [100%] ---------- coverage: platform linux, python 3.10.13-final-0 ---------- Coverage XML written to file coverage.xml ============================= slowest 20 durations ============================= 0.01s call test/test_job_config.py::TestJobConfig::test_job_config_file_with_override 0.00s call test/test_job_config.py::TestJobConfig::test_job_config_file 0.00s call test/test_job_config.py::TestJobConfig::test_command_line_args 0.00s call test/test_job_config.py::TestJobConfig::test_command_line_args_with_override 0.00s call test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist 0.00s setup test/test_job_config.py::TestJobConfig::test_command_line_args 0.00s teardown test/test_job_config.py::TestJobConfig::test_command_line_args 0.00s setup test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist 0.00s setup test/test_job_config.py::TestJobConfig::test_command_line_args_with_override 0.00s teardown test/test_job_config.py::TestJobConfig::test_command_line_args_with_override 0.00s setup test/test_job_config.py::TestJobConfig::test_job_config_file_with_override 0.00s setup test/test_job_config.py::TestJobConfig::test_job_config_file 0.00s teardown test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist 0.00s teardown test/test_job_config.py::TestJobConfig::test_job_config_file 0.00s teardown test/test_job_config.py::TestJobConfig::test_job_config_file_with_override ============================== 5 passed in 0.10s =============================== Successful 10 iterations llama.sh run Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
Summary: PR implements following enhancements to config manager. 1. Command line args and toml file args are now unified. 2. Defaults can be loaded from either. 3. Defaults can be overridden through command line. Overrides will be applied irrespective of where the defaults were loaded from. Test Plan: ============================= test session starts ============================== platform linux -- Python 3.10.13, pytest-8.0.1, pluggy-1.4.0 -- /home/gnadathur/local/a/pytorch-env/bin/python cachedir: .pytest_cache rootdir: /data/users/gnadathur/a/torchtrain configfile: pyproject.toml plugins: cov-4.1.0 collecting ... collected 5 items test/test_job_config.py::TestJobConfig::test_command_line_args PASSED [ 20%] test/test_job_config.py::TestJobConfig::test_command_line_args_with_override PASSED [ 40%] test/test_job_config.py::TestJobConfig::test_job_config_file PASSED [ 60%] test/test_job_config.py::TestJobConfig::test_job_config_file_with_override PASSED [ 80%] test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist PASSED [100%] ---------- coverage: platform linux, python 3.10.13-final-0 ---------- Coverage XML written to file coverage.xml ============================= slowest 20 durations ============================= 0.01s call test/test_job_config.py::TestJobConfig::test_job_config_file_with_override 0.00s call test/test_job_config.py::TestJobConfig::test_job_config_file 0.00s call test/test_job_config.py::TestJobConfig::test_command_line_args 0.00s call test/test_job_config.py::TestJobConfig::test_command_line_args_with_override 0.00s call test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist 0.00s setup test/test_job_config.py::TestJobConfig::test_command_line_args 0.00s teardown test/test_job_config.py::TestJobConfig::test_command_line_args 0.00s setup test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist 0.00s setup test/test_job_config.py::TestJobConfig::test_command_line_args_with_override 0.00s teardown test/test_job_config.py::TestJobConfig::test_command_line_args_with_override 0.00s setup test/test_job_config.py::TestJobConfig::test_job_config_file_with_override 0.00s setup test/test_job_config.py::TestJobConfig::test_job_config_file 0.00s teardown test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist 0.00s teardown test/test_job_config.py::TestJobConfig::test_job_config_file 0.00s teardown test/test_job_config.py::TestJobConfig::test_job_config_file_with_override ============================== 5 passed in 0.10s =============================== Successful 10 iterations llama.sh run Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 9a0f588849f520652a431df438318be3ab0f9578 Pull Request resolved: #69
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks great to me! have a few suggestions inlined
|
||
args = parser.parse_args() | ||
main(args) | ||
config = JobConfig() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should define a config_from_cli
method in the config_manager module, and do things like config = config_from_cli(JobConfig())
@@ -15,7 +15,7 @@ log_freq = 10 | |||
|
|||
[model] | |||
name = "llama" | |||
model_conf = "debugmodel" | |||
config = "debugmodel" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe call it type
to avoid confusion with JobConfig?
Semantics: | ||
- Default config is loaded from a toml file. If no toml file is provided, | ||
then the default config is loaded from argparse defaults. | ||
- Then, Override config is loaded from command line arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should either accept a config file, or a full set of arguments, wondering do you think override config would be something helpful here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we assume exclusive either or, that simplifies implementation. There are a couple of use cases to consider:
-
Do we want the ability to use a base config file for a model say LLaMa-13b and submit a bunch of jobs with say different batch size overridden from command line ?
-
For using it with internal environments, would we be able to pass the full set of arguments through command line or do we want to leverage command line defaults and pass overrides only ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So regarding the cases:
- I think often the time if I want to submit a job with a slight different config, I just modify the config file locally, and the torchx launcher is able to pick up the changes, we can also create separate configs for such a case
- for internal using, torchx launcher would also do the same thing, load the config options from a config file, and launch with those config options, and for the case where the config is not specified in the config file, the command line defaults is valuable there.
return True | ||
|
||
def _init_args(self) -> Tuple: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can put both: arg parser initialization, and arg parsing into this function, call it sth like parse_args
? what I think would be nice in train.py:
config = JobConfig()
config.parse_args(args[1:])
main(config)
class_type = type(k.title(), (), v) | ||
setattr(self, k, class_type()) | ||
self._validate_config() | ||
default_args, override_args = self._init_args() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function can happen inside parse_args
?
Stack from ghstack (oldest at bottom):
Summary:
PR implements following enhancements to config manager.
be applied irrespective of where the defaults were loaded from.
Test Plan:
============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-8.0.1, pluggy-1.4.0 -- /home/gnadathur/local/a/pytorch-env/bin/python
cachedir: .pytest_cache
rootdir: /data/users/gnadathur/a/torchtrain
configfile: pyproject.toml
plugins: cov-4.1.0
collecting ... collected 5 items
test/test_job_config.py::TestJobConfig::test_command_line_args PASSED [ 20%]
test/test_job_config.py::TestJobConfig::test_command_line_args_with_override PASSED [ 40%]
test/test_job_config.py::TestJobConfig::test_job_config_file PASSED [ 60%]
test/test_job_config.py::TestJobConfig::test_job_config_file_with_override PASSED [ 80%]
test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist PASSED [100%]
---------- coverage: platform linux, python 3.10.13-final-0 ----------
Coverage XML written to file coverage.xml
============================= slowest 20 durations =============================
0.01s call test/test_job_config.py::TestJobConfig::test_job_config_file_with_override
0.00s call test/test_job_config.py::TestJobConfig::test_job_config_file
0.00s call test/test_job_config.py::TestJobConfig::test_command_line_args
0.00s call test/test_job_config.py::TestJobConfig::test_command_line_args_with_override
0.00s call test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist
0.00s setup test/test_job_config.py::TestJobConfig::test_command_line_args
0.00s teardown test/test_job_config.py::TestJobConfig::test_command_line_args
0.00s setup test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist
0.00s setup test/test_job_config.py::TestJobConfig::test_command_line_args_with_override
0.00s teardown test/test_job_config.py::TestJobConfig::test_command_line_args_with_override
0.00s setup test/test_job_config.py::TestJobConfig::test_job_config_file_with_override
0.00s setup test/test_job_config.py::TestJobConfig::test_job_config_file
0.00s teardown test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist
0.00s teardown test/test_job_config.py::TestJobConfig::test_job_config_file
0.00s teardown test/test_job_config.py::TestJobConfig::test_job_config_file_with_override
============================== 5 passed in 0.10s ===============================
Successful 10 iterations llama.sh run
Reviewers:
Subscribers:
Tasks:
Tags: