-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic integration test template #104
Conversation
# TorchTrain Config.toml | ||
[job] | ||
dump_folder = "./outputs" | ||
description = "2DParallel with debug model" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we call it 2D test here, shall we enable sp/pp below besides FSDP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried SP but its crashing so renamed the test to 1D. Once we fix SP, we can add another test for 2D.
@@ -37,7 +37,7 @@ jobs: | |||
python -m pip install -r requirements.txt | |||
python -m pip install -r dev-requirements.txt | |||
python -m pip install -e . | |||
- name: Run NGPU=4 ./run_llama_train.sh | |||
run: NGPU=4 ./run_llama_train.sh | |||
- name: Run NGPU=4 ./integration_test/run.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there's a good reason to make a new .sh for testing purposes, then disregard this. but it has some value to test the actual run_llama_train.sh
script that users will see, so we might as well just call that script from the integration tests.
Note that you can override the .toml while calling that script- e.g.
CONFIG_FILE=blah ./run_llama_train.sh
and in that way achieve various different test combos
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, same comment about the .toml files themselves.
If we are offering .toml files for various model configs, we might as well just test those directly (except for the ones that are too big, etc).
I am ok with having separate files in integration_test/test_configs/
folder when they are not duplicating a config we have elsewhere, but if we end up with very similar files in 2 places we should probably delete the one in integration_test/test_configs/
and just directly test the one in the other config folder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good question.
IMO, there can be some key differences between config files in the test directory and the ones in train_config.
- Train configs can be tailored to the optimal config we recommend for a particular model. These are the "public" configs. For example, the number of steps can be what is needed to show good numerics.
- Test configs on the other hand can be short but also granular, (for ex w/ and w/o SAC) to test the various combinations.
Agreed on the comment about de-duping configs.
Here's a scheme that can maybe work better.
- Create
train_configs/test
for all the test configs. train_configs
directory can be used for public configs.- There is no duplication bet. config files in root directory and the test sub directory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO if we want to have a separate test related scripts, we can possibly create scripts/test/
and put all things under that folder
if we want to test different configs, similar we can have train_configs/test/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this scheme. We add a job level config job.use_for_integration_test
which is False
by default. For those configs which we want to add to integration test, we can set this option to True
. This way nothing changes w.r.t location of configs etc. The runner can check for this flag and run the config.
Summary:
Create a test config and runner for integration test. Run it as part of CI.
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags: