Fix 1D PP tracer test #362
Forgot to enable tracer for tracer test in the last PR.
ghstack-source-id: 4a8ad78dd63a3c6cda80e34c6f1ad0d6cb955f9d
Pull Request resolved: #362
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #362
* __->__ #371

This PR fixes the issue mentioned [here](pytorch/pytorch#126653 (comment)): "Module object has no attribute items." The reason is that a split `ModuleDict` is no longer a `ModuleDict`. Using `named_children()` and `register_module()` to access and update submodules would be more generally applicable.
Made a small change:
In tracer mode, we don't require users to provide manual split points, because the pipeline_llama_tracer function takes care of that today:
torchtitan/torchtitan/parallelisms/parallelize_llama.py
Lines 272 to 276 in 54b5fa2
    layers_per_rank = len(model.layers) // parallel_dims.pp
    split_spec = {
        f"layers.{i * layers_per_rank}": SplitPoint.BEGINNING
        for i in range(1, parallel_dims.pp)
    }
CI should pass after pytorch/pytorch#127607 lands.
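For readers who want to see what the quoted snippet produces, here is a minimal standalone sketch of the same computation. It assumes plain integers in place of `len(model.layers)` and `parallel_dims.pp`, and it uses a local stand-in `SplitPoint` enum so it runs without PyTorch; `build_split_spec` is a hypothetical helper name, not a function in torchtitan.

```python
from enum import Enum


class SplitPoint(Enum):
    # Stand-in for the real SplitPoint enum used by the pipelining tracer
    # (assumption: only the BEGINNING/END members matter for this sketch).
    BEGINNING = 1
    END = 2


def build_split_spec(num_layers: int, pp_degree: int) -> dict:
    """Mirror the quoted snippet: split at the start of every stage but the first."""
    layers_per_rank = num_layers // pp_degree
    return {
        f"layers.{i * layers_per_rank}": SplitPoint.BEGINNING
        for i in range(1, pp_degree)
    }


if __name__ == "__main__":
    # 8 layers over 4 pipeline stages: splits before layers 2, 4 and 6.
    print(list(build_split_spec(num_layers=8, pp_degree=4)))
```

Because of the integer division, a layer count that is not a multiple of the pipeline degree leaves the remainder on the last stage: with 10 layers and 4 stages the split points are still `layers.2`, `layers.4` and `layers.6`, so the final stage holds layers 6 through 9.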
I think this is not the right way to do this. If we want the layer split to be 'automatic', I think it should be automatic for both frontends, and we can delete the _split_points cmdline arg. Or, if we want to have the cmdline arg, we should keep it behaving the same for both frontends. I'd propose to first keep the arg for both frontends, and then do a PR that makes the cmdline arg optional and uses automation for both frontends.
Sounds good to me
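A rough sketch of the proposed follow-up (an optional split-points argument with an automatic fallback shared by both frontends) might look like the following. The function name `resolve_split_points` and its parameters are hypothetical and only illustrate the idea; this is not the change that landed in torchtitan.

```python
from typing import List, Optional


def resolve_split_points(
    user_split_points: Optional[List[str]],
    num_layers: int,
    pp_degree: int,
) -> List[str]:
    """Return the user's split points if given, else evenly spaced defaults.

    Hypothetical helper: both the manual and the tracer frontend would call
    this, so the command-line argument behaves the same way for each.
    """
    if user_split_points:
        # Explicit configuration always wins.
        return list(user_split_points)
    # Automatic fallback: same even split as the tracer path uses today.
    layers_per_rank = num_layers // pp_degree
    return [f"layers.{i * layers_per_rank}" for i in range(1, pp_degree)]


if __name__ == "__main__":
    print(resolve_split_points(None, num_layers=8, pp_degree=2))
    print(resolve_split_points(["layers.3"], num_layers=8, pp_degree=2))
```

Routing both frontends through one helper keeps the argument's meaning identical regardless of which frontend is selected, which is the consistency the review comment asks for.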
Re-approve