Add Pipeline Parallel (and 2D PP+FSDP) support #318

Merged
merged 36 commits into from
May 21, 2024

wconstab
Contributor

@wconstab commented May 9, 2024

Stack from ghstack (oldest at bottom):

Runs PP+DP and PP+TP without issue.
Runs PP+TP+DP with decreasing loss, but fails during DCP (distributed checkpoint) save.
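To make the PP+DP / PP+TP / PP+TP+DP combinations concrete, here is a minimal pure-Python sketch of how global ranks could be grouped on a 3D device mesh. It assumes a row-major mesh with dimension order (pp, dp, tp); that ordering and the helper name `mesh_groups` are assumptions for illustration, not necessarily what this PR does. In PyTorch itself such a mesh is usually built with `torch.distributed.device_mesh.init_device_mesh(device_type, mesh_shape, mesh_dim_names=...)`.

```python
from itertools import product


def mesh_groups(pp: int, dp: int, tp: int) -> dict[str, list[list[int]]]:
    """Group global ranks by parallel dimension for a row-major (pp, dp, tp) mesh.

    Hypothetical helper: the (pp, dp, tp) ordering is an assumption, not
    necessarily the layout used by this PR.
    """

    def rank(p: int, d: int, t: int) -> int:
        # Row-major layout: tp varies fastest, then dp, then pp.
        return (p * dp + d) * tp + t

    return {
        # Ranks that differ only in the pp coordinate form one pipeline.
        "pp": [[rank(p, d, t) for p in range(pp)] for d, t in product(range(dp), range(tp))],
        # Ranks that differ only in the dp coordinate hold the same stage and
        # TP shard (these are the ranks FSDP would shard parameters across).
        "dp": [[rank(p, d, t) for d in range(dp)] for p, t in product(range(pp), range(tp))],
        # Ranks that differ only in the tp coordinate split each layer's weights.
        "tp": [[rank(p, d, t) for t in range(tp)] for p, d in product(range(pp), range(dp))],
    }


groups = mesh_groups(pp=2, dp=2, tp=2)  # 8 ranks in total
print(groups["pp"])  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Each dimension's groups partition all ranks, so every rank belongs to exactly one pipeline, one DP group and one TP group.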

Currently supports only the simple schedules: GPipe and 1F1B.
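The two schedules differ mainly in the order in which one stage runs forward (F) and backward (B) passes over microbatches. Below is a toy, single-process sketch of that per-stage ordering only (no communication, no real compute); the function names are hypothetical and this is not the pipelining library's API. GPipe runs all forwards and then all backwards, while 1F1B does a few warm-up forwards, alternates one forward with one backward, then drains the remaining backwards, which bounds how many microbatches' activations a stage holds at once.

```python
def gpipe_schedule(num_microbatches: int) -> list[str]:
    """All forwards, then all backwards (same order on every stage)."""
    return [f"F{i}" for i in range(num_microbatches)] + [
        f"B{i}" for i in range(num_microbatches)
    ]


def one_f_one_b_schedule(stage: int, num_stages: int, num_microbatches: int) -> list[str]:
    """Warm-up forwards, then alternate 1 forward / 1 backward, then drain backwards."""
    # Earlier stages need more warm-up forwards before their first backward arrives.
    warmup = min(num_stages - stage - 1, num_microbatches)
    actions = [f"F{i}" for i in range(warmup)]
    for i in range(num_microbatches - warmup):  # steady state
        actions.append(f"F{warmup + i}")
        actions.append(f"B{i}")
    # Cool-down: backwards for the microbatches still in flight.
    actions += [f"B{i}" for i in range(num_microbatches - warmup, num_microbatches)]
    return actions


def peak_in_flight(actions: list[str]) -> int:
    """Max number of microbatches whose forward is done but backward is not."""
    live = peak = 0
    for action in actions:
        live += 1 if action.startswith("F") else -1
        peak = max(peak, live)
    return peak


print(one_f_one_b_schedule(3, 4, 4))  # ['F0', 'B0', 'F1', 'B1', 'F2', 'B2', 'F3', 'B3']
print(peak_in_flight(gpipe_schedule(8)), peak_in_flight(one_f_one_b_schedule(0, 4, 8)))  # 8 4
```

With 8 microbatches and 4 stages, a GPipe stage holds activations for all 8 microbatches at its peak, while the first 1F1B stage holds at most 4.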

Adds a command-line/TOML argument for specifying split points, handled
the same way by the tracer and manual frontends.

e.g. a user can specify "layers.2,layers.4" as split points.
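A split-point string like the one above can be turned into per-stage layer ranges. The sketch below assumes each split point names the first layer of a new stage, so "layers.2,layers.4" on a 6-layer model yields three stages; that semantics, the `layers.<index>` naming, and the helper name `split_points_to_stage_ranges` are assumptions for illustration, not this PR's code.

```python
def split_points_to_stage_ranges(split_points: str, num_layers: int) -> list[range]:
    """Map a comma-separated split-point string to per-stage layer index ranges.

    Assumes each split point names the first layer of a new stage and that
    layer FQNs look like "layers.<index>" (hypothetical helper).
    """
    boundaries = []
    for fqn in split_points.split(","):
        prefix, _, index = fqn.strip().partition(".")
        if prefix != "layers" or not index.isdigit():
            raise ValueError(f"unsupported split point: {fqn!r}")
        boundaries.append(int(index))
    # Boundaries must be strictly increasing and leave every stage non-empty.
    if boundaries != sorted(set(boundaries)) or not all(0 < b < num_layers for b in boundaries):
        raise ValueError(f"invalid split points for {num_layers} layers: {split_points!r}")
    edges = [0, *boundaries, num_layers]
    return [range(start, stop) for start, stop in zip(edges, edges[1:])]


stages = split_points_to_stage_ranges("layers.2,layers.4", num_layers=6)
print([list(r) for r in stages])  # [[0, 1], [2, 3], [4, 5]]
```

N split points always produce N + 1 stages, which is why the number of split points has to agree with the pipeline degree.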

Currently uses the manual frontend by default, but allows specifying the
tracer frontend. The tracer frontend requires working around additional
compatibility limitations (indicated by raised assertions) and is
not ready for wider use yet.
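One way to picture the manual frontend: every pipeline rank builds the full model and then drops the modules owned by other stages. The sketch below only computes which top-level module names a stage keeps, given per-stage layer ranges; the Llama-style names (`tok_embeddings`, `layers.<i>`, `norm`, `output`) and the helper `modules_for_stage` are assumptions for illustration, not the code in this PR.

```python
def modules_for_stage(stage_idx: int, stage_ranges: list[range]) -> list[str]:
    """Names of the top-level modules one pipeline stage keeps.

    Sketch of manual splitting: every rank builds the full model and drops
    what it does not own. Llama-style module names are assumed.
    """
    is_first = stage_idx == 0
    is_last = stage_idx == len(stage_ranges) - 1
    kept = ["tok_embeddings"] if is_first else []  # only the first stage embeds tokens
    kept += [f"layers.{i}" for i in stage_ranges[stage_idx]]
    if is_last:
        kept += ["norm", "output"]  # only the last stage produces logits
    return kept


ranges = [range(0, 2), range(2, 4), range(4, 6)]  # e.g. split points "layers.2,layers.4"
for s in range(len(ranges)):
    print(s, modules_for_stage(s, ranges))
```

Because the pruning works on module names, the same split-point string can drive both this manual path and a tracer that cuts the graph at those names.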

[ghstack-poisoned]
@facebook-github-bot added the CLA Signed label May 9, 2024
wconstab added a commit that referenced this pull request May 9, 2024
ghstack-source-id: 318831bda97c4639b745832e7f33f0b616a479ee
Pull Request resolved: #318
torchtitan/config_manager.py (resolved)
torchtitan/config_manager.py (resolved)
wconstab added 2 commits May 9, 2024 17:03
[ghstack-poisoned]
[ghstack-poisoned]
wconstab added a commit that referenced this pull request May 10, 2024
ghstack-source-id: e49b659e66f4101cef58ad717a80521f5b172347
Pull Request resolved: #318
@wconstab changed the title from "Mock up a PP config UX" to "Add Pipeline Parallel (and 2D PP+FSDP) support" May 10, 2024
wconstab added a commit that referenced this pull request May 10, 2024
ghstack-source-id: ee49541296ccea6a32a2f17cbc9a6c892216fde4
Pull Request resolved: #318
model_config.vocab_size, input_shape, dtype=torch.int64, device=device
)

# HACK- can't use shape inference via execution of the PP stage inside ManualPipelineStage API, because the
Contributor Author

Proposal: land this part as-is. We still need to add lazy shape inference to the manual stage API, but I think we can clean this up after this lands.

wconstab added a commit that referenced this pull request May 11, 2024
ghstack-source-id: 073ec20f5dfb6f323ce829275f349224bcfff40b
Pull Request resolved: #318
train.py (outdated, resolved)
train.py (outdated, resolved)
wconstab added a commit that referenced this pull request May 11, 2024
ghstack-source-id: ad748435e833ef1e0e4c8e424339532daadf8bee
Pull Request resolved: #318
wconstab added a commit that referenced this pull request May 11, 2024
ghstack-source-id: 06f8ed615894662b70c37dbb69a0f82c976a5009
Pull Request resolved: #318
wconstab added a commit that referenced this pull request May 11, 2024
ghstack-source-id: 9dd9b7ab4c8548757351671dfd63dd338e7cf7ab
Pull Request resolved: #318
train.py (outdated, resolved)
wconstab added a commit that referenced this pull request May 14, 2024
ghstack-source-id: f5bd75a194f06879a8fcc3e8ffd34f7824bcfc7f
Pull Request resolved: #318
wconstab added a commit that referenced this pull request May 17, 2024
ghstack-source-id: d73eb57829322e9909c5e9b541f4c3c65311f4e2
Pull Request resolved: #318
Contributor

@kwen2501 left a comment

LGTM.
Is there a conclusion on model.reshard()?

torchtitan/parallelisms/parallelize_llama.py (outdated, resolved)
torchtitan/parallelisms/pipelining_utils.py (outdated, resolved)
@wconstab mentioned this pull request May 17, 2024
@wconstab mentioned this pull request May 18, 2024
wconstab added 3 commits May 20, 2024 11:33
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
Contributor

@wanchaol left a comment

This looks reasonably good to me; I have some additional comments inline.

torchtitan/config_manager.py (outdated, resolved)
torchtitan/config_manager.py (outdated, resolved)
if parallel_dims.dp_enabled
else torch.float32,
device=device,
)
Contributor

Can we encapsulate L209-L258 into an input_output_shape_inference function and put that into pipeline_utils?

Contributor Author

I think those lines are Llama-model specific; that's why I kept them here. (The plan is still to get rid of them by providing runtime shape inference, but that might not be the highest priority.)
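As a rough illustration of what a Llama-specific shape helper could compute, here is a torch-free sketch that returns per-stage input and output shapes and dtypes. The name `stage_io_specs`, the `TensorSpec` class, and the layout (token ids in, hidden states of width `dim` between stages, logits of width `vocab_size` out) are assumptions for illustration; the dtype used between stages is passed in rather than guessed. The key invariant is that each stage's output spec equals the next stage's input spec, which is what the hard-coded shapes in the HACK above have to maintain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorSpec:
    shape: tuple[int, ...]
    dtype: str  # dtype names as strings keep this sketch independent of torch


def stage_io_specs(
    stage_idx: int,
    num_stages: int,
    microbatch_size: int,
    seq_len: int,
    dim: int,
    vocab_size: int,
    activation_dtype: str = "float32",
) -> tuple[TensorSpec, TensorSpec]:
    """Return (input, output) specs for one stage of a Llama-style pipeline.

    Hypothetical helper: token ids enter the first stage, hidden states of
    width `dim` flow between stages, and logits leave the last stage.
    """
    hidden = TensorSpec((microbatch_size, seq_len, dim), activation_dtype)
    if stage_idx == 0:
        inp = TensorSpec((microbatch_size, seq_len), "int64")  # token ids
    else:
        inp = hidden
    if stage_idx == num_stages - 1:
        # Assumes logits share the activation dtype.
        out = TensorSpec((microbatch_size, seq_len, vocab_size), activation_dtype)
    else:
        out = hidden
    return inp, out


for s in range(3):
    print(s, stage_io_specs(s, 3, microbatch_size=2, seq_len=16, dim=64, vocab_size=100))
```

In real code, specs like these would be used to allocate example tensors for each stage (for instance `torch.randint` for the first stage's token ids) until runtime shape inference makes them unnecessary.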

test_runner.py (outdated, resolved)
torchtitan/parallelisms/parallelize_llama.py (outdated, resolved)
torchtitan/parallelisms/parallelize_llama.py (outdated, resolved)
wconstab added a commit that referenced this pull request May 21, 2024
ghstack-source-id: 765946ac4e0969c08da8d1e722f15fc94eecde34
Pull Request resolved: #318
wconstab added a commit that referenced this pull request May 21, 2024
ghstack-source-id: d7e0a1342bc97d6f1bba9e647234d90688ad708f
Pull Request resolved: #318
@wconstab merged commit 0179b4b into gh/wconstab/12/base May 21, 2024
4 checks passed
wconstab added a commit that referenced this pull request May 21, 2024
ghstack-source-id: d7e0a1342bc97d6f1bba9e647234d90688ad708f
Pull Request resolved: #318
@wconstab deleted the gh/wconstab/12/head branch May 21, 2024 23:27
@wanchaol
Contributor

@wconstab it looks like CI is failing now. Is it because the PP APIs are not in nightly yet? If so, we should probably wait until they are in nightly and then reland this.

tianyu-l pushed a commit that referenced this pull request May 28, 2024
ghstack-source-id: d7e0a1342bc97d6f1bba9e647234d90688ad708f
Pull Request resolved: #318
tianyu-l pushed a commit to tianyu-l/torchtitan_intern24 that referenced this pull request Aug 16, 2024
ghstack-source-id: d7e0a1342bc97d6f1bba9e647234d90688ad708f
Pull Request resolved: pytorch#318
tianyu-l pushed a commit that referenced this pull request Aug 16, 2024
ghstack-source-id: 7a1b6ea024726bc7bf2430854c8088b77ff4e29e
Pull Request resolved: #318
tianyu-l pushed a commit that referenced this pull request Aug 16, 2024
ghstack-source-id: d7e0a1342bc97d6f1bba9e647234d90688ad708f
Pull Request resolved: #318
tianyu-l pushed a commit that referenced this pull request Aug 16, 2024
ghstack-source-id: d02cc759b34ff53035ac05f6121f0bac255b99fe
Pull Request resolved: #318
tianyu-l pushed a commit that referenced this pull request Aug 16, 2024
ghstack-source-id: e07b70e49c35445e3f0565f8b9c2038b4c96afd4
Pull Request resolved: #318
philippguevorguian pushed a commit to YerevaNN/YNNtitan that referenced this pull request Aug 17, 2024
ghstack-source-id: d7e0a1342bc97d6f1bba9e647234d90688ad708f
Pull Request resolved: pytorch#318
Labels: CLA Signed
Projects: None yet
6 participants