Disable integration test between optimizer-in-backward and pp #793

Open
mori360 wants to merge 2 commits into base: main

Conversation

@mori360 mori360 (Contributor) commented Jan 16, 2025

Optimizer-in-backward frees gradient memory during the backward pass, causing the integration test with pipeline parallelism (PP) to fail at the gradient-scaling step.
Disable the PP test for now; re-enable it later once multi-schedule PP is supported.
Add tests with DP, TP, CP, and HSDP.
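
For context on the failure mode: optimizer-in-backward typically registers a per-parameter optimizer that steps and zeroes the gradient as soon as it has been accumulated, so the gradient is already freed by the time a PP schedule tries to scale it. A minimal sketch of that pattern, following PyTorch's `register_post_accumulate_grad_hook` recipe rather than the exact torchtitan implementation:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)

# One optimizer per parameter, keyed by the parameter tensor itself.
optim_dict = {p: torch.optim.AdamW([p], lr=1e-3) for p in model.parameters()}

def optim_hook(param: torch.Tensor) -> None:
    # Runs right after param.grad is accumulated during backward:
    # step immediately, then free the gradient memory.
    optim_dict[param].step()
    optim_dict[param].zero_grad()

for p in model.parameters():
    p.register_post_accumulate_grad_hook(optim_hook)

loss = model(torch.randn(8, 16)).sum()
loss.backward()  # by the end of backward, each param.grad has already been freed
```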

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Jan 16, 2025
@@ -81,30 +81,37 @@ def __init__(
    ) -> None:
        self.optimizers = []
        self.model_parts = model_parts
        optim_dict = {}
@mori360 mori360 (Contributor, Author) commented on this line, Jan 16, 2025
Collect all optimizers in optim_dict; otherwise, when len(self.model_parts) > 1, only the hooks registered for the last entry of self.model_parts are valid (groundwork for future multi-schedule PP support).
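
To illustrate the bug this comment describes, here is a hedged sketch (model_parts and optim_dict follow the diff; the hook pattern follows the snippet above; the exact torchtitan code may differ). If optim_dict is re-bound inside the loop over model_parts, the shared hook ends up resolving against the dict built for the last part only:

```python
import torch
import torch.nn as nn

model_parts = [nn.Linear(16, 16), nn.Linear(16, 4)]  # stand-in for PP model chunks

def optim_hook(param: torch.Tensor) -> None:
    optim_dict[param].step()
    optim_dict[param].zero_grad()

# Buggy shape: optim_dict is re-bound on every iteration, so once the loop
# finishes the hook looks up parameters in the dict of the *last* model part
# only, and parameters of earlier parts hit a KeyError when their hook fires.
#
# for model in model_parts:
#     optim_dict = {p: torch.optim.AdamW([p], lr=1e-3) for p in model.parameters()}
#     for p in model.parameters():
#         p.register_post_accumulate_grad_hook(optim_hook)

# Fixed shape (what the diff moves toward): one optim_dict covering every
# model part, built before any hook can run.
optim_dict = {}
for model in model_parts:
    for p in model.parameters():
        optim_dict[p] = torch.optim.AdamW([p], lr=1e-3)
for model in model_parts:
    for p in model.parameters():
        p.register_post_accumulate_grad_hook(optim_hook)
```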

A reviewer (Contributor) replied:

The fix for the optim dict LGTM.

@mori360 mori360 requested review from wconstab and tianyu-l January 16, 2025 02:08
@mori360 mori360 marked this pull request as ready for review January 16, 2025 02:16
@tianyu-l tianyu-l (Contributor) left a comment

lgtm!

@@ -127,6 +134,10 @@ def build_optimizers(
        step() and zero_grad() method for all the child optimizers.
    """
    optim_in_bwd = job_config.optimizer.early_step_in_backward
    if optim_in_bwd and job_config.experimental.pipeline_parallel_degree > 1:
        raise NotImplementedError(
            "Optimizers in backward is not supported with pipeline parallelism."
        )
A reviewer (Contributor) commented on this line:
Nit: "Not yet supported"
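
A self-contained sketch of when the new guard fires, with SimpleNamespace standing in for the real job_config object and the message wording following the nit above (an assumption, not necessarily the merged code):

```python
from types import SimpleNamespace

# Hypothetical stand-in for the real job_config; only the two fields the
# guard reads are modeled here.
job_config = SimpleNamespace(
    optimizer=SimpleNamespace(early_step_in_backward=True),
    experimental=SimpleNamespace(pipeline_parallel_degree=2),
)

optim_in_bwd = job_config.optimizer.early_step_in_backward
if optim_in_bwd and job_config.experimental.pipeline_parallel_degree > 1:
    # Optimizer-in-backward combined with a PP degree > 1 is rejected up front.
    raise NotImplementedError(
        "Optimizers in backward is not yet supported with pipeline parallelism."
    )
```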
