Disable integration test between optimizer-in-backward and pp #793
base: main
Conversation
@@ -81,30 +81,37 @@ def __init__(
    ) -> None:
        self.optimizers = []
        self.model_parts = model_parts
        optim_dict = {}
Collect all optimizers in a single optim_dict, to avoid the bug where only the hooks registered for the last entry of self.model_parts stay valid when len(self.model_parts) > 1 (needed for future support of multi-schedule PP). See the sketch below.
The fix for optim_dict LGTM.
lgtm!
@@ -127,6 +134,10 @@ def build_optimizers(
    step() and zero_grad() method for all the child optimizers.
    """
    optim_in_bwd = job_config.optimizer.early_step_in_backward
    if optim_in_bwd and job_config.experimental.pipeline_parallel_degree > 1:
        raise NotImplementedError(
            "Optimizers in backward is not supported with pipeline parallelism."
Nit: "not yet supported" would read better, since PP support is planned.
Optimizer-in-backward frees each parameter's gradient memory during the backward pass, which makes the integration test fail with PP at the gradient-scaling step. Disable the test with PP for now; it can be re-enabled once multi-schedule PP is supported. Tests with DP, TP, CP, and HSDP are added instead. A minimal repro of the conflict is sketched below.