
Unflatten traced module #954

Merged: 28 commits merged into main on Apr 1, 2024

Conversation

kwen2501 (Contributor) commented on Feb 28, 2024

Description

  • Move tracer from _export_to_torch_ir to the official torch.export.export
  • Add unflatten utils (from torch/export/unflatten.py) to unflatten each stage module

The purpose of this PR is to:

  • be composable with FSDP and TP, which require structured FQNs like a.b.c for submodules in order to specify per-submodule policies;
  • be friendly to DCP, which expects FQNs to remain unchanged from the original model;
  • retire the use of _export_to_torch_ir, per the Export Team's plan.
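
For context, a minimal sketch of the export-then-unflatten flow this PR moves to, using the upstream torch.export APIs that the vendored utils mirror (the Tiny module and shapes are illustrative, not from the PR):

import torch
from torch.export import export, unflatten

# Toy module for illustration only (not part of the PR).
class Tiny(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.lin(x)

ep = export(Tiny(), (torch.randn(2, 4),))  # official exporter, replacing _export_to_torch_ir
um = unflatten(ep)                         # rebuilds the original module hierarchy
print(um)                                  # the submodule keeps its FQN, e.g. "lin"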

Test

Added test_transformer.py.

class TransformerLike(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.layers = torch.nn.Sequential(
            *[
                MLPModule(d_hid)
                for _ in range(n_layers)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)
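
For reference, MLPModule, d_hid, and n_layers come from the test helpers. Judging from the (net1)/(relu)/(net2) submodules in the stage printouts below, MLPModule looks roughly like this sketch (the actual definition lives in the test utilities):

class MLPModule(torch.nn.Module):
    def __init__(self, d_hid: int):
        super().__init__()
        self.net1 = torch.nn.Linear(d_hid, d_hid)
        self.relu = torch.nn.ReLU()
        self.net2 = torch.nn.Linear(d_hid, d_hid)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # net1 -> relu -> net2, matching the submodule names printed below
        return self.net2(self.relu(self.net1(x)))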

We split the model into two stages. Each stage preserves the layers.<i> structure of the original model:

Stage 0: 
 GraphModule(
  (layers): InterpreterModule(
    (0): InterpreterModule(
      (net1): InterpreterModule()
      (relu): InterpreterModule()
      (net2): InterpreterModule()
    )
    (1): InterpreterModule(
      (net1): InterpreterModule()
      (relu): InterpreterModule()
      (net2): InterpreterModule()
    )
    (2): InterpreterModule(
      (net1): InterpreterModule()
      (relu): InterpreterModule()
      (net2): InterpreterModule()
    )
    (3): InterpreterModule(
      (net1): InterpreterModule()
      (relu): InterpreterModule()
      (net2): InterpreterModule()
    )
  )
)
Stage 1: 
 GraphModule(
  (layers): InterpreterModule(
    (4): InterpreterModule(
      (net1): InterpreterModule()
      (relu): InterpreterModule()
      (net2): InterpreterModule()
    )
    (5): InterpreterModule(
      (net1): InterpreterModule()
      (relu): InterpreterModule()
      (net2): InterpreterModule()
    )
    (6): InterpreterModule(
      (net1): InterpreterModule()
      (relu): InterpreterModule()
      (net2): InterpreterModule()
    )
    (7): InterpreterModule(
      (net1): InterpreterModule()
      (relu): InterpreterModule()
      (net2): InterpreterModule()
    )
  )
)
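
Because the hierarchy is preserved, frameworks that key off FQNs (FSDP, TP, DCP) can address submodules of a stage exactly as in the original model. A quick hypothetical check (stage0_mod stands for the Stage 0 module printed above):

stage0_fqns = {name for name, _ in stage0_mod.named_modules()}
assert "layers.0.net1" in stage0_fqns  # structured FQN survives the split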

Caveat:
I temporarily disabled multi-use parameter support (a.k.a. shared or tied parameters), so some real examples may break. I will add the support back in the next PR.
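
To illustrate what tied parameters means here, a hypothetical module in which one tensor serves two FQNs (illustrative only, not from the PR):

import torch

class TiedEmbedding(torch.nn.Module):
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab_size, d_model)
        self.proj = torch.nn.Linear(d_model, vocab_size, bias=False)
        self.proj.weight = self.emb.weight  # one parameter, two FQNs

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.proj(self.emb(tokens))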

@@ -23,19 +23,21 @@
class ExampleCode(torch.nn.Module):
wconstab (Contributor) commented:
Can you add another test case inspired by torchtrain, where the module has a ModuleList like this?

class Transformer(torch.nn.Module):
    def __init__(self, model_args):
        super().__init__()
        self.layers = torch.nn.ModuleList()
        for layer_id in range(model_args.n_layers):
            self.layers.append(TransformerBlock(layer_id, model_args))

What I am wondering is: how will you keep the FQNs the same after splitting?

If you put layers[0] on one stage and layers[1] on the next stage, will the second stage still have layers[1] as its FQN, or will it drop to layers[0]?

kwen2501 (Contributor, Author) replied on Mar 4, 2024:

Thanks, good idea. The second stage will have "layers.1" as the FQN: "layers" comes from the self.layers = torch.nn.ModuleList() tier, and "1" corresponds to the attribute within the ModuleList. Both are preserved from the original model.
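
A quick way to verify (stage1_mod is a hypothetical handle to the second stage; get_submodule is standard torch.nn.Module API):

block = stage1_mod.get_submodule("layers.1")  # resolves because the FQN is preserved
print(list(stage1_mod.state_dict().keys()))   # keys still begin with "layers.1."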

kwen2501 (Contributor, Author) replied:

@wconstab I added test_transformer.py.
You can see the updated PR description for the split structure.



# Add an alias for convenience
aten_pipe_split_alias = torch.ops.pippy._pipe_split.default
A Member commented:
Curious, what's the purpose of the additional .default?

kwen2501 (Contributor, Author) replied:

@tugsbayasgalan wondering if you know the answer?
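
For background, a likely answer: torch.ops.pippy._pipe_split is an OpOverloadPacket, while .default selects the concrete default OpOverload, which is what graphs produced by torch.export record as call_function targets. A small demonstration with a core aten op:

import torch

packet = torch.ops.aten.add            # OpOverloadPacket: picks an overload at call time
overload = torch.ops.aten.add.default  # OpOverload: one concrete schema
print(type(packet))                    # <class 'torch._ops.OpOverloadPacket'>
print(type(overload))                  # <class 'torch._ops.OpOverload'>
# Aliasing .default lets split-point logic compare node targets directly.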

kwen2501 (Contributor, Author) commented on Apr 1, 2024:

Status of this branch:

Good:

  • All unit tests passed.
  • The GPT2 and LLaMA models worked.

Bad:

I consider the "Bad" items to be non-blocking, and since this branch is needed by torchtrain (pytorch/torchtitan#161), I am merging it into main as is.

kwen2501 merged commit 77be55d into main on Apr 1, 2024
6 checks passed