
Enable SequenceParallel in te.LayerNormMLP layer #1

Draft · wants to merge 7 commits into base: cfgte

Conversation


vchiley commented Jul 7, 2023

Given mosaicml#432, we can relatively easily enable SequenceParallel training.

This branch is just for testing for now; if it works well, we can consider merging it.

Note: I'm pretty sure checkpointing a model with TP and then loading it without TP will be a headache and will require tooling to be built.
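
For reference, a minimal sketch of how the TransformerEngine PyTorch API exposes tensor/sequence parallelism on te.LayerNormMLP (this is not the actual wiring in this branch; the argument names should be checked against the installed TE version):

```python
# Hedged sketch: standalone te.LayerNormMLP with TP + SequenceParallel enabled.
# Launch with torchrun so the NCCL process-group env vars are set.
import torch.distributed as dist
import transformer_engine.pytorch as te

dist.init_process_group(backend="nccl")
tp_size = dist.get_world_size()                        # whole node as one TP group (assumption)
tp_group = dist.new_group(ranks=list(range(tp_size)))

mlp = te.LayerNormMLP(
    hidden_size=768,            # MPT-125M-ish sizes, for illustration only
    ffn_hidden_size=4 * 768,
    set_parallel_mode=True,     # fc1 column-parallel, fc2 row-parallel
    sequence_parallel=True,     # gather/scatter activations along the sequence dim
    tp_group=tp_group,
    tp_size=tp_size,
).cuda()
```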

vchiley self-assigned this Jul 7, 2023
vchiley marked this pull request as draft Jul 7, 2023 00:29

vchiley commented Jul 8, 2023

On a single node of 8 GPUs:

MPT-125M training with and without Sequence Parallel is nearly identical for TP sizes in [2, 4, 8].
[screenshot]
BUT, since the 125M model is small, it's always slower to use TP than the standard model, even at mbs=1.
[screenshot]

If we bump the model size up to 7B, at mbs=1, TP world sizes of 2 and 4 are slightly faster than the baseline.
[screenshot]
At mbs=2, the advantage goes away.

If we bump the model size up to 13B, at mbs=1, TP world sizes of 2 and 4 are again slightly faster than the baseline.
[screenshot]
At mbs=2, a TP world size of 2 is almost as fast, but again the advantage goes away.

Note this was all done on a single node and does not factor in inter-node interconnect speed.
wandb results here
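
For context, one way the TP sub-groups for these runs could be constructed on a single 8-GPU node (the contiguous group layout here is an assumption, not lifted from this branch):

```python
# Hedged sketch: partition the 8 ranks of one node into contiguous TP groups.
import torch.distributed as dist

def make_tp_group(tp_size: int):
    world_size = dist.get_world_size()  # 8 on a single node here
    rank = dist.get_rank()
    assert world_size % tp_size == 0
    my_group = None
    # every rank must call new_group() for every group, in the same order
    for start in range(0, world_size, tp_size):
        ranks = list(range(start, start + tp_size))
        group = dist.new_group(ranks=ranks)
        if rank in ranks:
            my_group = group
    return my_group  # e.g. tp_size=2 -> groups [0,1], [2,3], [4,5], [6,7]
```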


vchiley commented Jul 8, 2023

Issue: FSDP operates at the module level. The module has [layer_norm_weight, layer_norm_bias, fc1_weight, fc1_bias, fc2_weight, fc2_bias] parameters, where FSDP needs to treat [layer_norm_weight, layer_norm_bias, fc2_bias] in the standard sharded way, while [fc1_weight, fc1_bias, fc2_weight] need to be treated in the standard TP-sharded way...
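
A minimal sketch of the split described above, using the parameter names from te.LayerNormMLP (how to actually reconcile this with FSDP's flat-parameter sharding is the open question):

```python
# Hedged sketch: partition te.LayerNormMLP parameters into the two groups above.
TP_SHARDED = {"fc1_weight", "fc1_bias", "fc2_weight"}                # already sharded across the TP group
FSDP_SHARDED = {"layer_norm_weight", "layer_norm_bias", "fc2_bias"}  # replicated across TP; shard with FSDP

def split_params(layernorm_mlp):
    tp_params, fsdp_params = [], []
    for name, param in layernorm_mlp.named_parameters():
        (tp_params if name in TP_SHARDED else fsdp_params).append((name, param))
    return tp_params, fsdp_params
```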
