Update on "[MoE][PoC] Expert Parallel: tp and tp2ep"
Issues (12/11/2024)
- forward collectives look right ("tp2ep": all-gather -> expert compute -> reduce-scatter; see the sketch after this list), but the backward pass still needs to be understood better
- torch.compile generates a full graph (applied per TransformerBlock), but inserts an additional all-to-all (A2A) at the end of every two blocks
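
For reference, a minimal sketch of the "tp2ep" forward pattern from the first bullet, not this PR's actual implementation; it assumes an initialized `torch.distributed` process group and an illustrative `local_expert` module, and that the expert zeroes out tokens not routed to it so the reduction sums per-expert partials correctly:

```python
import torch
import torch.distributed as dist


def tp2ep_forward(x: torch.Tensor, local_expert: torch.nn.Module,
                  group: dist.ProcessGroup) -> torch.Tensor:
    world_size = dist.get_world_size(group)
    # AG: gather token shards from all ranks in the group, so every rank
    # sees the full set of tokens and can run its local expert(s) on them.
    gathered = [torch.empty_like(x) for _ in range(world_size)]
    dist.all_gather(gathered, x, group=group)
    tokens = torch.cat(gathered, dim=0)
    # compute: this rank's expert produces partial outputs; tokens not
    # routed to this expert are assumed to contribute zeros (illustrative
    # assumption), so the upcoming reduction sums the partials correctly.
    out = local_expert(tokens)
    # RS: reduce-scatter sums the partial outputs across ranks and hands
    # each rank back only its original shard of tokens.
    shard = torch.empty_like(x)
    dist.reduce_scatter(shard, list(out.chunk(world_size, dim=0)), group=group)
    return shard
```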

Haven't worked on
- softmax scoring when Router Parallel is used (currently only sigmoid scoring is supported; a sketch of the two variants follows)
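
A sketch of the two scoring variants, with hypothetical names and shapes rather than this PR's code, to show why softmax is harder under Router Parallel:

```python
import torch


def router_scores(logits: torch.Tensor, top_k: int, use_sigmoid: bool = True):
    # logits: (num_tokens, num_experts), produced by the router gate.
    if use_sigmoid:
        # sigmoid scores each expert independently, so it works unchanged
        # when the expert dimension is sharded across ranks.
        scores = torch.sigmoid(logits)
    else:
        # softmax normalizes across the full expert dimension; under Router
        # Parallel each rank only holds a slice of the expert logits, so a
        # cross-rank normalization would be needed (the unimplemented part).
        scores = torch.softmax(logits, dim=-1)
    top_scores, top_idx = scores.topk(top_k, dim=-1)
    return top_scores, top_idx
```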

[ghstack-poisoned]
tianyu-l committed Dec 12, 2024
1 parent 50faa5a commit fa01d7c
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion — train_configs/debug_model.toml

```diff
@@ -38,7 +38,7 @@ max_norm = 1.0 # grad norm clipping
 steps = 10
 data_parallel_replicate_degree = 1
 data_parallel_shard_degree = -1
-tensor_parallel_degree = 4
+tensor_parallel_degree = 1
 compile = false
 dataset = "c4_test" # supported datasets: c4_test (2K), c4 (177M)
```
