Update on "[MoE][PoC] Expert Parallel: tp and tp2ep"
Issues (12/11/2024)
- forward collectives look right ("tp2ep" AG -> compute -> RS), need to understand the backward better
- torch.compile generates full graph (applied per TransformerBlock), but inserts an additional A2A at the end of every two blocks

Haven't worked on
- softmax scoring when Router Parallel is used (currently only sigmoid)

[ghstack-poisoned]
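For readers unfamiliar with the "tp2ep" forward pattern mentioned above (AG -> compute -> RS), here is a toy single-process sketch. It is not the code from this commit: plain Python lists stand in for per-rank tensors, and every name in it (`all_gather`, `reduce_scatter`, `local_expert_compute`, `moe_forward`) is hypothetical. It also assumes one expert per rank, top-1 routing, and experts that simply scale their input.

```python
# Toy simulation of all-gather -> local expert compute -> reduce-scatter.
# Assumptions (not from the commit): one expert per rank, top-1 routing,
# and each expert just multiplies its input by a scalar.

def all_gather(shards):
    """Every rank receives the concatenation of all ranks' token shards."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]

def reduce_scatter(partials):
    """Sum per-rank partial outputs elementwise, then give each rank one slice."""
    world = len(partials)
    summed = [sum(vals) for vals in zip(*partials)]
    chunk = len(summed) // world
    return [summed[r * chunk:(r + 1) * chunk] for r in range(world)]

def local_expert_compute(tokens, routes, rank, scales):
    """Apply this rank's expert to the tokens routed to it; emit 0.0 elsewhere."""
    return [scales[rank] * t if e == rank else 0.0 for t, e in zip(tokens, routes)]

def moe_forward(shards, routes, scales):
    gathered = all_gather(shards)                          # AG
    partials = [local_expert_compute(gathered[r], routes, r, scales)
                for r in range(len(shards))]               # compute
    return reduce_scatter(partials)                        # RS

shards = [[1.0, 2.0], [3.0, 4.0]]   # 2 ranks, 2 tokens each
routes = [0, 1, 1, 0]               # expert chosen for each of the 4 tokens
scales = [10.0, 100.0]              # expert r multiplies its input by scales[r]
out = moe_forward(shards, routes, scales)
print(out)  # [[10.0, 200.0], [300.0, 40.0]]
```

Each rank ends up with the same token slice it started with, now holding the output of whichever expert each token was routed to. The backward pass and the extra A2A seen under torch.compile are outside the scope of this sketch.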