Add ExpertParallel Mixture-of-Experts Plugin (#99)
* initial commit

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* include prepare_scattermoe

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* fixes and add scenarios-moe. Allow gradient_accum=null mode

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* missed out on CONTENTS.yaml

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* update readme, code cleanup, add comments and initial bench

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* more cleanup and update pf bench

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add more comments and minor refactoring

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* finish up comments

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add padding free to granite moe

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* fmt and lint.

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* install workflow + more fmt + fix test

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* go back to dtensors for sharded checkpoints

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add scattermoe checkpoint restorer utility

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* fmt + lint

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* more cleanup

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* improved documentation on state dict inference

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add more tests on inferring checkpoint metadata

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* update configs for mixtral

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* update granite configs

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* fix readme and update GraniteMoE to FOAK

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* commit benches

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
fabianlim authored Nov 13, 2024
1 parent d767e33 commit 5b35eae
Showing 52 changed files with 4,658 additions and 12 deletions.
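
The scattermoe work described in the commit message above builds on ScatterMoE-style expert routing. A minimal dense PyTorch sketch of that routing idea follows; it is illustrative only, makes no claim about the plugin's actual API (the class and argument names here are made up), and omits the fused Triton scatter/gather matmuls that the plugin provides:

# Illustrative only: a dense PyTorch sketch of the token-to-expert grouping that
# ScatterMoE-style kernels fuse into scattered Triton matmuls. Class and argument
# names are hypothetical and are not part of the accelerated-moe plugin's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NaiveScatterMoE(nn.Module):
    def __init__(self, hidden: int, ffn: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden, num_experts, bias=False)
        self.w_in = nn.Parameter(torch.randn(num_experts, hidden, ffn) * 0.02)
        self.w_out = nn.Parameter(torch.randn(num_experts, ffn, hidden) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.shape[-1])                    # (T, H)
        gates = F.softmax(self.router(tokens), dim=-1)         # (T, E)
        weights, experts = gates.topk(self.top_k, dim=-1)      # (T, k) each

        # Flatten (token, expert) pairs and sort by expert id so every expert
        # sees a contiguous block of tokens -- the grouping the kernels exploit.
        flat_experts = experts.reshape(-1)
        flat_tokens = torch.arange(tokens.shape[0], device=tokens.device).repeat_interleave(self.top_k)
        flat_weights = weights.reshape(-1)
        order = flat_experts.argsort()

        out = torch.zeros_like(tokens)
        for e in range(self.w_in.shape[0]):
            sel = order[flat_experts[order] == e]              # pairs routed to expert e
            if sel.numel() == 0:
                continue
            idx = flat_tokens[sel]
            h = F.silu(tokens[idx] @ self.w_in[e]) @ self.w_out[e]
            out.index_add_(0, idx, h * flat_weights[sel, None])
        return out.reshape(x.shape)

# e.g. NaiveScatterMoE(hidden=64, ffn=128, num_experts=8)(torch.randn(2, 5, 64))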
1 change: 1 addition & 0 deletions .github/workflows/build-and-publish.yml
@@ -15,6 +15,7 @@ jobs:
- "accelerated-peft"
- "fused-ops-and-kernels"
- "attention-and-distributed-packing"
- "accelerated-moe"

permissions:
id-token: write # IMPORTANT: this permission is mandatory for trusted publishing
1 change: 1 addition & 0 deletions .github/workflows/format.yml
@@ -30,6 +30,7 @@ jobs:
- "accelerated-peft"
- "fused-ops-and-kernels"
- "attention-and-distributed-packing"
- "accelerated-moe"

steps:
- uses: actions/checkout@v4
2 changes: 1 addition & 1 deletion README.md
@@ -34,7 +34,7 @@ Plugin | Description | Depends | License | Status
[accelerated-peft](./plugins/accelerated-peft/README.md) | For PEFT-training, e.g., 4bit QLoRA. | Huggingface<br>AutoGPTQ | Apache 2.0<br>MIT | Alpha
[fused-op-and-kernels](./plugins/fused-ops-and-kernels/README.md) | Fused LoRA and triton kernels (e.g., fast cross-entropy, rms, rope) | -- | Apache 2.0 [(contains extracted code)](./plugins/fused-ops-and-kernels/README.md#code-extracted-from-unsloth)| Beta
[attention-and-distributed-packing](./plugins/attention-and-distributed-packing/README.md) | Padding-Free Flash Attention Computation | flash-attn | Apache 2.0 | Beta
-MOE-training-acceleration | [MegaBlocks](https://github.com/databricks/megablocks) inspired triton Kernels and acclerations for Mixture-of-Expert models | | Apache 2.0 | Coming Soon
+[accelerated-moe](./plugins/accelerated-moe/README.md) | Triton Kernels for Mixture-of-Expert parallel, inspired by [ScatterMoe](https://github.com/shawntan/scattermoe) and [MegaBlocks](https://github.com/databricks/megablocks) | | Apache 2.0 | Beta

## Usage with FMS HF Tuning

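
The accelerated-moe row added to the README above also advertises expert parallelism. As a rough, hypothetical sketch (not the plugin's code), expert parallelism assigns each rank an even, contiguous slice of the experts and exchanges routed tokens between ranks (for example with torch.distributed all_to_all collectives) around the expert computation; the placement bookkeeping is roughly:

# Hypothetical sketch of expert-parallel placement, not the plugin's API.
# Each rank owns a contiguous slice of experts; tokens routed to remote experts
# would be exchanged with torch.distributed.all_to_all_single in a real setup.
from typing import List, Tuple

def expert_shard(num_experts: int, world_size: int, rank: int) -> Tuple[int, int]:
    """Return the [start, end) expert ids owned by `rank` under even sharding."""
    assert num_experts % world_size == 0, "assume experts divide evenly across ranks"
    per_rank = num_experts // world_size
    return rank * per_rank, (rank + 1) * per_rank

def dispatch_counts(expert_ids: List[int], num_experts: int, world_size: int) -> List[int]:
    """How many routed tokens this rank would send to each peer rank."""
    per_rank = num_experts // world_size
    counts = [0] * world_size
    for e in expert_ids:
        counts[e // per_rank] += 1
    return counts

if __name__ == "__main__":
    # 8 experts over 4 ranks: rank 1 owns experts [2, 4).
    print(expert_shard(num_experts=8, world_size=4, rank=1))                  # (2, 4)
    # Tokens routed to experts [0, 3, 3, 5, 7] are split across ranks 0..3.
    print(dispatch_counts([0, 3, 3, 5, 7], num_experts=8, world_size=4))      # [1, 2, 1, 1]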
10 changes: 10 additions & 0 deletions plugins/accelerated-moe/.isort.cfg
@@ -0,0 +1,10 @@
[settings]
profile=black
from_first=true
import_heading_future=Future
import_heading_stdlib=Standard
import_heading_thirdparty=Third Party
import_heading_firstparty=First Party
import_heading_localfolder=Local
known_firstparty=
known_localfolder=tuning
