Add ExpertParallel Mixture-of-Experts Plugin (#99)
* initial commit

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* include prepare_scattermoe

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* fixes and add scenarios-moe. Allow gradient_accum=null mode

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* missed out on CONTENTS.yaml

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* update readme, code cleanup, add comments and initial bench

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* more cleanup and update pf bench

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add more comments and minor refactoring

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* finish up comments

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add padding free to granite moe

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* fmt and lint.

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* install workflow + more fmt + fix test

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* go back to dtensors for sharded checkpoints

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add scattermoe checkpoint restorer utility

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* fmt + lint

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* more cleanup

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* improved documentation on state dict inference

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add more tests on inferring checkpoint metadata

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* update configs for mixtral

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* update granite configs

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* fix readme and update GraniteMoE to FOAK

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* commit benches

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
fabianlim authored Nov 13, 2024
1 parent d767e33 commit 5b35eae
Showing 52 changed files with 4,658 additions and 12 deletions.
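
The scattermoe work described in the commit message above builds on ScatterMoE-style expert routing. A minimal dense PyTorch sketch of that routing idea follows; it is illustrative only, makes no claim about the plugin's actual API (the class and argument names here are made up), and omits the fused Triton scatter/gather matmuls that the plugin provides:

# Illustrative only: a dense PyTorch sketch of the token-to-expert grouping that
# ScatterMoE-style kernels fuse into scattered Triton matmuls. Class and argument
# names are hypothetical and are not part of the accelerated-moe plugin's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NaiveScatterMoE(nn.Module):
    def __init__(self, hidden: int, ffn: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden, num_experts, bias=False)
        self.w_in = nn.Parameter(torch.randn(num_experts, hidden, ffn) * 0.02)
        self.w_out = nn.Parameter(torch.randn(num_experts, ffn, hidden) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.shape[-1])                    # (T, H)
        gates = F.softmax(self.router(tokens), dim=-1)         # (T, E)
        weights, experts = gates.topk(self.top_k, dim=-1)      # (T, k) each

        # Flatten (token, expert) pairs and sort by expert id so every expert
        # sees a contiguous block of tokens -- the grouping the kernels exploit.
        flat_experts = experts.reshape(-1)
        flat_tokens = torch.arange(tokens.shape[0], device=tokens.device).repeat_interleave(self.top_k)
        flat_weights = weights.reshape(-1)
        order = flat_experts.argsort()

        out = torch.zeros_like(tokens)
        for e in range(self.w_in.shape[0]):
            sel = order[flat_experts[order] == e]              # pairs routed to expert e
            if sel.numel() == 0:
                continue
            idx = flat_tokens[sel]
            h = F.silu(tokens[idx] @ self.w_in[e]) @ self.w_out[e]
            out.index_add_(0, idx, h * flat_weights[sel, None])
        return out.reshape(x.shape)

# e.g. NaiveScatterMoE(hidden=64, ffn=128, num_experts=8)(torch.randn(2, 5, 64))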
1 change: 1 addition & 0 deletions .github/workflows/build-and-publish.yml
@@ -15,6 +15,7 @@ jobs:
- "accelerated-peft"
- "fused-ops-and-kernels"
- "attention-and-distributed-packing"
- "accelerated-moe"

permissions:
id-token: write # IMPORTANT: this permission is mandatory for trusted publishing
1 change: 1 addition & 0 deletions .github/workflows/format.yml
@@ -30,6 +30,7 @@ jobs:
- "accelerated-peft"
- "fused-ops-and-kernels"
- "attention-and-distributed-packing"
- "accelerated-moe"

steps:
- uses: actions/checkout@v4
2 changes: 1 addition & 1 deletion README.md
@@ -34,7 +34,7 @@ Plugin | Description | Depends | License | Status
[accelerated-peft](./plugins/accelerated-peft/README.md) | For PEFT-training, e.g., 4bit QLoRA. | Huggingface<br>AutoGPTQ | Apache 2.0<br>MIT | Alpha
[fused-op-and-kernels](./plugins/fused-ops-and-kernels/README.md) | Fused LoRA and triton kernels (e.g., fast cross-entropy, rms, rope) | -- | Apache 2.0 [(contains extracted code)](./plugins/fused-ops-and-kernels/README.md#code-extracted-from-unsloth)| Beta
[attention-and-distributed-packing](./plugins/attention-and-distributed-packing/README.md) | Padding-Free Flash Attention Computation | flash-attn | Apache 2.0 | Beta
-MOE-training-acceleration | [MegaBlocks](https://github.com/databricks/megablocks) inspired triton Kernels and acclerations for Mixture-of-Expert models | | Apache 2.0 | Coming Soon
+[accelerated-moe](./plugins/accelerated-moe/README.md) | Triton Kernels for Mixture-of-Expert parallel, inspired by [ScatterMoe](https://github.com/shawntan/scattermoe) and [MegaBlocks](https://github.com/databricks/megablocks) | | Apache 2.0 | Beta

## Usage with FMS HF Tuning

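
The accelerated-moe row added to the README above also advertises expert parallelism. As a rough, hypothetical sketch (not the plugin's code), expert parallelism assigns each rank an even, contiguous slice of the experts and exchanges routed tokens between ranks (for example with torch.distributed all_to_all collectives) around the expert computation; the placement bookkeeping is roughly:

# Hypothetical sketch of expert-parallel placement, not the plugin's API.
# Each rank owns a contiguous slice of experts; tokens routed to remote experts
# would be exchanged with torch.distributed.all_to_all_single in a real setup.
from typing import List, Tuple

def expert_shard(num_experts: int, world_size: int, rank: int) -> Tuple[int, int]:
    """Return the [start, end) expert ids owned by `rank` under even sharding."""
    assert num_experts % world_size == 0, "assume experts divide evenly across ranks"
    per_rank = num_experts // world_size
    return rank * per_rank, (rank + 1) * per_rank

def dispatch_counts(expert_ids: List[int], num_experts: int, world_size: int) -> List[int]:
    """How many routed tokens this rank would send to each peer rank."""
    per_rank = num_experts // world_size
    counts = [0] * world_size
    for e in expert_ids:
        counts[e // per_rank] += 1
    return counts

if __name__ == "__main__":
    # 8 experts over 4 ranks: rank 1 owns experts [2, 4).
    print(expert_shard(num_experts=8, world_size=4, rank=1))                  # (2, 4)
    # Tokens routed to experts [0, 3, 3, 5, 7] are split across ranks 0..3.
    print(dispatch_counts([0, 3, 3, 5, 7], num_experts=8, world_size=4))      # [1, 2, 1, 1]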
10 changes: 10 additions & 0 deletions plugins/accelerated-moe/.isort.cfg
@@ -0,0 +1,10 @@
[settings]
profile=black
from_first=true
import_heading_future=Future
import_heading_stdlib=Standard
import_heading_thirdparty=Third Party
import_heading_firstparty=First Party
import_heading_localfolder=Local
known_firstparty=
known_localfolder=tuning
