From b1d0fda21fa8a9bd9a848fa2d6452b42e46cf6d6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Moritz=20Th=C3=BCning?=
Date: Sun, 31 Dec 2023 11:04:52 +0100
Subject: [PATCH] fix typo

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2cb8df6..8960560 100644
--- a/README.md
+++ b/README.md
@@ -124,7 +124,7 @@ For this we can choose as chunk size the window size. For each chunk, we thus ne
 
 # Sparse Mixture of Experts (SMoE)
 
-Sparse Mixture of Experts allows one to decouple throughput from memory costs by only activating subsets of the overall model for each token. In this approach, each token is assigned to one or more "experts" -- a separate set of weights -- and only processed by sunch experts. This division happens at feedforward layers of the model. The expert models specialize in different aspects of the data, allowing them to capture complex patterns and make more accurate predictions.
+Sparse Mixture of Experts allows one to decouple throughput from memory costs by only activating subsets of the overall model for each token. In this approach, each token is assigned to one or more "experts" -- a separate set of weights -- and only processed by such experts. This division happens at feedforward layers of the model. The expert models specialize in different aspects of the data, allowing them to capture complex patterns and make more accurate predictions.
 
 ![SMoE](assets/smoe.png)
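The README paragraph this patch touches describes sparse MoE routing: each token is scored by a router, sent to its top-k experts (separate feedforward weights), and only those experts run. The sketch below is a hypothetical minimal illustration of that routing step, not the repository's actual implementation; the function names (`smoe_forward`, `gate_weights`) and the scalar "token" are assumptions made for brevity.

```python
# Hypothetical sketch of sparse MoE routing (top-k gating); not Mistral's actual code.
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def smoe_forward(token, gate_weights, experts, k=2):
    """Route one token through its top-k experts.

    gate_weights: one router scoring function per expert.
    experts: feedforward functions; only k of them are activated per token.
    """
    scores = [g(token) for g in gate_weights]
    # Select the k highest-scoring experts; the rest are skipped entirely,
    # which is what decouples compute from total parameter count.
    topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    probs = softmax([scores[i] for i in topk])  # renormalize over chosen experts
    # Output is the gate-weighted sum of the selected experts' outputs.
    return sum(p * experts[i](token) for p, i in zip(probs, topk))
```

With k experts active out of n, each token pays the compute cost of k feedforward passes while the model still holds n experts' worth of parameters.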