We introduce **GroveMoE**, a new sparse architecture using **adjugate experts**.

- **Sparse Activation**: 33B total parameters, only **3.14~3.28B** active per token.
- **Training**: Mid-training + SFT, up-cycled from Qwen3-30B-A3B-Base; preserves prior knowledge while adding new capabilities.
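As a back-of-the-envelope illustration of the sparse-activation figures above (the numbers come from this README; the snippet is plain arithmetic, not the actual GroveMoE routing logic):

```python
# Illustrative arithmetic only: what the 33B-total / 3.14~3.28B-active
# figures above imply about per-token compute. This does not reflect
# the real GroveMoE router, just the ratio of activated parameters.
total_params = 33e9
active_lo, active_hi = 3.14e9, 3.28e9

frac_lo = active_lo / total_params
frac_hi = active_hi / total_params
print(f"active fraction per token: {frac_lo:.1%} to {frac_hi:.1%}")
# -> active fraction per token: 9.5% to 9.9%
```

So each token touches roughly a tenth of the full parameter count, which is where the dense-quality-at-sparse-cost trade-off comes from.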
## Model Downloads

<div align="center">

| **Model** | **#Total Params** | **#Activated Params** | **Download** |
| :----------------: | :---------------: | :-------------------: | :----------: |
| GroveMoE-Base | 33B | 3.14~3.28B | [🤗 HuggingFace](https://huggingface.co/inclusionAI/GroveMoE-Base) |
| GroveMoE-Inst | 33B | 3.14~3.28B | [🤗 HuggingFace](https://huggingface.co/inclusionAI/GroveMoE-Inst) |

</div>
## Performance

|Mistral-Small-3.2| 24B | 68.1 | 37.5 | 59.9 | 61.9 | 33.4 | 28.1 | 69.5 | 32.2 |
|GroveMoE-Inst|3.14~3.28B | <font color=#FBD98D>**72.8**</font> | <font color=#FBD98D>**47.7**</font> | <font color=#FBD98D>**61.3**</font> |<font color=#FBD98D>**71.2**</font> |<font color=#FBD98D>**43.5**</font> | <font color=#FBD98D>**44.4**</font> |<font color=#FBD98D>**74.5**</font> | <font color=#FBD98D>**34.6**</font> |

We bold the top-1 score in each column across all models. More details are reported in our [technical report](https://arxiv.org/abs/2508.07785).
## Usage

Below are some code snippets to help you get started quickly with the model. First, install the Transformers library.
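A minimal loading-and-generation sketch is below. It follows the standard Transformers chat-model pattern; the `trust_remote_code=True` flag is an assumption here, on the premise that GroveMoE ships custom modeling code with the checkpoint, and the snippet is untested against the actual repository.

```python
# Quick-start sketch (assumptions noted above), after:
#   pip install transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inclusionAI/GroveMoE-Inst"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # take the dtype from the checkpoint config
    device_map="auto",    # shard across available GPUs
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Briefly explain mixture-of-experts."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that loading the full 33B-parameter checkpoint requires substantial GPU memory; `device_map="auto"` lets Accelerate spread the weights across whatever devices are available.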