Update README.md
README.md CHANGED
```diff
@@ -24,8 +24,7 @@ library_name: transformers
 ## 1. Introduction
 
 
-`Klear-46B-A2.5B` is a sparse Mixture-of-Experts (MoE) large language model developed by **the Kwai-Klear Team at Kuaishou**, designed to deliver both **high performance** and **inference efficiency**. It features **256 experts**, with only **8 activated** per forward pass, resulting in **46 billion total parameters** but just **2.5 billion active** — achieving dense-level performance at a fraction of the computational cost.
-
+`Klear-46B-A2.5B` is a sparse Mixture-of-Experts (MoE) large language model developed by **the Kwai-Klear Team at Kuaishou**, designed to deliver both **high performance** and **inference efficiency**. It features **256 experts**, with only **8 experts and 1 shared expert activated** per layer during the forward pass, resulting in **46 billion total parameters** but just **2.5 billion active** — achieving dense-level performance at a fraction of the computational cost.
 The model was trained on over **22 trillion tokens** using a **three-stage progressive curriculum**:
 
 **1. Foundational Knowledge Learning (12T tokens):**
```
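The updated introduction clarifies the routing scheme: per layer, each token passes through 8 routed experts (selected from 256) plus 1 always-on shared expert. For readers unfamiliar with this pattern, below is a minimal PyTorch sketch of top-k MoE routing with a shared expert. It is illustrative only: the class name `MoELayer`, the dimensions, and the naive per-token dispatch loop are assumptions made for clarity, not Klear's actual implementation.

```python
# Minimal sketch of top-k MoE routing with an always-on shared expert.
# Illustrative only: names, dimensions, and the naive dispatch loop are
# assumptions for readability, not the Klear-46B-A2.5B implementation.
import torch
import torch.nn as nn

def ffn(hidden_dim, ffn_dim):
    # A small feed-forward block standing in for one expert.
    return nn.Sequential(
        nn.Linear(hidden_dim, ffn_dim),
        nn.SiLU(),
        nn.Linear(ffn_dim, hidden_dim),
    )

class MoELayer(nn.Module):
    def __init__(self, hidden_dim=1024, ffn_dim=2048, num_experts=256, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        self.experts = nn.ModuleList(ffn(hidden_dim, ffn_dim) for _ in range(num_experts))
        self.shared_expert = ffn(hidden_dim, ffn_dim)  # active for every token

    def forward(self, x):  # x: (num_tokens, hidden_dim)
        probs = self.router(x).softmax(dim=-1)             # (num_tokens, num_experts)
        weights, indices = probs.topk(self.top_k, dim=-1)  # top-8 routed experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over top-k
        out = self.shared_expert(x)                        # shared expert sees all tokens
        for i in range(x.size(0)):  # naive per-token dispatch, chosen for clarity
            for w, e in zip(weights[i], indices[i].tolist()):
                out[i] = out[i] + w * self.experts[e](x[i])
        return out

# Usage: each token activates only 9 of 257 expert blocks (8 routed + 1 shared),
# which is how total parameters can far exceed active parameters.
layer = MoELayer()
tokens = torch.randn(4, 1024)
print(layer(tokens).shape)  # torch.Size([4, 1024])
```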