BubbleQ committed
Commit 9a3b52d · verified · 1 Parent(s): 1e0b60f

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -24,8 +24,7 @@ library_name: transformers
 ## 1. Introduction
 
 
-`Klear-46B-A2.5B` is a sparse Mixture-of-Experts (MoE) large language model developed by **the Kwai-Klear Team at Kuaishou**, designed to deliver both **high performance** and **inference efficiency**. It features **256 experts**, with only **8 activated** per forward pass, resulting in **46 billion total parameters** but just **2.5 billion active** — achieving dense-level performance at a fraction of the computational cost.
-
+`Klear-46B-A2.5B` is a sparse Mixture-of-Experts (MoE) large language model developed by **the Kwai-Klear Team at Kuaishou**, designed to deliver both **high performance** and **inference efficiency**. It features **256 experts**, with only **8 experts and 1 shared expert activated** per layer during the forward pass, resulting in **46 billion total parameters** but just **2.5 billion active** — achieving dense-level performance at a fraction of the computational cost.
 The model was trained on over **22 trillion tokens** using a **three-stage progressive curriculum**:
 
 **1. Foundational Knowledge Learning (12T tokens):**
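
The updated paragraph describes the routing property behind the parameter counts: each token passes through only the top 8 of 256 routed experts per layer, plus 1 always-on shared expert, which is why roughly 2.5B of the 46B parameters are exercised per forward pass. Below is a minimal sketch of that routing pattern; the class name, hidden sizes, and expert MLP shape are illustrative assumptions, not the actual Klear-46B-A2.5B implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMoELayer(nn.Module):
    """Sketch of top-k expert routing with an always-on shared expert.

    Hypothetical sizes for illustration only; not the Klear-46B-A2.5B source.
    """

    def __init__(self, d_model=1024, d_ff=2048, n_experts=256, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # The shared expert processes every token, independent of the router.
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        logits = self.router(x)                          # (num_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # keep only the top-8 experts per token
        weights = F.softmax(weights, dim=-1)             # renormalize their gate weights
        routed = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue                                 # this expert received no tokens
            routed[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        # The shared expert's output is added for every token on top of the routed experts,
        # so only ~8 of 256 routed experts run per token even though all stay in memory.
        return routed + self.shared_expert(x)


# Usage: all parameters are held ("total"), but per token only the selected experts compute ("active").
layer = ToyMoELayer()
y = layer(torch.randn(4, 1024))
```

In practice, MoE implementations replace the per-expert Python loop with batched dispatch/combine kernels and add a load-balancing objective, but the efficiency argument is the same one the paragraph makes: only the selected experts run for a given token.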