Update README.md
README.md CHANGED
```diff
@@ -24,8 +24,7 @@ library_name: transformers
 ## 1. Introduction
 
 
-`Klear-46B-A2.5B` is a sparse Mixture-of-Experts (MoE) large language model developed by **the Kwai-Klear Team at Kuaishou**, designed to deliver both **high performance** and **inference efficiency**. It features **256 experts**, with only **8 activated** per forward pass, resulting in **46 billion total parameters** but just **2.5 billion active** — achieving dense-level performance at a fraction of the computational cost.
-
+`Klear-46B-A2.5B` is a sparse Mixture-of-Experts (MoE) large language model developed by **the Kwai-Klear Team at Kuaishou**, designed to deliver both **high performance** and **inference efficiency**. It features **256 experts**, with only **8 experts and 1 shared expert activated** per layer during the forward pass, resulting in **46 billion total parameters** but just **2.5 billion active** — achieving dense-level performance at a fraction of the computational cost.
 The model was trained on over **22 trillion tokens** using a **three-stage progressive curriculum**:
 
 **1. Foundational Knowledge Learning (12T tokens):**
```
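The updated introduction clarifies the routing scheme: per layer, each token passes through 8 routed experts (selected from 256) plus 1 always-on shared expert. For readers unfamiliar with this pattern, below is a minimal PyTorch sketch of top-k MoE routing with a shared expert. It is illustrative only: the class name `MoELayer`, the dimensions, and the naive per-token dispatch loop are assumptions made for clarity, not Klear's actual implementation.

```python
# Minimal sketch of top-k MoE routing with an always-on shared expert.
# Illustrative only: names, dimensions, and the naive dispatch loop are
# assumptions for readability, not the Klear-46B-A2.5B implementation.
import torch
import torch.nn as nn

def ffn(hidden_dim, ffn_dim):
    # A small feed-forward block standing in for one expert.
    return nn.Sequential(
        nn.Linear(hidden_dim, ffn_dim),
        nn.SiLU(),
        nn.Linear(ffn_dim, hidden_dim),
    )

class MoELayer(nn.Module):
    def __init__(self, hidden_dim=1024, ffn_dim=2048, num_experts=256, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        self.experts = nn.ModuleList(ffn(hidden_dim, ffn_dim) for _ in range(num_experts))
        self.shared_expert = ffn(hidden_dim, ffn_dim)  # active for every token

    def forward(self, x):  # x: (num_tokens, hidden_dim)
        probs = self.router(x).softmax(dim=-1)             # (num_tokens, num_experts)
        weights, indices = probs.topk(self.top_k, dim=-1)  # top-8 routed experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over top-k
        out = self.shared_expert(x)                        # shared expert sees all tokens
        for i in range(x.size(0)):  # naive per-token dispatch, chosen for clarity
            for w, e in zip(weights[i], indices[i].tolist()):
                out[i] = out[i] + w * self.experts[e](x[i])
        return out

# Usage: each token activates only 9 of 257 expert blocks (8 routed + 1 shared),
# which is how total parameters can far exceed active parameters.
layer = MoELayer()
tokens = torch.randn(4, 1024)
print(layer(tokens).shape)  # torch.Size([4, 1024])
```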