Update README.md
`Klear-46B-A2.5B` is a sparse Mixture-of-Experts (MoE) large language model developed by **the Kwai-Klear Team at Kuaishou**, designed to deliver both **high performance** and **inference efficiency**. It features **256 experts**, with only **8 experts and 1 shared expert activated** per layer during the forward pass, resulting in **46 billion total parameters** but just **2.5 billion active** — achieving dense-level performance at a fraction of the computational cost.
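To make the sparse-activation idea above concrete, here is a minimal, illustrative PyTorch sketch of top-8 routing plus one always-on shared expert; the class name, dimensions, and routing details are toy assumptions, not Klear's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySparseMoE(nn.Module):
    """Toy sparse-MoE layer: many routed experts, only the top-k of them (plus one
    shared expert) run per token, so active parameters stay a small fraction of
    total parameters. Dimensions here are illustrative, not Klear's real config."""

    def __init__(self, d_model=64, d_ff=128, n_experts=256, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):                                  # x: (n_tokens, d_model)
        scores = self.router(x)                            # (n_tokens, n_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)   # keep only 8 of 256 experts per token
        top_w = F.softmax(top_w, dim=-1)
        out = self.shared_expert(x)                        # the 1 shared expert always runs
        routed = torch.zeros_like(x)
        for t in range(x.size(0)):                         # naive per-token loop, for clarity only
            routed[t] = sum(w * self.experts[int(i)](x[t]) for w, i in zip(top_w[t], top_idx[t]))
        return out + routed

moe = TinySparseMoE()
print(moe(torch.randn(4, 64)).shape)                       # torch.Size([4, 64])
```

Even though all 256 expert weight matrices exist in memory (the total parameter count), each token only pays the compute of its 8 routed experts plus the shared one (the active parameter count), which is the 46B-total versus 2.5B-active split described above.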
The model was trained on over **22 trillion tokens** using a **three-stage progressive curriculum**:

**1. Foundational Knowledge Learning (12T tokens):**
The base and instruction tuned + DPO models have the following architecture:

| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download Link** |
| :------------: | :------------: | :------------: | :------------: | :------------: |
| Klear-46B-A2.5B-Base | 46B | 2.5B | 64K | [🤗 Hugging Face](https://huggingface.co/Kwai-Klear/Klear-46B-A2.5B-Base) |
| Klear-46B-A2.5B-Instruct | 46B | 2.5B | 64K | [🤗 Hugging Face](https://huggingface.co/Kwai-Klear) |

</div>
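For users who prefer a local copy of the weights, a small sketch using `huggingface_hub`; the target directory is a placeholder, and only the Base repo id spelled out in the table is used here.

```python
from huggingface_hub import snapshot_download

# Download the Base checkpoint listed above; swap in the Instruct repo id as needed.
local_path = snapshot_download(
    repo_id="Kwai-Klear/Klear-46B-A2.5B-Base",
    local_dir="./Klear-46B-A2.5B-Base",  # placeholder target directory
)
print(local_path)
```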
Note:

1. `*` During pretraining, we found that the HumanEval metric fluctuated significantly and was extremely sensitive to formatting. We therefore adapted the prompt from the Ling-series paper to modify the original HumanEval; the results in the table are the evaluation metrics after this modification.
2. For Mimo-base-7B, the results marked with `*` are sourced from its public report; all other evaluations were conducted with our internal evaluation frameworks.

### Klear-46B-A2.5B-Instruct Evaluation Results

| Ability | Benchmark | Klear-46B-A2.5B-Instruct | InternLM3-8B-Instruct | MiniCPM4-8B | Qwen3-8B (NoThink) | gemma3-12b-it | Phi4-14B | Qwen3-30B-A3B-2507 |
| ------------- | --------------------------- | --------------- | --------------------- | ----------- | ------------------ | ------------- | -------- | ------------------ |
| | # Total Params | 46B | 8B | 8B | 8B | 12B | 14B | 30B |
| | # Activated Params | 2.5B | 8B | 8B | 8B | 12B | 14B | 3B |
#### Klear-46B-A2.5B-Instruct

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_path = "/path/to/Klear-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", dtype=torch.bfloat16, trust_remote_code=True)
```
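The changed lines stop at the model load; the rest of the Instruct example is unchanged in the README. For completeness, a minimal sketch of how such a snippet typically continues; the chat message, generation settings, and use of the chat template are assumptions, not the README's verbatim code.

```python
# Hypothetical continuation (illustrative only): build a chat prompt and generate.
messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
result = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(result)
```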
Build vLLM from the Kwai-Klear repository and serve the model:

```bash
git clone https://github.com/Kwai-Klear/vllm.git
cd vllm
VLLM_USE_PRECOMPILED=1 pip install --editable .
vllm serve /path/to/Klear-Instruct --port 8000 --tensor-parallel-size 8 --trust-remote-code
```

An OpenAI-compatible API will be available at `http://localhost:8000/v1`.
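Once the server is up, the endpoint can be exercised with the standard `openai` client. The sketch below is illustrative; the model name assumes vLLM's default behavior of registering the model under the path passed to `vllm serve`.

```python
from openai import OpenAI

# The local vLLM server speaks the OpenAI protocol; no real API key is required.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="/path/to/Klear-Instruct",  # assumed served model name (the path given to `vllm serve`)
    messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```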
Or you can refer to the following Python script for offline inference:

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_path = "/path/to/Klear-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

llm = LLM(
```
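The changed lines end at the opening of the `LLM(` constructor; the rest of the script is unchanged in the README. A minimal sketch of how such an offline-inference script typically continues, with constructor arguments and sampling settings that are assumptions rather than the README's exact values:

```python
# Hypothetical continuation (illustrative arguments, not the README's exact values).
llm = LLM(
    model=model_path,
    tensor_parallel_size=8,
    trust_remote_code=True,
)

sampling_params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)

messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```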