Update README.md
`Klear-46B-A2.5B` is a sparse Mixture-of-Experts (MoE) large language model developed by **the Kwai-Klear Team at Kuaishou**, designed to deliver both **high performance** and **inference efficiency**. It features **256 experts**, with only **8 experts and 1 shared expert activated** per layer during the forward pass, resulting in **46 billion total parameters** but just **2.5 billion active** — achieving dense-level performance at a fraction of the computational cost.
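To make the sparse-activation idea above concrete, here is a minimal, illustrative PyTorch sketch of top-8 routing plus one always-on shared expert; the class name, dimensions, and routing details are toy assumptions, not Klear's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySparseMoE(nn.Module):
    """Toy sparse-MoE layer: many routed experts, only the top-k of them (plus one
    shared expert) run per token, so active parameters stay a small fraction of
    total parameters. Dimensions here are illustrative, not Klear's real config."""

    def __init__(self, d_model=64, d_ff=128, n_experts=256, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):                                  # x: (n_tokens, d_model)
        scores = self.router(x)                            # (n_tokens, n_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)   # keep only 8 of 256 experts per token
        top_w = F.softmax(top_w, dim=-1)
        out = self.shared_expert(x)                        # the 1 shared expert always runs
        routed = torch.zeros_like(x)
        for t in range(x.size(0)):                         # naive per-token loop, for clarity only
            routed[t] = sum(w * self.experts[int(i)](x[t]) for w, i in zip(top_w[t], top_idx[t]))
        return out + routed

moe = TinySparseMoE()
print(moe(torch.randn(4, 64)).shape)                       # torch.Size([4, 64])
```

Even though all 256 expert weight matrices exist in memory (the total parameter count), each token only pays the compute of its 8 routed experts plus the shared one (the active parameter count), which is the 46B-total versus 2.5B-active split described above.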
The model was trained on over **22 trillion tokens** using a **three-stage progressive curriculum**:

**1. Foundational Knowledge Learning (12T tokens):**
The base and instruction tuned + DPO models have the following architecture:

| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download Link** |
| :------------: | :------------: | :------------: | :------------: | :------------: |
| Klear-46B-A2.5B-Base | 46B | 2.5B | 64K | [🤗 Hugging Face](https://huggingface.co/Kwai-Klear/Klear-46B-A2.5B-Base) |
| Klear-46B-A2.5B-Instruct | 46B | 2.5B | 64K | [🤗 Hugging Face](https://huggingface.co/Kwai-Klear) |

</div>
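For users who prefer a local copy of the weights, a small sketch using `huggingface_hub`; the target directory is a placeholder, and only the Base repo id spelled out in the table is used here.

```python
from huggingface_hub import snapshot_download

# Download the Base checkpoint listed above; swap in the Instruct repo id as needed.
local_path = snapshot_download(
    repo_id="Kwai-Klear/Klear-46B-A2.5B-Base",
    local_dir="./Klear-46B-A2.5B-Base",  # placeholder target directory
)
print(local_path)
```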
Note:

1. `*` During pretraining, we found that the HumanEval metric fluctuated significantly and was extremely sensitive to formatting. We therefore adapted the prompt from the Ling-series paper to modify the original HumanEval; the results in the table are the evaluation metrics after this modification.
2. For Mimo-base-7B, the results marked with `*` are sourced from its public report; all other evaluations were conducted with our internal evaluation frameworks.

### Klear-46B-A2.5B-Instruct Evaluation Results

| Ability | Benchmark | Klear-46B-A2.5B-Instruct | InternLM3-8B-Instruct | MiniCPM4-8B | Qwen3-8B (NoThink) | gemma3-12b-it | Phi4-14B | Qwen3-30B-A3B-2507 |
| ------------- | --------------------------- | --------------- | --------------------- | ----------- | ------------------ | ------------- | -------- | ------------------ |
| | # Total Params | 46B | 8B | 8B | 8B | 12B | 14B | 30B |
| | # Activated Params | 2.5B | 8B | 8B | 8B | 12B | 14B | 3B |
#### Klear-46B-A2.5B-Instruct

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_path = "/path/to/Klear-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", dtype=torch.bfloat16, trust_remote_code=True)
```
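The changed lines stop at the model load; the rest of the Instruct example is unchanged in the README. For completeness, a minimal sketch of how such a snippet typically continues; the chat message, generation settings, and use of the chat template are assumptions, not the README's verbatim code.

```python
# Hypothetical continuation (illustrative only): build a chat prompt and generate.
messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
result = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(result)
```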
Build vLLM from the Kwai-Klear repository and serve the model:

```bash
git clone https://github.com/Kwai-Klear/vllm.git
cd vllm
VLLM_USE_PRECOMPILED=1 pip install --editable .
vllm serve /path/to/Klear-Instruct --port 8000 --tensor-parallel-size 8 --trust-remote-code
```

An OpenAI-compatible API will be available at `http://localhost:8000/v1`.
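Once the server is up, the endpoint can be exercised with the standard `openai` client. The sketch below is illustrative; the model name assumes vLLM's default behavior of registering the model under the path passed to `vllm serve`.

```python
from openai import OpenAI

# The local vLLM server speaks the OpenAI protocol; no real API key is required.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="/path/to/Klear-Instruct",  # assumed served model name (the path given to `vllm serve`)
    messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```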
Or you can refer to the following Python script for offline inference:

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_path = "/path/to/Klear-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

llm = LLM(
```
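The changed lines end at the opening of the `LLM(` constructor; the rest of the script is unchanged in the README. A minimal sketch of how such an offline-inference script typically continues, with constructor arguments and sampling settings that are assumptions rather than the README's exact values:

```python
# Hypothetical continuation (illustrative arguments, not the README's exact values).
llm = LLM(
    model=model_path,
    tensor_parallel_size=8,
    trust_remote_code=True,
)

sampling_params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)

messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```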