Commit 72a07a6 (verified) by BubbleQ · Parent: 6a804e1

Update README.md

Files changed (1): README.md (+226 -226)

# Klear

<div align="center">
<img src="figures/klear-logo-02.png" width="500"/>
<p>
🤗 <a href="https://huggingface.co/Kwai-Klear">Hugging Face</a> | 📑 <a href="">Technical Report</a>
<br>
🖥️ <a href="https://kml-dtmachine-15498-prod-1.kmlhb2az1l3-2.corp.kuaishou.com">Chat with Klear</a> | 💬 <a href="https://github.com/Kwai-Klear">Issues & Discussions</a>
</p>
</div>

## 🔥News

- 2025.09.05: We released the `Klear-46B-A2.5B` series. It currently comes in two versions: a base model and an instruction-tuned model. A reasoning version is in training; please stay tuned for updates.

## 1. Introduction

`Klear-46B-A2.5B` is a sparse Mixture-of-Experts (MoE) large language model developed by **the Kwai-Klear Team at Kuaishou**, designed to deliver both **high performance** and **inference efficiency**. It features **256 experts**, with only **8 activated** per forward pass, resulting in **46 billion total parameters** but just **2.5 billion active**, achieving dense-level performance at a fraction of the computational cost.
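
To make the sparse-activation idea concrete, here is a toy sketch of top-8 routing over 256 experts. It is illustrative only, not Klear's actual implementation; the expert modules, loop structure, and shapes are placeholders (only the counts and the hidden size of 2048 come from the Model Summary below).

```python
import torch
import torch.nn.functional as F

# Toy illustration of top-k expert routing: 256 experts, 8 active per token.
# Shapes and expert modules are placeholders, not Klear's real architecture.
num_experts, top_k, hidden = 256, 8, 2048

router = torch.nn.Linear(hidden, num_experts, bias=False)           # routing gate
experts = torch.nn.ModuleList(
    [torch.nn.Linear(hidden, hidden) for _ in range(num_experts)]   # stand-in experts
)

x = torch.randn(4, hidden)                   # 4 example tokens
scores = router(x)                           # (4, 256) routing logits
weights, idx = scores.topk(top_k, dim=-1)    # keep the 8 best experts per token
weights = F.softmax(weights, dim=-1)         # normalize the selected weights

out = torch.zeros_like(x)
for t in range(x.size(0)):                   # only 8 of 256 experts run per token,
    for w, e in zip(weights[t], idx[t]):     # which is why only ~2.5B of 46B params are active
        out[t] += w * experts[int(e)](x[t])
```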

The model was trained on over **22 trillion tokens** using a **three-stage progressive curriculum**:

**1. Foundational Knowledge Learning (12T tokens):**
General-purpose datasets such as CommonCrawl were processed with stratified quality filters, following a curriculum learning strategy that progresses from lower to higher data quality.

**2. Data Complexity Enhancement (8T tokens):**
The proportion of mathematical, coding, and STEM-related data was gradually increased to strengthen the model's reasoning and problem-solving capabilities.

**3. Reasoning Enhancement and Long-Context Stage (2T tokens):**
Training focused on synthetic and reasoning-intensive data, combined with a fast learning-rate annealing strategy to maximize data efficiency and optimize final performance.

As a result, Klear-46B-A2.5B-Base matches or surpasses the performance of dense models with several times more active parameters, while offering significantly better efficiency and cost-effectiveness for real-world deployment.

## Model Summary

This repo contains both the base and the instruction-tuned models, which share the following architecture:

| **Property**            | **Value** |
|-------------------------|-----------|
| hidden_size             | 2048      |
| moe_intermediate_size   | 896       |
| n_shared_experts        | 1         |
| num_attention_heads     | 32        |
| num_experts             | 256       |
| num_experts_per_tok     | 8         |
| num_hidden_layers       | 32        |
| num_key_value_heads     | 4         |
| vocab_size              | 151936    |
| tie_word_embeddings     | false     |
| context length          | 65536     |
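
These values can be verified directly from the released checkpoint's configuration. A minimal sketch is shown below; the path is a placeholder and the field names are assumed to match the table above, so missing ones simply print as "n/a".

```python
from transformers import AutoConfig

# Load only the configuration (no weights). "/path/to/Klear" is a placeholder
# for a local checkout or the Hub repo id.
config = AutoConfig.from_pretrained("/path/to/Klear", trust_remote_code=True)

for field in ("hidden_size", "moe_intermediate_size", "num_experts",
              "num_experts_per_tok", "num_hidden_layers", "vocab_size"):
    print(field, getattr(config, field, "n/a"))
```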

### Model Downloads

<div align="center">

| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download Link** |
| :------------: | :------------: | :------------: | :------------: | :------------: |
| Klear-46B-A2.5B-Base | 46B | 2.5B | 64K | [🤗 Hugging Face](https://huggingface.co/Kwai-Klear) |
| Klear-46B-A2.5B-Inst. | 46B | 2.5B | 64K | [🤗 Hugging Face](https://huggingface.co/Kwai-Klear) |

</div>
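
The checkpoints can also be fetched programmatically with `huggingface_hub`. A minimal sketch follows; the repo id is an assumed example, so check the Kwai-Klear organization page for the exact name.

```python
from huggingface_hub import snapshot_download

# Download the base model weights to a local directory.
# The repo id below is an assumption for illustration; verify it on the Hub.
local_dir = snapshot_download(repo_id="Kwai-Klear/Klear-46B-A2.5B-Base")
print("Model downloaded to:", local_dir)
```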

## 2. Benchmark Evaluation

### Klear-46B-A2.5B-Base Evaluation Results

| Ability | Benchmark | Klear-46B-A2.5B-Base | MiMO-7B-Base | Qwen3-8B-Base | Qwen3-14B-Base | Ling-lite-1.5-Base | Qwen3-30B-A3B-Base |
| ----------- | ---------------------- | -------------------- | ------------ | ------------- | -------------- | ------------------ | ------------------ |
| | # Total Params | 46B | 7B | 8B | 14B | 16.8B | 30B |
| | # Activated Params | 2.5B | 7B | 8B | 14B | 2.75B | 3B |
| **Code** | HumanEval (0-shot*) | 89 | - | 84.1 | 87.8 | 83.5 | 90.9 |
| | MBPP (3-shot) | 76 | 55.2 | 69 | 74 | 66.6 | 75.6 |
| **Math** | MATH (4-shot, CoT) | 55.7 | 36.78 | 58.4 | 57.1 | 56.98 | 57.6 |
| | CMATH (3-shot) | 87.8 | 78.5 | 88.3 | 90.7 | 85.7 | 89.7 |
| | GSM8K (4-shot, CoT) | 87.3 | 78.47 | 89.4 | 90.3 | 87.6 | 91.1 |
| **General** | MMLU-Pro (5-shot, CoT) | 57.6 | 43.1 | 55.2 | 58.1 | 49.9 | 58.8 |
| | MMLU (5-shot) | 80.5 | 69.24 | 77.1 | 80.6 | 73.7 | 80.4 |
| | CEval (5-shot) | 89.8 | 67.98 | 81.9 | 84.8 | 78.2 | 87.4 |
| | CMMLU (5-shot) | 88 | 70.79 | 82 | 85.6 | 81.2 | 87.1 |
| | GPQA (0-shot) | 35.3 | 31.03 | 33.9 | 35.7 | 30.1 | 35.5 |
| | AGIEval (0-shot) | 52.3 | 48.3* | 51.7 | 55.7 | 54.3 | 56 |
| | BBH (3-shot, CoT) | 77.9 | 75.6 | 78.1 | 80.1 | 75.4 | 81.2 |
| **Others** | HellaSwag (0-shot) | 80.5 | 80* | 78.7 | 81.5 | 80 | 81.2 |
| | Winogrande (3-shot) | 78.8 | 78* | 73.6 | 78.5 | 72.1 | 77.9 |
| | TriviaQA (5-shot) | 69.6 | 60.8* | 56.3 | 62.1 | 60.9 | 65.6 |
| | NaturalQuestions (5-shot) | 37.5 | 23.46 | 25.7 | 29.1 | 28 | 30.7 |
| | PIQA (0-shot) | 81.6 | 80.14 | 79.5 | 81.9 | 82 | 80.7 |
| | SIQA (0-shot) | 67.9 | 51.74 | 56.2 | 58.4 | 56.3 | 56.3 |
| | OpenBookQA (0-shot) | 37.8 | 34.2 | 35 | 35.6 | 38.2 | 34.6 |

Note:
1. `*` During pretraining, we found that the HumanEval metric fluctuated significantly and was extremely sensitive to formatting. We therefore adapted the original HumanEval prompt following the Ling-series paper; the results in the table use this modified setup.
2. For MiMO-7B-Base, results marked with `*` are sourced from other public reports.

### Klear-46B-A2.5B-Inst. Evaluation Results

| Ability | Benchmark | Klear-46B-A2.5B-Inst. | MiniCPM4-8B | Qwen3-8B (NoThink) | gemma3-12b-it | Phi4-14B |
| ------------------------- | --------------------------- | --------------- | ----------- | ------------------ | ------------- | -------- |
| | # Total Params | 46B | 8B | 8B | 12B | 14B |
| | # Activated Params | 2.5B | 8B | 8B | 12B | 14B |
| **English Understanding** | MMLU-Redux | 82.23 | 77.63 | 79.32 | 78.39 | 83.09 |
| | MMLU-Pro | 64.82 | 54.69 | 63.8 | 60.69 | 67.25 |
| | GPQA-Diamond | 49.49 | 38.51 | 51.77 | 39.02 | 59.47 |
| | SimpleQA | 5.94 | 3.51 | 5.5 | 6.22 | 3.28 |
| **Chinese Understanding** | CLUEWSC | 88.82 | 81.91 | 82.89 | 91.12 | 88.16 |
| | CEval | 84.29 | 81.78 | 81.66 | 60.81 | 64.79 |
| | C-SimpleQA | 42.03 | 23.13 | 37.07 | 28.97 | 24.77 |
| **Math & Reasoning** | MATH500 | 86.4 | 79.8 | 85 | 86.8 | 80.6 |
| | AIME24 | 30.42 | 22.92 | 28.33 | 23.96 | 15.83 |
| | AIME25 | 21.04 | 15.21 | 20.62 | 18.33 | 18.75 |
| | ZebraLogic | 46.4 | 8.5 | 25.7 | 18 | 30.3 |
| **Code** | HumanEval | 89.63 | 74.39 | 83.54 | 82.32 | 85.37 |
| | HumanEval+ | 87.2 | 70.12 | 76.83 | 75.61 | 83.54 |
| | MBPPEvalplus | 79.6 | 82 | 76.2 | 85.7 | 77.5 |
| | MBPPEvalplus++ | 68.5 | 69.3 | 66.1 | 74.1 | 66.7 |
| | LiveCodeBench v5 (2408-2501) | 29.75 | 12.19 | 27.24 | 24.73 | 23.66 |
| **Instruction Following** | IF-Eval | 80.41 | 73.01 | 84.47 | 81.52 | 59.33 |
| | Multi-IF (en+zh) | 78.25 | 61.79 | 78.95 | 76.56 | 62.7 |
| **Comprehensive Ability** | MTBench | 8.03 | 6.875 | 8.21 | 8.675 | 8.625 |
| | MT-Eval | 8.1 | 6.7 | 8.18 | 8.45 | 8.12 |
| | Arena-Hard v2 | 19.8 | 2.2 | 19.8 | 50 | 9.6 |
| | AlignBench v1.1 | 6.8 | 5.99 | 6.95 | 6.3 | 6.33 |
| | LiveBench 1125 | 48.7 | 25.5 | 52.1 | 43.1 | 40 |

## 3. Quick start

### Inference with Hugging Face Transformers

You can run inference with Transformers starting from version `4.56.0`.

#### Klear-46B-A2.5B-Base

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "/path/to/Klear-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", dtype=torch.bfloat16, trust_remote_code=True)

text = "世界上最大的湖是"  # "The largest lake in the world is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=256)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```

#### Klear-46B-A2.5B-Inst.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "/path/to/Klear-Inst."
tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)

messages = [
    {"role": "user", "content": "帮我用 python 写一个计算器的代码吧。"}  # "Please write calculator code in Python for me."
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=1024)

result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)
```

### Inference with vLLM

[vLLM](https://github.com/vllm-project/vllm) is a high-throughput, memory-efficient inference framework. We provide our own forked version of [vLLM](https://github.com/vllm-project/vllm) here.

```shell
git clone
cd vllm
pip install
vllm serve Klear-46B-A2.5B-inst --port 8000 --tensor-parallel-size 8 --trust-remote-code
```

An OpenAI-compatible API will be available at `http://localhost:8000/v1`.
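
Once the server is up, you can query it with any OpenAI-compatible client. A minimal sketch using the `openai` Python package is shown below; the model name must match the one passed to `vllm serve`, and the API key is a dummy value since the local server does not check it.

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Klear-46B-A2.5B-inst",  # must match the served model name
    messages=[{"role": "user", "content": "Please help me write a snake game code."}],
    temperature=0.2,
)
print(response.choices[0].message.content)
```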

Alternatively, you can refer to the following Python script for offline inference:

```python
from vllm import LLM, SamplingParams

model_path = "/path/to/Klear"
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    num_speculative_tokens=1,
    disable_log_stats=False,
)
sampling_params = SamplingParams(temperature=0.2)

conversation = [
    {
        "role": "system",
        "content": "",
    },
    {
        "role": "user",
        "content": "Please help me write a snake game code.",
    },
]

outputs = llm.chat(conversation,
                   sampling_params=sampling_params,
                   use_tqdm=False)

for idx, output in enumerate(outputs):
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"==== Response #{idx} ====")
    print(f"Prompt: {prompt}, Generated text: {generated_text}")
```

## Citation

If you find `Klear-46B-A2.5B` useful or would like to use it in your projects, please kindly cite our paper:

```
```
 