shunxing1234 committed (verified)
Commit 75c1193 · Parent(s): 3a5b714

Update README.md

Files changed (1): README.md (+204 −3)
---
language:
- multilingual
license: other
license_name: kwaipilot-license
license_link: LICENSE
library_name: transformers
---
<div align="center">
<img src="https://raw.githubusercontent.com/Anditty/OASIS/refs/heads/main/Group.svg" width="60%" alt="Kwaipilot" />
</div>

<hr>

<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/Kwaipilot/KAT-V1-40B" target="_blank">
<img alt="Hugging Face" src="https://img.shields.io/badge/HuggingFace-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor"/>
</a>

<a href="https://arxiv.org/pdf/2507.08297" target="_blank">
<img alt="arXiv" src="https://img.shields.io/badge/arXiv-2507.08297-b31b1b.svg?style=for-the-badge"/>
</a>
</div>
# News

- We released the technical report of the **KAT-V1 model**, available at https://arxiv.org/pdf/2507.08297.
- Kwaipilot-AutoThink ranks first among all open-source models on [LiveCodeBench Pro](https://livecodebenchpro.com/), a challenging benchmark explicitly designed to prevent data leakage, and even surpasses strong proprietary systems such as Seed and o3-mini.

***

# Introduction

**KAT (Kwaipilot-AutoThink)** is an open-source large language model that mitigates *over-thinking* by learning **when** to produce an explicit chain of thought and **when** to answer directly.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61ee40a269351366e29972ad/zdnsvBmv6hWIC2Qxxy1fD.png)

Its development follows a concise two-stage training pipeline:

<table>
<thead>
<tr>
<th style="text-align:left; width:18%;">Stage</th>
<th style="text-align:left;">Core Idea</th>
<th style="text-align:left;">Key Techniques</th>
<th style="text-align:left;">Outcome</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>1. Pre-training</strong></td>
<td>Inject knowledge while separating “reasoning” from “direct answering”.</td>
<td>
<em>Dual-regime data</em><br>
• <strong>Think-off</strong> queries labeled via a custom tagging system.<br>
• <strong>Think-on</strong> queries generated by a multi-agent solver.<br><br>
<em>Knowledge Distillation&nbsp;+&nbsp;Multi-Token Prediction</em> for fine-grained utility.
</td>
<td>Base model attains strong factual and reasoning skills without full-scale pre-training costs.</td>
</tr>
<tr>
<td><strong>2. Post-training</strong></td>
<td>Make reasoning optional and efficient.</td>
<td>
<em>Cold-start AutoThink</em> — majority vote sets the initial thinking mode.<br>
<em>Step-SRPO</em> — intermediate supervision rewards correct <strong>mode selection</strong> and <strong>answer accuracy</strong> under that mode.
</td>
<td>Model triggers CoT only when beneficial, reducing token use and speeding inference.</td>
</tr>
</tbody>
</table>

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61ee40a269351366e29972ad/cwFAEh7Rl3f4FU46z8gBZ.png)
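
The cold-start step above boils down to a majority vote over several sampled judgments for each query. As a toy sketch of that idea (the `cold_start_mode` helper and the label strings are illustrative, not the actual KAT training code):

```python
from collections import Counter

def cold_start_mode(judgments):
    """Pick a query's initial thinking mode by majority vote.

    `judgments` is a list of mode labels ("think_on" / "think_off"),
    e.g. obtained by sampling the judge several times for one query.
    """
    counts = Counter(judgments)
    mode, _ = counts.most_common(1)[0]
    return mode

# A query judged easy by most samples starts in think-off mode.
votes = ["think_off", "think_off", "think_on", "think_off", "think_off"]
print(cold_start_mode(votes))  # prints think_off
```

Step-SRPO then refines this initialization with intermediate rewards, so the vote only has to be a reasonable starting point rather than a final decision.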

***

# Data Format

KAT produces responses in a **structured template** that makes the reasoning path explicit and machine-parsable. Two modes are supported, **think-on** (explicit reasoning before the answer) and **think-off** (a direct answer):

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/61ee40a269351366e29972ad/H8iAvQMMT02nyvlYnI5q1.jpeg)

## Special Tokens

| Token | Description |
|-------|-------------|
| `<judge>` | Analyzes the input to decide whether explicit reasoning is needed. |
| `<think_on>` / `<think_off>` | Indicates whether reasoning is **activated** (“on”) or **skipped** (“off”). |
| `<think>` | Marks the start of the chain-of-thought segment when `think_on` is chosen. |
| `<answer>` | Marks the start of the final user-facing answer. |
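
Because the template is machine-parsable, downstream code can split a response into its segments. A minimal illustrative parser (the `parse_kat_response` helper is ours, not part of the model or library; it assumes `<judge>`, `<think>`, and `<answer>` come as paired open/close tags, with a bare `<think_on>` or `<think_off>` tag marking the mode):

```python
import re

def parse_kat_response(text):
    """Split a KAT-style response into judge, mode, reasoning, and answer."""
    judge = re.search(r"<judge>(.*?)</judge>", text, re.DOTALL)
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    mode = "think_on" if "<think_on>" in text else "think_off"
    return {
        "judge": judge.group(1).strip() if judge else "",
        "mode": mode,
        "reasoning": think.group(1).strip() if think else "",
        "answer": answer.group(1).strip() if answer else "",
    }

sample = "<judge>Simple lookup.</judge>\n<think_off>\n<answer>42</answer>"
print(parse_kat_response(sample)["answer"])  # prints 42
```

Note that the Quick Start example below decodes with `skip_special_tokens=True` yet the tags still appear in `content`, so a text-level split like this is a practical way to surface only the final answer to users.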

***

# 🔧 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Kwaipilot/KAT-V1-40B"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Conduct text completion (do_sample=True so temperature/top_p take effect)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")
print("prompt:\n", prompt)
print("content:\n", content)
"""
prompt:
Give me a short introduction to large language model.
content:
<judge>
The user's request is to provide a concise factual introduction to large language models, which involves retrieving and summarizing basic information. This task is straightforward as it only requires recalling and presenting well-known details without deeper analysis. No complex reasoning is needed here—just a simple explanation will suffice.
</judge>

<think_off>
<answer>
A **Large Language Model (LLM)** is an advanced AI system trained on vast amounts of text data to understand, generate, and process human-like language. Here’s a concise introduction:

### Key Points:
1. **Training**: Trained on diverse text sources (books, websites, etc.) using deep learning.
2. **Capabilities**:
   - Answer questions, generate text, summarize content, translate languages.
   - Understand context, sentiment, and nuances in language.
3. **Architecture**: Often based on **transformer models** (e.g., BERT, GPT, LLaMA).
4. **Scale**: Billions of parameters, requiring massive computational resources.
5. **Applications**: Chatbots, content creation, coding assistance, research, and more.

### Examples:
- **OpenAI’s GPT-4**: Powers ChatGPT.
- **Google’s Gemini**: Used in Bard.
- **Meta’s LLaMA**: Open-source alternative.

### Challenges:
- **Bias**: Can reflect biases in training data.
- **Accuracy**: May hallucinate "facts" not grounded in reality.
- **Ethics**: Raises concerns about misinformation and job displacement.

LLMs represent a leap forward in natural language processing, enabling machines to interact with humans in increasingly sophisticated ways. 🌐🤖
</answer>
"""
```

***

# Future Releases

Looking ahead, we will publish a companion paper that fully documents the **AutoThink training framework**, covering:

* Cold-start initialization procedures
* Reinforcement-learning (Step-SRPO) strategies
* Data curation and reward design details

At the same time, we will open-source:

* **Training resources** – the curated dual-regime datasets and RL codebase
* **Model suite** – checkpoints at 1.5B, 7B, and 13B parameters, all trained with AutoThink gating

# Citation

```bibtex
@techreport{Zhan2025KATV1,
  title       = {KAT-V1: Kwai-AutoThink Technical Report},
  author      = {Zhan, Zizheng and Deng, Ken and Tang, Huaixi and Xiang, Wen and Wu, Kun and others},
  year        = {2025},
  institution = {arXiv},
  number      = {arXiv:2507.08297},
  url         = {https://arxiv.org/abs/2507.08297}
}
```