LongCat0830 committed
Commit b08a5a6 · verified · 1 Parent(s): 1e34851

Upload 2 files

Files changed (2):
  1. LICENSE +21 -0
  2. README.md +240 -3
LICENSE ADDED
MIT License

Copyright (c) 2025 Meituan

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md CHANGED
# LongCat-Flash-Thinking

<div align="center">
  <img src="https://raw.githubusercontent.com/meituan-longcat/LongCat-Flash-Chat/main/figures/longcat_logo.svg" width="45%" alt="LongCat-Flash" />
</div>
<hr>

<div align="center" style="line-height: 1;">
  <a href="https://longcat.ai/" target="_blank" style="margin: 2px;">
    <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-LongCat--Flash--Thinking-ADFF2F?color=29E154&logoColor=white" fill-opacity="1" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://huggingface.co/meituan-longcat/LongCat-Flash-Thinking">
    <img alt="github" src="https://img.shields.io/badge/🤖%20Github-LongCat--Flash--Thinking-ff6b6b?color=1783ff&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

<div align="center" style="line-height: 1;">
  <a href="https://github.com/meituan-longcat/LongCat-Flash-Thinking/blob/main/figures/wechat_official_accounts.png" target="_blank" style="margin: 2px;">
    <img alt="Wechat" src="https://img.shields.io/badge/WeChat-LongCat-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://x.com/Meituan_LongCat" target="_blank" style="margin: 2px;">
    <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-LongCat-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

<div align="center" style="line-height: 1;">
  <a href="LICENSE" style="margin: 2px;">
    <img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

<p align="center">
  <a href="https://github.com/meituan-longcat/LongCat-Flash-Thinking/blob/main/tech_report.pdf"><b>Tech Report</b>&nbsp;📄</a>
</p>
## Model Introduction

We introduce and release **LongCat-Flash-Thinking**, a powerful and efficient large reasoning model (LRM) with 560 billion total parameters, built on an innovative Mixture-of-Experts (MoE) architecture. The model incorporates a dynamic computation mechanism that activates 18.6B–31.3B parameters (averaging ~27B) based on contextual demands, optimizing both computational efficiency and performance. LongCat-Flash-Thinking is trained with our DORA system, an efficient distributed RL framework that supports asynchronous training and flexible accelerator usage to ensure stability and efficiency. Our comprehensive data curation and domain-parallel training recipe ensure stable and efficient training. Beyond general reasoning, the model is also equipped with formal-reasoning and agentic-reasoning techniques, advancing LRMs' ability on diverse complex tasks such as mathematics, logic, programming, automatic theorem proving, and tool use.

Specifically, the development of LongCat-Flash-Thinking follows a two-phase pipeline:
- **Long CoT Cold-Start Training**: This phase cultivates the model's foundational reasoning abilities. It begins with a curriculum learning strategy during mid-training to bolster intrinsic capabilities, followed by an SFT stage on reasoning-intensive and agentic data to prepare the model for advanced learning.
- **Large-Scale RL**: The second phase scales up this potential through an efficient RL framework, built upon our Dynamic Orchestration for Asynchronous Rollout (DORA) system for industrial-scale asynchronous training. To address the stability challenges of asynchronous RL training, we adapt and extend the GRPO algorithm for a robust exploration-exploitation balance. A key innovation in this phase is our domain-parallel training scheme, which optimizes the model across distinct domains simultaneously and then merges the resulting domain-expert models into a fused model. Finally, we perform a general RL stage to further refine the fused model and enhance its robustness, safety, and human alignment.

### Key Features

#### 🌟 Domain-Parallel RL Training Methodology

To overcome the instability of traditional mixed-domain RL training, LongCat-Flash-Thinking incorporates a domain-parallel training scheme that decouples optimization across STEM, coding, and agentic tasks. This approach not only stabilizes training, but also allows the resulting domain-expert models to be fused into a nearly Pareto-optimal final model that excels across all specialties.

#### 🌟 Pioneering RL Infrastructure

LongCat-Flash-Thinking is built upon our self-designed DORA system. Its main motivation is to optimize long-tail generation by leveraging multiple older versions of the Actor model through streaming rollout while keeping sampling consistency. The DORA system consists of two core components: elastic colocation and a multi-version asynchronous pipeline. These components enhance training efficiency, ensure per-sample policy consistency, and enable efficient KV-cache reuse, facilitating stable and scalable training on tens of thousands of accelerators.

#### 🌟 Advancing Formal Reasoning and Agentic Reasoning

In addition to general reasoning (e.g., mathematics, logic, coding, and instruction following), LongCat-Flash-Thinking also emphasizes two other critical capabilities.
- **Formal Reasoning**: LongCat-Flash-Thinking can solve complex formal reasoning tasks, e.g., automatic theorem proving. To realize this potential and empower researchers, we introduce a novel expert-iteration framework for careful data synthesis, involving statement formalization, iterative proof synthesis, and syntax/consistency filtering.
- **Agentic Reasoning**: LongCat-Flash-Thinking can adaptively utilize provided tools to solve complex reasoning tasks. To reach this goal, we introduce a dual-path reasoning approach that identifies and retains high-quality queries that genuinely require tool assistance, thereby fostering robust agentic abilities. After high-value query selection, we synthesize corresponding high-quality solution trajectories in a versatile environment with diverse tool APIs, including MCP servers and simulated tools for both single- and multi-turn interactions.

## Evaluation Results

| **Benchmark** | DeepSeek V3.1 (Thinking) | Qwen3-235B-A22B (Thinking) | GLM-4.5 | OpenAI-o3 | Gemini2.5-Pro | GPT-5 (Thinking) | LongCat-Flash (Thinking) |
|---------------|-------------------------|----------------------------|--------|-----------|---------------|-----------------|--------------------------|
| Architecture | MoE | MoE | MoE | - | - | - | MoE |
| \# Total Params | 671B | 235B | 355B | - | - | - | 560B |
| \# Activated Params | 37B | 22B | 32B | - | - | - | 27B |
| **General QA** | | | | | | | |
| MMLU-Pro<sub>(acc)</sub> | 84.9 | 84.4 | 81.5 | 85.9 | **86.7** | 84.5 | 83.6 |
| MMLU-Redux<sub>(acc)</sub> | 90.5 | 91.4 | 89.9 | **93.1** | 90.1 | 92.6 | 92.8 |
| SimpleQA<sub>(acc)</sub> | 26.8 | 44.8 | 24.3 | 48.7 | **52.6** | 48.3 | 15.4 |
| **Alignment** | | | | | | | |
| IFEval<sub>(strict prompt)</sub> | 86.3 | 89.3 | 85.4 | 90.2 | 92.4 | **92.8** | 87.1 |
| Arena-Hard<sub>(creative writing)</sub> | 86.6 | 90.9 | 88.6 | 89.6 | **95.7** | 91.8 | 81.8 |
| **Mathematical Reasoning** | | | | | | | |
| MATH500<sub>(Mean@1)</sub> | 98.8 | 99.6 | 95.4 | 98.4 | 98.0 | 99.2 | **99.4** |
| HMMT25<sub>(Mean@32)</sub> | 80.4 | 83.8 | 76.3 | 71.9 | 79.3 | **84.8** | 82.9 |
| AIME24<sub>(Mean@32)</sub> | 93.9 | 93.9 | 89.3 | 91.6* | 90.7 | 92.0 | **92.9** |
| AIME25<sub>(Mean@32)</sub> | 87.9 | 92.5 | 85.5 | 88.9* | 89.2 | **94.6*** | 90.6 |
| BeyondAIME<sub>(Mean@10)</sub> | 71.8 | 71.5 | 66.0 | 63.2 | 63.0 | **70.0** | 69.7 |
| **General Reasoning** | | | | | | | |
| GPQA-Diamond<sub>(Mean@16)</sub> | 84.2 | 80.4 | 78.3 | 81.9 | 84.0 | **84.4** | 77.9 |
| ZebraLogic<sub>(Mean@1)</sub> | 96.1 | **97.5** | 85.8 | 94.3 | 92.4 | 92.7 | 95.2 |
| Sudoku-Bench<sub>(Mean@1)</sub> | 1.0 | 1.0 | 1.0 | **70.0** | 0.0 | 63.0 | 66.0 |
| ARC-AGI<sub>(Mean@1)</sub> | 37.5 | 45.3 | 17.5 | 47.3 | 46.8 | **59.0** | 51.0 |
| **Coding** | | | | | | | |
| LiveCodeBench<sub>(Mean@4)</sub> | 73.5 | 75.4 | 61.1 | 76.2 | 74.2 | **80.6** | 79.0 |
| OIBench-(Overall)<sub>(Mean@1)</sub> | 31.9 | 37.1 | 23.6 | 36.1 | 35.4 | **48.3** | 30.2 |
| OIBench-(Pseudo)<sub>(Mean@1)</sub> | 43.9 | 51.1 | 38.7 | 49.1 | 48.5 | **66.0** | 39.5 |
| OJBench<sub>(Mean@1)</sub> | 33.6 | 32.1 | 19.0 | 38.4 | **41.6** | 34.1 | 39.9 |
| Aider-Polyglot | 76.3 | 48.9 | - | 78.2 | 84.9 | **88.5** | 55.0 |
| SWE-Bench<sub>(Pass@1)</sub> | 66.0* | 34.4 | 64.2* | **69.1*** | 59.6* | 74.9* | 58.5 |
| **Agentic Tool Using** | | | | | | | |
| BFCL V3<sub>(full)</sub> | 55.4 | 64.4 | 79.1 | 72.4* | 63.2 | 60.1 | **75.8** |
| $\tau^2$-Bench (avg)<sub>(Mean@4)</sub> | 44.4 | 58.5 | 63.8 | 67.6 | 55.7 | **80.1*** | 73.1 |
| VitaBench | 13.5 | 21.5 | 26.8 | 35.2 | 25.2 | 29.2 | |
| **Formal Theorem Proving** | | | | | | | |
| MiniF2F-test<sub>(Pass@1)</sub> | 49.6 | 10.1 | 10.9 | 15.2 | 13.9 | 9.8 | **67.9** |
| MiniF2F-test<sub>(Pass@8)</sub> | 74.4 | 20.9 | 22.1 | 29.6 | 29.4 | 26.1 | **80.2** |
| MiniF2F-test<sub>(Pass@32)</sub> | 79.5 | 26.6 | 27.0 | 37.7 | 41.8 | 36.9 | **83.2** |
| **Safety** | | | | | | | |
| Harmful | 79.2 | 84.3 | 70.4 | 64.8 | 44.3 | 56.8 | **93.7** |
| Criminal | 89.7 | 92.7 | 88.8 | 85.7 | 77.4 | 87.3 | **97.1** |
| Misinformation | 81.1 | 80.9 | 67.1 | 42.7 | 31.0 | 41.9 | **93.0** |
| Privacy | 96.2 | **100.0** | 97.6 | **100.0** | 95.0 | 98.8 | 98.8 |

Note:
- Values marked with * are sourced from other public reports.
- The inference parameters of LongCat-Flash-Thinking are set to `temperature=1.0`, `topk=-1`, and `topp=0.95`.

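For convenience, the recommended decoding settings from the note above can be written down as a plain configuration dict. This is only a sketch: the key spellings (`top_k`, `top_p`) follow common OpenAI-compatible serving conventions and may differ from your stack's exact parameter names.

```python
# Recommended decoding settings for LongCat-Flash-Thinking (from the note above).
# Key spellings are a common convention, not an official schema.
sampling_params = {
    "temperature": 1.0,
    "top_k": -1,      # -1 conventionally disables top-k filtering
    "top_p": 0.95,
}
```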
## Quick Start

### Chat Template
The details of our chat template are provided in the `tokenizer_config.json` file. Below are some examples.

#### First-Turn

With the following prefix, LongCat-Flash-Thinking generates a response to the user query:

```
[Round 0] USER:{query} /think_on ASSISTANT:
```

When a system prompt is specified, the prefix takes the following format:

```
SYSTEM:{system_prompt} [Round 0] USER:{query} /think_on ASSISTANT:
```

#### Multi-Turn

In multi-turn scenarios, the prefix is constructed by concatenating the context with the latest user query:
```
SYSTEM:{system_prompt} [Round 0] USER:{query} /think_on ASSISTANT:{response}</longcat_s>... [Round N-1] USER:{query} /think_on ASSISTANT:{response}</longcat_s> [Round N] USER:{query} /think_on ASSISTANT:
```

Here, N denotes the N-th round of user queries, with indexing starting from zero.

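The templates above can be assembled programmatically. Below is a minimal sketch; `build_prefix` is our own hypothetical helper, not part of any official SDK, while the `/think_on` marker and the `</longcat_s>` end-of-turn token are copied verbatim from the examples.

```python
# Sketch of a prompt builder for the chat templates above (hypothetical helper).
def build_prefix(turns, system_prompt=None):
    """turns: list of (query, response) pairs; the final response is None
    for the turn the model should answer next."""
    parts = []
    if system_prompt is not None:
        parts.append(f"SYSTEM:{system_prompt}")
    for i, (query, response) in enumerate(turns):
        turn = f"[Round {i}] USER:{query} /think_on ASSISTANT:"
        if response is not None:
            # Completed turns end with the end-of-turn token.
            turn += f"{response}</longcat_s>"
        parts.append(turn)
    return " ".join(parts)

build_prefix([("Hello", None)])
# -> '[Round 0] USER:Hello /think_on ASSISTANT:'
```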
#### ToolCall

LongCat-Flash-Thinking supports tool calling in the following format:
```
{tool_description}

## Messages
SYSTEM:{system_prompt} [Round 0] USER:{query} /think_on ASSISTANT:
```

The tool_description is:
```markdown
## Tools
You have access to the following tools:

### Tool namespace: function

#### Tool name: {func.name}

Description: {func.description}

InputSchema:
{json.dumps(func.parameters, indent=2)}

**Note**: For each function call, return a json object with function name and arguments within <longcat_tool_call></longcat_tool_call> XML tags as follows:
<longcat_tool_call>
{"name": <function-name>, "arguments": <args-dict>}
</longcat_tool_call>
When multiple functions need to be called simultaneously, each function call should be wrapped in its own <longcat_tool_call> tag and placed consecutively. For example:
<longcat_tool_call>
{"name": <function-name>, "arguments": <args-dict>}
</longcat_tool_call><longcat_tool_call>
{"name": <function-name>, "arguments": <args-dict>}
</longcat_tool_call>
```

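On the client side, the tool calls emitted in this format need to be extracted from the model's reply. A minimal parsing sketch (our own helper, not an official parser) under the tag convention described above:

```python
import json
import re

# Extract the JSON tool calls a reply wraps in <longcat_tool_call> tags.
# Hypothetical helper; assumes each tag body is a single valid JSON object.
TOOL_CALL_RE = re.compile(r"<longcat_tool_call>\s*(.*?)\s*</longcat_tool_call>", re.DOTALL)

def parse_tool_calls(reply):
    return [json.loads(raw) for raw in TOOL_CALL_RE.findall(reply)]

reply = ('<longcat_tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}'
         '</longcat_tool_call><longcat_tool_call>'
         '{"name": "get_time", "arguments": {}}</longcat_tool_call>')
parse_tool_calls(reply)
# -> [{'name': 'get_weather', 'arguments': {'city': 'Paris'}}, {'name': 'get_time', 'arguments': {}}]
```

The non-greedy match handles the consecutive-tags case shown in the template, where multiple calls are placed back to back.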
#### Mathematical Reasoning
When solving mathematical or other STEM-related reasoning tasks, we recommend adding the following instruction so that the final answer can be located for evaluation:

```text
[Round 0] USER:{problem}
Please reason step by step, and put your final answer within \\boxed{}. /think_on ASSISTANT:
```

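To recover the final answer from a completion that follows this instruction, a simple extraction sketch might look like the following (our own helper, not part of any evaluation harness; it handles only non-nested braces):

```python
import re

# Pull the last \boxed{...} answer out of a completion (hypothetical helper).
# Simple version: does not handle nested braces inside the box.
def extract_boxed(text):
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

extract_boxed(r"... therefore the answer is \boxed{42}.")
# -> '42'
```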
#### Formal Reasoning

LongCat-Flash-Thinking also supports formal reasoning, such as automatic theorem proving (ATP). The specific template is:

```text
[Round 0] USER:Think about and solve the following problem step by step in Lean 4.
# Problem:{problem}

# Formal statement:{formal_statement}
/think_on ASSISTANT:
```

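For illustration, a `{formal_statement}` slot in Lean 4 could look like the toy example below (our own example, not drawn from any benchmark):

```lean
-- Toy example of a formal statement the model would be asked to prove;
-- the model's task is to replace `sorry` with a complete proof.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  sorry
```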
## Deployment
We have implemented basic adaptations in both SGLang and vLLM to support the deployment of LongCat-Flash-Thinking. Please refer to the [Deployment Guide](https://github.com/meituan-longcat/LongCat-Flash-Thinking/docs/deployment_guide.md) for detailed deployment instructions.

## Chat Website
You can chat with LongCat-Flash-Thinking on our official website: [https://longcat.ai](https://longcat.ai). Please turn on the "Think" button ("深度思考" in Chinese) before submitting your request.

## License Agreement

The **model weights** are released under the **MIT License**.

Any contributions to this repository are licensed under the MIT License, unless otherwise stated. This license does not grant any rights to use Meituan trademarks or patents.

See the [LICENSE](LICENSE) file for the full license text.

## Usage Considerations
This model has not been specifically designed or comprehensively evaluated for every possible downstream application.

Developers should take into account the known limitations of large language models, including performance variations across different languages, and carefully assess accuracy, safety, and fairness before deploying the model in sensitive or high-risk scenarios. It is the responsibility of developers and downstream users to understand and comply with all applicable laws and regulations relevant to their use case, including but not limited to data protection, privacy, and content safety requirements.

Nothing in this Model Card should be interpreted as altering or restricting the terms of the MIT License under which the model is released.

## Contact
Please contact us at <a href="mailto:longcat-team@meituan.com">longcat-team@meituan.com</a> or join our WeChat group if you have any questions.