Improve model card: Update `library_name`, add relevant tags, and clarify links
This PR enhances the model card by:
- Setting the `library_name` to `transformers` in the metadata, which enables the automated "how to use" widget on the Hub. This is confirmed by the `auto_map` entries in `config.json`.
- Removing the redundant `transformers` tag and adding descriptive tags (`moe`, `mixture-of-experts`, `code-generation`) that reflect the model's architecture and capabilities, improving discoverability.
- Adding explicit links to the paper, GitHub repository, and project page at the top of the model card for easier access.
- Updating the internal paper link within the "Model Introduction" section to point to the official Hugging Face paper page (`https://huggingface.co/papers/2509.01322`).
- Adding a note in the "Quick Start" section informing users that `trust_remote_code=True` is required when loading the model with `transformers` (see the example sketch below this list).
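For reference, a minimal loading sketch in line with that note. This is an illustration rather than the model card's official example: the Hub repo id `meituan-longcat/LongCat-Flash-Chat`, the dtype/device settings, and the sample prompt are assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repo id for this model card.
model_id = "meituan-longcat/LongCat-Flash-Chat"

# trust_remote_code=True is required because the repo ships custom modeling
# code, referenced via the auto_map entries in config.json.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",  # assumption: take the dtype from the checkpoint
    device_map="auto",   # assumption: shard across available devices (requires accelerate)
)

# Build the prompt with the chat template shipped in tokenizer_config.json.
messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The full diff of the model card (`README.md`) follows.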
```diff
--- a/README.md
+++ b/README.md
@@ -1,13 +1,17 @@
 ---
+library_name: transformers
 license: mit
-library_name: LongCat-Flash-Chat
 pipeline_tag: text-generation
 tags:
-- transformers
+- moe
+- mixture-of-experts
+- code-generation
 ---
 
 # LongCat-Flash-Chat
 
+[Paper](https://huggingface.co/papers/2509.01322) - [Code](https://github.com/meituan-longcat/LongCat-Flash-Chat) - [Project Page](https://longcat.ai)
+
 <div align="center">
 <img src="https://raw.githubusercontent.com/meituan-longcat/LongCat-Flash-Chat/main/figures/longcat_logo.svg"
 width="300"
@@ -62,7 +66,7 @@ Effectively and efficiently scaling model size remains a key challenge in strate
 #### 🌟 Multi-Stage Training Pipeline for Agentic Capability
 Through a meticulously designed pipeline, LongCat-Flash is endowed with advanced agentic behaviors. Initial efforts focus on constructing a more suitable base model for agentic post-training, where we design a two-stage pretraining data fusion strategy to concentrate reasoning-intensive domain data. During mid-training, we enhance reasoning and coding capabilities while extending the context length to 128k to meet agentic post-training requirements. Building on this advanced base model, we proceed with a multi-stage post-training. Recognizing the scarcity of high-quality, high-difficulty training problems for agentic tasks, we design a multi-agent synthesis framework that defines task difficulty across three axes, i.e., information processing, tool-set complexity, and user interaction—using specialized controllers to generate complex tasks requiring iterative reasoning and environmental interaction.
 
-For more detail, please refer to the comprehensive [***LongCat-Flash Technical Report***](https://
+For more detail, please refer to the comprehensive [***LongCat-Flash Technical Report***](https://huggingface.co/papers/2509.01322).
 
 ## Evaluation Results
 | **Benchmark** | **DeepSeek V3.1** | **Qwen3 MoE-2507** | **Kimi-K2** | **GPT-4.1** | **Claude4 Sonnet** | **Gemini2.5 Flash** | **LongCat-Flash** |
@@ -113,6 +117,7 @@ Note:
 * DeepSeek-V3.1, Qwen3-235B-A22B, Gemini2.5-Flash, and Claude4-Sonnet are evaluated under their non-thinking mode.
 
 ## Quick Start
+To use this model with the Hugging Face `transformers` library, you need to ensure `trust_remote_code=True` is set due to custom architectural components.
 
 ### Chat Template
 The details of our chat template are provided in the `tokenizer_config.json` file. Below are some examples.
@@ -220,5 +225,4 @@ We kindly encourage citation of our work if you find it useful.
 
 
 ## Contact
-Please contact us at <a href="mailto:longcat-team@meituan.com">longcat-team@meituan.com</a> or open an issue if you have any questions.
-
+Please contact us at <a href="mailto:longcat-team@meituan.com">longcat-team@meituan.com</a> or open an issue if you have any questions.
```