nielsr (HF Staff) committed · verified
Commit 1fd7ebb · 1 parent: 426032a

Improve model card: Update `library_name`, add relevant tags, and clarify links


This PR enhances the model card by:

- Setting the `library_name` to `transformers` in the metadata, which enables the automated "how to use" widget on the Hub. This is confirmed by the `auto_map` entries in `config.json`.
- Removing the redundant `transformers` tag and adding more descriptive tags such as `moe`, `mixture-of-experts`, and `code-generation` to improve model discoverability, based on the model's architecture and capabilities.
- Adding explicit links to the paper, GitHub repository, and project page at the top of the model card for easier access.
- Updating the internal paper link within the "Model Introduction" section to point to the official Hugging Face paper page (`https://huggingface.co/papers/2509.01322`).
- Adding a note in the "Quick Start" section to inform users about the necessity of `trust_remote_code=True` when loading the model with `transformers` (a loading sketch is included below for reference).
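For reference, a minimal loading sketch along those lines is shown below. It is an illustration only: the Hub repository id is inferred from the project links, and the dtype/device settings are assumptions rather than something this PR specifies.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repo id, inferred from the project links; verify before use.
model_id = "meituan-longcat/LongCat-Flash-Chat"

# trust_remote_code=True is required because config.json maps the
# architecture to custom modeling code via its auto_map entries.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # assumed precision; adjust to your hardware
    device_map="auto",           # requires the accelerate package
)
```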

Files changed (1):
  1. README.md (+9 -5)
README.md CHANGED
```diff
@@ -1,13 +1,17 @@
 ---
+library_name: transformers
 license: mit
-library_name: LongCat-Flash-Chat
 pipeline_tag: text-generation
 tags:
-- transformers
+- moe
+- mixture-of-experts
+- code-generation
 ---
 
 # LongCat-Flash-Chat
 
+[Paper](https://huggingface.co/papers/2509.01322) - [Code](https://github.com/meituan-longcat/LongCat-Flash-Chat) - [Project Page](https://longcat.ai)
+
 <div align="center">
   <img src="https://raw.githubusercontent.com/meituan-longcat/LongCat-Flash-Chat/main/figures/longcat_logo.svg"
        width="300"
@@ -62,7 +66,7 @@ Effectively and efficiently scaling model size remains a key challenge in strate
 #### 🌟 Multi-Stage Training Pipeline for Agentic Capability
 Through a meticulously designed pipeline, LongCat-Flash is endowed with advanced agentic behaviors. Initial efforts focus on constructing a more suitable base model for agentic post-training, where we design a two-stage pretraining data fusion strategy to concentrate reasoning-intensive domain data. During mid-training, we enhance reasoning and coding capabilities while extending the context length to 128k to meet agentic post-training requirements. Building on this advanced base model, we proceed with a multi-stage post-training. Recognizing the scarcity of high-quality, high-difficulty training problems for agentic tasks, we design a multi-agent synthesis framework that defines task difficulty across three axes, i.e., information processing, tool-set complexity, and user interaction—using specialized controllers to generate complex tasks requiring iterative reasoning and environmental interaction.
 
-For more detail, please refer to the comprehensive [***LongCat-Flash Technical Report***](https://github.com/meituan-longcat/LongCat-Flash-Chat/blob/main/tech_report.pdf).
+For more detail, please refer to the comprehensive [***LongCat-Flash Technical Report***](https://huggingface.co/papers/2509.01322).
 
 ## Evaluation Results
 | **Benchmark** | **DeepSeek V3.1** | **Qwen3 MoE-2507** | **Kimi-K2** | **GPT-4.1** | **Claude4 Sonnet** | **Gemini2.5 Flash** | **LongCat-Flash** |
@@ -113,6 +117,7 @@ Note:
 * DeepSeek-V3.1, Qwen3-235B-A22B, Gemini2.5-Flash, and Claude4-Sonnet are evaluated under their non-thinking mode.
 
 ## Quick Start
+To use this model with the Hugging Face `transformers` library, you need to ensure `trust_remote_code=True` is set due to custom architectural components.
 
 ### Chat Template
 The details of our chat template are provided in the `tokenizer_config.json` file. Below are some examples.
@@ -220,5 +225,4 @@ We kindly encourage citation of our work if you find it useful.
 
 
 ## Contact
-Please contact us at <a href="mailto:longcat-team@meituan.com">longcat-team@meituan.com</a> or open an issue if you have any questions.
-
+Please contact us at <a href="mailto:longcat-team@meituan.com">longcat-team@meituan.com</a> or open an issue if you have any questions.
```
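As a companion to the chat-template line in the diff above ("The details of our chat template are provided in the `tokenizer_config.json` file"), here is a hedged usage sketch. The repository id and message content are illustrative assumptions, and the exact prompt string depends on the template shipped in `tokenizer_config.json`:

```python
from transformers import AutoTokenizer

# Assumed Hub repo id; see the links added at the top of the model card.
tokenizer = AutoTokenizer.from_pretrained(
    "meituan-longcat/LongCat-Flash-Chat", trust_remote_code=True
)

# Render a conversation through the template from tokenizer_config.json.
messages = [{"role": "user", "content": "Write a haiku about autumn."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # inspect the exact prompt string the template produces
```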