nielsr HF Staff committed on
Commit b32c19e · verified · 1 Parent(s): 5082fe3

Update model card: Add library_name, paper/code links, transformers usage, and deployment info


This PR significantly enhances the model card for the Ring-1T model by:

* **Adding `library_name: transformers` to the metadata**: This enables the automated "how to use" widget on the Hugging Face Hub, providing users with automated code snippets for easy integration with the `transformers` library.
* **Aligning the main title of the model card** with the official paper title: "Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model".
* **Including a direct link to the Hugging Face paper page**: [Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model](https://huggingface.co/papers/2510.18855) in the introductory section.
* **Adding a prominent link to the GitHub repository**: [https://github.com/inclusionAI/Ring-V2](https://github.com/inclusionAI/Ring-V2) for quick access to the code.
* **Integrating a `transformers` code snippet** for quick model usage, as found in the original GitHub README, under the Quickstart section.
* **Updating the SGLang and vLLM deployment sections** with more comprehensive environment preparation and usage instructions from the GitHub repository.
* **Adding the BibTeX citation** for the paper.

These updates collectively improve the discoverability, usability, and completeness of the model card on the Hugging Face Hub.

Files changed (1): README.md (+121 −33)
README.md CHANGED

@@ -1,8 +1,12 @@
---
license: mit
pipeline_tag: text-generation
---

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*4QxcQrBlTiAAAAAAQXAAAAgAemJ7AQ/original" width="100"/>

@@ -10,8 +14,6 @@ pipeline_tag: text-generation

<p align="center">🤗 <a href="https://huggingface.co/inclusionAI">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/inclusionAI">ModelScope </a>&nbsp;&nbsp; | &nbsp;&nbsp;🐙 <a href="https://zenmux.ai/inclusionai/ring-1t?utm_source=hf_inclusionAI">Experience Now</a></p>

- # Ring-1T, flow state leads to sudden enlightenment
-
Today, we officially launch the trillion-parameter thinking model, Ring-1T. It is open-source upon release—developers can download the model weights from Hugging Face and ModelScope, or experience direct chat interactions and API calls via the Ling Chat page and ZenMux (links provided at the end of the article).

Building upon the preview version released at the end of last month, Ring-1T has undergone continued scaling with large-scale verifiable reward reinforcement learning (RLVR) training, further unlocking the natural language reasoning capabilities of the trillion-parameter foundation model. Through RLHF training, the model's general abilities have also been refined, making this release of Ring-1T more balanced in performance across various tasks.

@@ -35,7 +37,7 @@ Note: If you are interested in previous version, please visit the past model col

## Continuously Evolving Deep Reasoning Capabilities

- To evaluate the deep reasoning capabilities of Ring-1T, we selected representative open-source thinking models (Ring-1T-preview, Deepseek-V3.1-Terminus-Thinking, Qwen-235B-A22B-Thinking-2507) and closed-source APIs (Gemini-2.5-Pro and GPT-5-Thinking(High)) as benchmarks. First, compared to the previously open-sourced preview version, Ring-1T demonstrates more balanced performance across various tasks. Furthermore, Ring-1T achieves open-source leading performance on challenging reasoning benchmarks such as **math competitions** (AIME 25, HMMT 25), **code generation** (LiveCodeBench, CodeForce), and **logical reasoning** (ARC-AGI-1). It also exhibits strong competitiveness in **comprehensive tasks** (Arena-Hard-v2.0), **healthcare** (HealthBench), and **creative writing** (Creative Writing v3).

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/5TBESJNjsbAAAAAAYYAAAAgADod9AQFr/original" />

@@ -96,6 +98,46 @@ For the RL training framework, we built a hybrid reward system based on large-sc

You can experience Ring-1T online at: [ZenMux](https://zenmux.ai/inclusionai/ring-1t?utm_source=hf_inclusionAI)

### 🔌 API Usage

You can also use Ring-1T through API calls:

@@ -133,38 +175,39 @@ print(completion.choices[0].message.content)

#### Environment Preparation

- We will later submit our model to the SGLang official release. Now we can prepare the environment by following these steps:
```shell
- pip3 install -U sglang sgl-kernel
```

#### Run Inference

- Both BF16 and FP8 models are supported by SGLang now. It depends on the dtype of the model in ${MODEL_PATH}.
-
- Here is the example to run Ring-1T with multiple GPU nodes, where the master node IP is ${MASTER_IP} and server port is ${PORT}:

- Start server:
- ```bash
- # Node 0:
- python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 0
-
- # Node 1:
- python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 1
-
- # Node 2:
- python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 2
-
- # Node 3:
- python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 3
-
- # This is only an example. Please adjust arguments according to your actual environment.
```

- Client:

```shell
- curl -s http://${MASTER_IP}:${PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```

@@ -173,26 +216,54 @@ More usage can be found [here](https://docs.sglang.ai/basic_usage/send_request.h

### vLLM

- For latest guidance, please refer to the vLLM [`instructions`](https://docs.vllm.ai/projects/recipes/en/latest/inclusionAI/Ring-1T-FP8.html).

#### Environment Preparation

```bash
- pip install vllm==0.11.0
```

- #### Run Inference:

- Here is the example to deploy the model with multiple GPU nodes, where the master node IP is ${MASTER_IP}, server port is ${PORT} and the path of model is ${MODEL_PATH}:

- ```bash
- # step 1. start ray on all nodes

- # step 2. start vllm server only on node 0:
- vllm serve $MODEL_PATH --port $PORT --served-model-name my_model --trust-remote-code --tensor-parallel-size 8 --pipeline-parallel-size 4 --gpu-memory-utilization 0.85

- # This is only an example, please adjust arguments according to your actual environment.
```

To handle long context in vLLM using YaRN, we need to follow these two steps:

@@ -209,6 +280,8 @@ To handle long context in vLLM using YaRN, we need to follow these two steps:

2. Use an additional parameter `--max-model-len` to specify the desired maximum context length when starting the vLLM service.

## Finetuning

@@ -234,4 +307,19 @@ Ring-1T@Aworld IMO test trajectory: [https://github.com/inclusionAI/AWorld/tree/

## License

- This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ring-V2/blob/master/LICENSE).
 
---
license: mit
pipeline_tag: text-generation
+ library_name: transformers
---

+ # Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model
+
+ This repository presents **Ring-1T**, an open-source, state-of-the-art thinking model at the trillion-parameter scale, as detailed in the paper [Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model](https://huggingface.co/papers/2510.18855). For the full codebase, please refer to the [GitHub repository](https://github.com/inclusionAI/Ring-V2).

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*4QxcQrBlTiAAAAAAQXAAAAgAemJ7AQ/original" width="100"/>

<p align="center">🤗 <a href="https://huggingface.co/inclusionAI">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/inclusionAI">ModelScope </a>&nbsp;&nbsp; | &nbsp;&nbsp;🐙 <a href="https://zenmux.ai/inclusionai/ring-1t?utm_source=hf_inclusionAI">Experience Now</a></p>

Today, we officially launch the trillion-parameter thinking model, Ring-1T. It is open-source upon release—developers can download the model weights from Hugging Face and ModelScope, or experience direct chat interactions and API calls via the Ling Chat page and ZenMux (links provided at the end of the article).

Building upon the preview version released at the end of last month, Ring-1T has undergone continued scaling with large-scale verifiable reward reinforcement learning (RLVR) training, further unlocking the natural language reasoning capabilities of the trillion-parameter foundation model. Through RLHF training, the model's general abilities have also been refined, making this release of Ring-1T more balanced in performance across various tasks.

## Continuously Evolving Deep Reasoning Capabilities

+ To evaluate the deep reasoning capabilities of Ring-1T, we selected representative open-source thinking models (Ring-1T-preview, Deepseek-V3.1-Terminus-Thinking, Qwen-235B-A22B-Thinking-2507) and closed-source APIs (Gemini-2.5-Pro and GPT-5-Thinking(High)) as benchmarks. First, compared to the previously open-sourced preview version, Ring-1T demonstrates more balanced performance across various tasks. Furthermore, Ring-1T achieves open-source leading performance on challenging reasoning benchmarks such as **math competitions** (AIME 25, HMMT 25), **code generation** (LiveCodeBench, CodeForce), and **logical reasoning** (ARC-AGI-v1). It also exhibits strong competitiveness in **comprehensive tasks** (Arena-Hard-v2.0), **healthcare** (HealthBench), and **creative writing** (Creative Writing v3).

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/5TBESJNjsbAAAAAAYYAAAAgADod9AQFr/original" />

You can experience Ring-1T online at: [ZenMux](https://zenmux.ai/inclusionai/ring-1t?utm_source=hf_inclusionAI)

+ ### 🤗 Hugging Face Transformers
+
+ Here is a code snippet to show you how to use the chat model with `transformers`:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "inclusionAI/Ring-flash-2.0" # Note: This example uses Ring-flash-2.0, replace with inclusionAI/Ring-1T if desired.
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     dtype="auto",
+     device_map="auto",
+     trust_remote_code=True,
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ prompt = "Give me a short introduction to large language models."
+ messages = [
+     {"role": "system", "content": "You are Ring, an assistant created by inclusionAI"},
+     {"role": "user", "content": prompt}
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt", return_token_type_ids=False).to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=8192
+ )
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ ```
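The snippet decodes the completion into `response` without displaying it; a minimal follow-up, reusing the variables from the snippet:

```python
# Print the decoded completion (a thinking model's output may include its reasoning trace).
print(response)
```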
+

### 🔌 API Usage

You can also use Ring-1T through API calls:
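A minimal sketch with the OpenAI-compatible SDK, following the `completion.choices[0].message.content` access pattern used in this README; the base URL, API key, and model identifier are placeholders to substitute with your provider's values (e.g. ZenMux):

```python
from openai import OpenAI

# Placeholder endpoint and key: substitute the values from your provider (e.g. ZenMux).
client = OpenAI(base_url="https://example-provider/v1", api_key="YOUR_API_KEY")

# "inclusionai/ring-1t" is an assumed model identifier; check your provider's model list.
completion = client.chat.completions.create(
    model="inclusionai/ring-1t",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(completion.choices[0].message.content)
```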
 

#### Environment Preparation

+ We will later submit our model to the SGLang official release. For now, you can prepare the environment by following these steps:
+ ```shell
+ pip3 install sglang==0.5.2rc0 sgl-kernel==0.3.7.post1
+ ```
+ You can use the docker image as well:
```shell
+ docker pull lmsysorg/sglang:v0.5.2rc0-cu126
+ ```
+ Then apply the patch to your sglang installation:
+ ```shell
+ # The `patch` command is required; run `yum install -y patch` if needed.
+ patch -d `python -c 'import sglang;import os; print(os.path.dirname(sglang.__file__))'` -p3 < inference/sglang/bailing_moe_v2.patch
```

#### Run Inference

+ Both BF16 and FP8 models are now supported by SGLang; which one is used depends on the dtype of the model in ${MODEL_PATH}. Both share the same command:

- Start server:
+ ```shell
+ python -m sglang.launch_server \
+     --model-path $MODEL_PATH \
+     --host 0.0.0.0 --port $PORT \
+     --trust-remote-code \
+     --attention-backend fa3
```
+ MTP is supported for the base model, but not yet for the chat model. You can enable it by adding the parameter `--speculative-algorithm NEXTN` to the start command, as shown below.
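A sketch of the launch command with MTP enabled, reusing the same variables as the server command above:

```shell
# Same launch as above, plus speculative decoding via MTP (base model only).
python -m sglang.launch_server \
    --model-path $MODEL_PATH \
    --host 0.0.0.0 --port $PORT \
    --trust-remote-code \
    --attention-backend fa3 \
    --speculative-algorithm NEXTN
```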

- Client:

```shell
+ curl -s http://localhost:${PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```
 

### vLLM

+ vLLM supports offline batched inference and launching an OpenAI-compatible API service for online inference.

#### Environment Preparation

+ Since the pull request (PR) has not yet been submitted to the vLLM community, please prepare the environment by following the steps below:
+
```bash
+ git clone -b v0.10.0 https://github.com/vllm-project/vllm.git
+ cd vllm
+ wget https://raw.githubusercontent.com/inclusionAI/Ring-V2/refs/heads/main/inference/vllm/bailing_moe_v2.patch
+ git apply bailing_moe_v2.patch
+ pip install -e .
```
 

+ #### Offline Inference

+ ```python
+ from transformers import AutoTokenizer
+ from vllm import LLM, SamplingParams
+
+ tokenizer = AutoTokenizer.from_pretrained("inclusionAI/Ring-1T") # Changed from Ring-flash-2.0 for consistency
+
+ sampling_params = SamplingParams(temperature=0.7, top_p=0.8, repetition_penalty=1.05, max_tokens=16384)
+
+ llm = LLM(model="inclusionAI/Ring-1T", dtype='bfloat16') # Changed from Ring-flash-2.0 for consistency
+ prompt = "Give me a short introduction to large language models."
+ messages = [
+     {"role": "system", "content": "You are Ring, an assistant created by inclusionAI"},
+     {"role": "user", "content": prompt}
+ ]
+
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ outputs = llm.generate([text], sampling_params)
+ ```
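`llm.generate` returns a list of vLLM `RequestOutput` objects; a minimal follow-up to print the generated text:

```python
# Each RequestOutput holds one or more CompletionOutputs; print the first completion.
for output in outputs:
    print(output.outputs[0].text)
```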
+
+ #### Online Inference
+
+ ```bash
+ vllm serve inclusionAI/Ring-1T \
+     --tensor-parallel-size 2 \
+     --pipeline-parallel-size 1 \
+     --use-v2-block-manager \
+     --gpu-memory-utilization 0.90
```
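Once the server is up, it exposes an OpenAI-compatible endpoint; a quick check with curl (vLLM defaults to port 8000 unless you pass `--port`):

```shell
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "inclusionAI/Ring-1T", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```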

To handle long context in vLLM using YaRN, we need to follow these two steps:
2. Use an additional parameter `--max-model-len` to specify the desired maximum context length when starting the vLLM service, as in the sketch below.
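A sketch of the serve command with an extended context window; the value 131072 is illustrative, so pick the length your YaRN configuration targets:

```bash
# Same serve invocation as above, with an explicit maximum context length.
vllm serve inclusionAI/Ring-1T \
    --tensor-parallel-size 2 \
    --pipeline-parallel-size 1 \
    --gpu-memory-utilization 0.90 \
    --max-model-len 131072
```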

+ For detailed guidance, please refer to the vLLM [documentation](https://docs.vllm.ai/en/latest/).
+

## Finetuning

## License

+ This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ring-V2/blob/master/LICENSE).
+
+ ## Citation
+
+ If you find our work helpful, feel free to cite us.
+
+ ```
+ @misc{lingteam2025ring1t,
+     title={Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model},
+     author={Ling Team and Anqi Shen and Baihui Li and Bin Hu and Bin Jing and Cai Chen and Chao Huang and Chao Zhang and Chaokun Yang and Cheng Lin and Chengyao Wen and Congqi Li and Deng Zhao and Dingbo Yuan and Donghai You and Fagui Mao and Fanzhuang Meng and Feng Xu and Guojie Li and Guowei Wang and Hao Dai and Haonan Zheng and Hong Liu and Jia Guo and Jiaming Liu and Jian Liu and Jianhao Fu and Jiannan Shi and Jianwen Wang and Jianxin Lai and Jin Yang and Jun Mei and Jun Zhou and Junbo Zhao and Junping Zhao and Kuan Xu and Le Su and Lei Chen and Li Tang and Liang Jiang and Liangcheng Fu and Lianhao Xu and Linfeng Shi and Lisha Liao and Longfei Zheng and Meng Li and Mingchun Chen and Qi Zuo and Qiang Cheng and Qianggang Cao and Qitao Shi and Quanrui Guo and Senlin Zhu and Shaofei Wang and Shaomian Zheng and Shuaicheng Li and Shuwei Gu and Siba Chen and Tao Wu and Tao Zhang and Tianyu Zhang and Tianyu Zhou and Tiwei Bie and Tongkai Yang and Wang Hong and Wang Ren and Weihua Chen and Wenbo Yu and Wengang Zheng and Xiangchun Wang and Xiaodong Yan and Xiaopei Wan and Xin Zhao and Xinyu Kong and Xinyu Tang and Xudong Han and Xudong Wang and Xuemin Yang and Xueyu Hu and Yalin Zhang and Yan Sun and Yicheng Shan and Yilong Wang and Yingying Xu and Yongkang Liu and Yongzhen Guo and Yuanyuan Wang and Yuchen Yan and Yuefan Wang and Yuhong Guo and Zehuan Li and Zhankai Xu and Zhe Li and Zhenduo Zhang and Zhengke Gui and Zhenxuan Pan and Zhenyu Huang and Zhenzhong Lan and Zhiqiang Ding and Zhiqiang Zhang and Zhixun Li and Zhizhen Liu and Zihao Wang and Zujie Wen},
+     year={2025},
+     eprint={2510.18855},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```