nielsr HF Staff committed on
Commit b32c19e · verified · 1 Parent(s): 5082fe3

Update model card: Add library_name, paper/code links, transformers usage, and deployment info


This PR significantly enhances the model card for the Ring-1T model by:

* **Adding `library_name: transformers` to the metadata**: This enables the automated "how to use" widget on the Hugging Face Hub, providing users with automated code snippets for easy integration with the `transformers` library.
* **Aligning the main title of the model card** with the official paper title: "Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model".
* **Including a direct link to the Hugging Face paper page**: [Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model](https://huggingface.co/papers/2510.18855) in the introductory section.
* **Adding a prominent link to the GitHub repository**: [https://github.com/inclusionAI/Ring-V2](https://github.com/inclusionAI/Ring-V2) for quick access to the code.
* **Integrating a `transformers` code snippet** for quick model usage, as found in the original GitHub README, under the Quickstart section.
* **Updating the SGLang and vLLM deployment sections** with more comprehensive environment preparation and usage instructions from the GitHub repository.
* **Adding the BibTeX citation** for the paper.

These updates collectively improve the discoverability, usability, and completeness of the model card on the Hugging Face Hub.

Files changed (1): README.md (+121 −33)
README.md CHANGED

@@ -1,8 +1,12 @@
---
license: mit
pipeline_tag: text-generation
---

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*4QxcQrBlTiAAAAAAQXAAAAgAemJ7AQ/original" width="100"/>

@@ -10,8 +14,6 @@ pipeline_tag: text-generation

<p align="center">🤗 <a href="https://huggingface.co/inclusionAI">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/inclusionAI">ModelScope </a>&nbsp;&nbsp; | &nbsp;&nbsp;🐙 <a href="https://zenmux.ai/inclusionai/ring-1t?utm_source=hf_inclusionAI">Experience Now</a></p>

- # Ring-1T, flow state leads to sudden enlightenment
-
Today, we officially launch the trillion-parameter thinking model, Ring-1T. It is open-source upon release—developers can download the model weights from Hugging Face and ModelScope, or experience direct chat interactions and API calls via the Ling Chat page and ZenMux (links provided at the end of the article).

Building upon the preview version released at the end of last month, Ring-1T has undergone continued scaling with large-scale verifiable reward reinforcement learning (RLVR) training, further unlocking the natural language reasoning capabilities of the trillion-parameter foundation model. Through RLHF training, the model's general abilities have also been refined, making this release of Ring-1T more balanced in performance across various tasks.

@@ -35,7 +37,7 @@ Note: If you are interested in previous version, please visit the past model col

## Continuously Evolving Deep Reasoning Capabilities

- To evaluate the deep reasoning capabilities of Ring-1T, we selected representative open-source thinking models (Ring-1T-preview, Deepseek-V3.1-Terminus-Thinking, Qwen-235B-A22B-Thinking-2507) and closed-source APIs (Gemini-2.5-Pro and GPT-5-Thinking(High)) as benchmarks. First, compared to the previously open-sourced preview version, Ring-1T demonstrates more balanced performance across various tasks. Furthermore, Ring-1T achieves open-source leading performance on challenging reasoning benchmarks such as **math competitions** (AIME 25, HMMT 25), **code generation** (LiveCodeBench, CodeForce), and **logical reasoning** (ARC-AGI-1). It also exhibits strong competitiveness in **comprehensive tasks** (Arena-Hard-v2.0), **healthcare** (HealthBench), and **creative writing** (Creative Writing v3).

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/5TBESJNjsbAAAAAAYYAAAAgADod9AQFr/original" />

@@ -96,6 +98,46 @@ For the RL training framework, we built a hybrid reward system based on large-sc

You can experience Ring-1T online at: [ZenMux](https://zenmux.ai/inclusionai/ring-1t?utm_source=hf_inclusionAI)

### 🔌 API Usage

You can also use Ring-1T through API calls:

@@ -133,38 +175,39 @@ print(completion.choices[0].message.content)

#### Environment Preparation

- We will later submit our model to the SGLang official release. Now we can prepare the environment by following these steps:
```shell
- pip3 install -U sglang sgl-kernel
```

#### Run Inference

- Both BF16 and FP8 models are supported by SGLang now. It depends on the dtype of the model in ${MODEL_PATH}.
-
- Here is the example to run Ring-1T with multiple GPU nodes, where the master node IP is ${MASTER_IP} and server port is ${PORT}:

- Start server:
- ```bash
- # Node 0:
- python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 0
-
- # Node 1:
- python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 1
-
- # Node 2:
- python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 2
-
- # Node 3:
- python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 3
-
- # This is only an example. Please adjust arguments according to your actual environment.
```

- Client:

```shell
- curl -s http://${MASTER_IP}:${PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```

@@ -173,26 +216,54 @@ More usage can be found [here](https://docs.sglang.ai/basic_usage/send_request.h

### vLLM

- For latest guidance, please refer to the vLLM [`instructions`](https://docs.vllm.ai/projects/recipes/en/latest/inclusionAI/Ring-1T-FP8.html).

#### Environment Preparation

```bash
- pip install vllm==0.11.0
```

- #### Run Inference:

- Here is the example to deploy the model with multiple GPU nodes, where the master node IP is ${MASTER_IP}, server port is ${PORT} and the path of model is ${MODEL_PATH}:

- ```bash
- # step 1. start ray on all nodes

- # step 2. start vllm server only on node 0:
- vllm serve $MODEL_PATH --port $PORT --served-model-name my_model --trust-remote-code --tensor-parallel-size 8 --pipeline-parallel-size 4 --gpu-memory-utilization 0.85

- # This is only an example, please adjust arguments according to your actual environment.
```

To handle long context in vLLM using YaRN, we need to follow these two steps:

@@ -209,6 +280,8 @@ To handle long context in vLLM using YaRN, we need to follow these two steps:

2. Use an additional parameter `--max-model-len` to specify the desired maximum context length when starting the vLLM service.

## Finetuning

@@ -234,4 +307,19 @@ Ring-1T@Aworld IMO test trajectory: [https://github.com/inclusionAI/AWorld/tree/

## License

- This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ring-V2/blob/master/LICENSE).
 
---
license: mit
pipeline_tag: text-generation
+ library_name: transformers
---

+ # Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model
+
+ This repository presents **Ring-1T**, an open-source, state-of-the-art thinking model at the trillion-parameter scale, as detailed in the paper [Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model](https://huggingface.co/papers/2510.18855). For the full codebase, please refer to the [GitHub repository](https://github.com/inclusionAI/Ring-V2).

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*4QxcQrBlTiAAAAAAQXAAAAgAemJ7AQ/original" width="100"/>

<p align="center">🤗 <a href="https://huggingface.co/inclusionAI">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/inclusionAI">ModelScope </a>&nbsp;&nbsp; | &nbsp;&nbsp;🐙 <a href="https://zenmux.ai/inclusionai/ring-1t?utm_source=hf_inclusionAI">Experience Now</a></p>

Today, we officially launch the trillion-parameter thinking model, Ring-1T. It is open-source upon release—developers can download the model weights from Hugging Face and ModelScope, or experience direct chat interactions and API calls via the Ling Chat page and ZenMux (links provided at the end of the article).

Building upon the preview version released at the end of last month, Ring-1T has undergone continued scaling with large-scale verifiable reward reinforcement learning (RLVR) training, further unlocking the natural language reasoning capabilities of the trillion-parameter foundation model. Through RLHF training, the model's general abilities have also been refined, making this release of Ring-1T more balanced in performance across various tasks.

## Continuously Evolving Deep Reasoning Capabilities

+ To evaluate the deep reasoning capabilities of Ring-1T, we selected representative open-source thinking models (Ring-1T-preview, Deepseek-V3.1-Terminus-Thinking, Qwen-235B-A22B-Thinking-2507) and closed-source APIs (Gemini-2.5-Pro and GPT-5-Thinking(High)) as benchmarks. First, compared to the previously open-sourced preview version, Ring-1T demonstrates more balanced performance across various tasks. Furthermore, Ring-1T achieves open-source leading performance on challenging reasoning benchmarks such as **math competitions** (AIME 25, HMMT 25), **code generation** (LiveCodeBench, CodeForce), and **logical reasoning** (ARC-AGI-v1). It also exhibits strong competitiveness in **comprehensive tasks** (Arena-Hard-v2.0), **healthcare** (HealthBench), and **creative writing** (Creative Writing v3).

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/5TBESJNjsbAAAAAAYYAAAAgADod9AQFr/original" />

You can experience Ring-1T online at: [ZenMux](https://zenmux.ai/inclusionai/ring-1t?utm_source=hf_inclusionAI)

+ ### 🤗 Hugging Face Transformers
+
+ Here is a code snippet to show you how to use the chat model with `transformers`:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "inclusionAI/Ring-flash-2.0" # Note: This example uses Ring-flash-2.0, replace with inclusionAI/Ring-1T if desired.
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     dtype="auto",
+     device_map="auto",
+     trust_remote_code=True,
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ prompt = "Give me a short introduction to large language models."
+ messages = [
+     {"role": "system", "content": "You are Ring, an assistant created by inclusionAI"},
+     {"role": "user", "content": prompt}
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt", return_token_type_ids=False).to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=8192
+ )
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ ```
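The snippet decodes the completion into `response` without displaying it; a minimal follow-up, reusing the variables from the snippet:

```python
# Print the decoded completion (a thinking model's output may include its reasoning trace).
print(response)
```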
+

### 🔌 API Usage

You can also use Ring-1T through API calls:
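A minimal sketch with the OpenAI-compatible SDK, following the `completion.choices[0].message.content` access pattern used in this README; the base URL, API key, and model identifier are placeholders to substitute with your provider's values (e.g. ZenMux):

```python
from openai import OpenAI

# Placeholder endpoint and key: substitute the values from your provider (e.g. ZenMux).
client = OpenAI(base_url="https://example-provider/v1", api_key="YOUR_API_KEY")

# "inclusionai/ring-1t" is an assumed model identifier; check your provider's model list.
completion = client.chat.completions.create(
    model="inclusionai/ring-1t",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(completion.choices[0].message.content)
```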
 

#### Environment Preparation

+ We will later submit our model to the SGLang official release. For now, you can prepare the environment by following these steps:
+ ```shell
+ pip3 install sglang==0.5.2rc0 sgl-kernel==0.3.7.post1
+ ```
+ You can use the docker image as well:
```shell
+ docker pull lmsysorg/sglang:v0.5.2rc0-cu126
+ ```
+ Then apply the patch to your sglang installation:
+ ```shell
+ # The `patch` command is required; run `yum install -y patch` if needed.
+ patch -d `python -c 'import sglang;import os; print(os.path.dirname(sglang.__file__))'` -p3 < inference/sglang/bailing_moe_v2.patch
```

#### Run Inference

+ Both BF16 and FP8 models are now supported by SGLang; which one is used depends on the dtype of the model in ${MODEL_PATH}. Both share the same command:

- Start server:
+ ```shell
+ python -m sglang.launch_server \
+     --model-path $MODEL_PATH \
+     --host 0.0.0.0 --port $PORT \
+     --trust-remote-code \
+     --attention-backend fa3
```
+ MTP is supported for the base model, but not yet for the chat model. You can enable it by adding the parameter `--speculative-algorithm NEXTN` to the start command, as shown below.
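A sketch of the launch command with MTP enabled, reusing the same variables as the server command above:

```shell
# Same launch as above, plus speculative decoding via MTP (base model only).
python -m sglang.launch_server \
    --model-path $MODEL_PATH \
    --host 0.0.0.0 --port $PORT \
    --trust-remote-code \
    --attention-backend fa3 \
    --speculative-algorithm NEXTN
```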

- Client:

```shell
+ curl -s http://localhost:${PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```
 

### vLLM

+ vLLM supports offline batched inference and launching an OpenAI-compatible API service for online inference.

#### Environment Preparation

+ Since the pull request (PR) has not yet been submitted to the vLLM community, please prepare the environment by following the steps below:
+
```bash
+ git clone -b v0.10.0 https://github.com/vllm-project/vllm.git
+ cd vllm
+ wget https://raw.githubusercontent.com/inclusionAI/Ring-V2/refs/heads/main/inference/vllm/bailing_moe_v2.patch
+ git apply bailing_moe_v2.patch
+ pip install -e .
```
 

+ #### Offline Inference

+ ```python
+ from transformers import AutoTokenizer
+ from vllm import LLM, SamplingParams
+
+ tokenizer = AutoTokenizer.from_pretrained("inclusionAI/Ring-1T") # Changed from Ring-flash-2.0 for consistency
+
+ sampling_params = SamplingParams(temperature=0.7, top_p=0.8, repetition_penalty=1.05, max_tokens=16384)
+
+ llm = LLM(model="inclusionAI/Ring-1T", dtype='bfloat16') # Changed from Ring-flash-2.0 for consistency
+ prompt = "Give me a short introduction to large language models."
+ messages = [
+     {"role": "system", "content": "You are Ring, an assistant created by inclusionAI"},
+     {"role": "user", "content": prompt}
+ ]
+
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ outputs = llm.generate([text], sampling_params)
+ ```
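`llm.generate` returns a list of vLLM `RequestOutput` objects; a minimal follow-up to print the generated text:

```python
# Each RequestOutput holds one or more CompletionOutputs; print the first completion.
for output in outputs:
    print(output.outputs[0].text)
```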
+
+ #### Online Inference
+
+ ```bash
+ vllm serve inclusionAI/Ring-1T \
+     --tensor-parallel-size 2 \
+     --pipeline-parallel-size 1 \
+     --use-v2-block-manager \
+     --gpu-memory-utilization 0.90
```
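Once the server is up, it exposes an OpenAI-compatible endpoint; a quick check with curl (vLLM defaults to port 8000 unless you pass `--port`):

```shell
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "inclusionAI/Ring-1T", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```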

To handle long context in vLLM using YaRN, we need to follow these two steps:
2. Use an additional parameter `--max-model-len` to specify the desired maximum context length when starting the vLLM service, as in the sketch below.
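A sketch of the serve command with an extended context window; the value 131072 is illustrative, so pick the length your YaRN configuration targets:

```bash
# Same serve invocation as above, with an explicit maximum context length.
vllm serve inclusionAI/Ring-1T \
    --tensor-parallel-size 2 \
    --pipeline-parallel-size 1 \
    --gpu-memory-utilization 0.90 \
    --max-model-len 131072
```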

+ For detailed guidance, please refer to the vLLM [documentation](https://docs.vllm.ai/en/latest/).
+

## Finetuning

## License

+ This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ring-V2/blob/master/LICENSE).
+
+ ## Citation
+
+ If you find our work helpful, feel free to cite us.
+
+ ```
+ @misc{lingteam2025ring1t,
+     title={Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model},
+     author={Ling Team and Anqi Shen and Baihui Li and Bin Hu and Bin Jing and Cai Chen and Chao Huang and Chao Zhang and Chaokun Yang and Cheng Lin and Chengyao Wen and Congqi Li and Deng Zhao and Dingbo Yuan and Donghai You and Fagui Mao and Fanzhuang Meng and Feng Xu and Guojie Li and Guowei Wang and Hao Dai and Haonan Zheng and Hong Liu and Jia Guo and Jiaming Liu and Jian Liu and Jianhao Fu and Jiannan Shi and Jianwen Wang and Jianxin Lai and Jin Yang and Jun Mei and Jun Zhou and Junbo Zhao and Junping Zhao and Kuan Xu and Le Su and Lei Chen and Li Tang and Liang Jiang and Liangcheng Fu and Lianhao Xu and Linfeng Shi and Lisha Liao and Longfei Zheng and Meng Li and Mingchun Chen and Qi Zuo and Qiang Cheng and Qianggang Cao and Qitao Shi and Quanrui Guo and Senlin Zhu and Shaofei Wang and Shaomian Zheng and Shuaicheng Li and Shuwei Gu and Siba Chen and Tao Wu and Tao Zhang and Tianyu Zhang and Tianyu Zhou and Tiwei Bie and Tongkai Yang and Wang Hong and Wang Ren and Weihua Chen and Wenbo Yu and Wengang Zheng and Xiangchun Wang and Xiaodong Yan and Xiaopei Wan and Xin Zhao and Xinyu Kong and Xinyu Tang and Xudong Han and Xudong Wang and Xuemin Yang and Xueyu Hu and Yalin Zhang and Yan Sun and Yicheng Shan and Yilong Wang and Yingying Xu and Yongkang Liu and Yongzhen Guo and Yuanyuan Wang and Yuchen Yan and Yuefan Wang and Yuhong Guo and Zehuan Li and Zhankai Xu and Zhe Li and Zhenduo Zhang and Zhengke Gui and Zhenxuan Pan and Zhenyu Huang and Zhenzhong Lan and Zhiqiang Ding and Zhiqiang Zhang and Zhixun Li and Zhizhen Liu and Zihao Wang and Zujie Wen},
+     year={2025},
+     eprint={2510.18855},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```