zzqsmall committed on
Commit 1271dbf · verified · 1 Parent(s): da68182

Update README.md

Files changed (1): README.md (+4 −4)
README.md CHANGED
@@ -18,7 +18,7 @@ Today, we officially launch the trillion-parameter thinking model, Ring-1T. It i
 
 Building upon the preview version released at the end of last month, Ring-1T has undergone continued scaling with large-scale verifiable reward reinforcement learning (RLVR) training, further unlocking the natural language reasoning capabilities of the trillion-parameter foundation model. Through RLHF training, the model's general abilities have also been refined, making this release of Ring-1T more balanced in performance across various tasks.
 
-Ring-1T adopts the Ling 2.0 architecture and is trained on the Ling-1T-base foundation model, which contains 1 trillion total parameters with 50 billion activated parameters, supporting a context window of up to 128K tokens. Leveraging our self-developed icepop reinforcement learning stabilization method and the efficient reinforcement learning system ASystem (whose AReal framework is already open-source), we have achieved smooth scaling of MoE architecture reinforcement learning—from tens of billions (Ring-mini-2.0) to hundreds of billions (Ring-flash-2.0) to trillions (Ring-1T) of parameters—significantly enhancing the model's deep reasoning and natural language inference capabilities.
+Ring-1T adopts the Ling 2.0 architecture and is trained on the Ling-1T-base foundation model, which contains 1 trillion total parameters with 50 billion activated parameters, supporting a context window of up to 128K tokens. Leveraging our self-developed icepop reinforcement learning stabilization method and the efficient reinforcement learning system ASystem (whose AReaL framework is already open-source), we have achieved smooth scaling of MoE architecture reinforcement learning—from tens of billions (Ring-mini-2.0) to hundreds of billions (Ring-flash-2.0) to trillions (Ring-1T) of parameters—significantly enhancing the model's deep reasoning and natural language inference capabilities.
 
 ## Model Downloads
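The hunk above quotes the README's headline shape figures: 1 trillion total parameters with 50 billion activated per token. A back-of-envelope sketch of the MoE sparsity those figures imply; the function name and numbers are illustrative, taken only from the text above, not from Ring-1T's actual configuration files.

```python
# Illustrative sketch of sparse-MoE activation, using the figures quoted
# in the README text above (1T total params, ~50B activated per token).
# Not Ring-1T's real config; just the arithmetic the claim implies.
def moe_activation_ratio(total_params: float, activated_params: float) -> float:
    """Fraction of parameters a sparse MoE model uses per forward pass."""
    return activated_params / total_params

ratio = moe_activation_ratio(1_000e9, 50e9)
print(f"{ratio:.0%} of parameters active per token")  # 5% of parameters active per token
```

This is the core economy of the MoE design the paragraph describes: a dense 1T model would run every parameter per token, while the routed model touches only about one-twentieth of them.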
 
@@ -37,7 +37,7 @@ Note: If you are interested in previous version, please visit the past model col
 
 ## Continuously Evolving Deep Reasoning Capabilities
 
-To evaluate the deep reasoning capabilities of Ring-1T, we selected representative open-source reasoning models (Ring-1T-preview, Deepseek-V3.1-Terminus-Thinking, Qwen-235B-A22B-Thinking-2507) and closed-source APIs (Gemini-2.5-pro and GPT-5-Thinking(High)) as benchmarks. First, compared to the previously open-sourced preview version, Ring-1T demonstrates more balanced performance across various tasks. Furthermore, Ring-1T achieves open-source leading performance on challenging reasoning benchmarks such as **math competitions** (AIME 25, HMMT 25), **code generation** (LiveCodeBench, CodeForce), and **logical reasoning** (ARC-AGI-1). It also exhibits strong competitiveness in **comprehensive tasks** (Arena-Hard-v2.0), **healthcare** (HealthBench), and **creative writing** (Creative Writing v3).
+To evaluate the deep reasoning capabilities of Ring-1T, we selected representative open-source thinking models (Ring-1T-preview, Deepseek-V3.1-Terminus-Thinking, Qwen-235B-A22B-Thinking-2507) and closed-source APIs (Gemini-2.5-Pro and GPT-5-Thinking(High)) as benchmarks. First, compared to the previously open-sourced preview version, Ring-1T demonstrates more balanced performance across various tasks. Furthermore, Ring-1T achieves open-source leading performance on challenging reasoning benchmarks such as **math competitions** (AIME 25, HMMT 25), **code generation** (LiveCodeBench, CodeForce), and **logical reasoning** (ARC-AGI-1). It also exhibits strong competitiveness in **comprehensive tasks** (Arena-Hard-v2.0), **healthcare** (HealthBench), and **creative writing** (Creative Writing v3).
 
 <p align="center">
 <img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/5TBESJNjsbAAAAAAYYAAAAgADod9AQFr/original" />
@@ -89,7 +89,7 @@ Figure 2: Maximum training-inference discrepancy—GRPO shows a significant rise
 
 To ensure stable and efficient reinforcement learning training for trillion-parameter foundation models, we independently developed a high-performance reinforcement learning system—ASystem. ASystem adopts a SingleController + SPMD architecture. In terms of training and inference engines, it has been meticulously optimized to address memory management and weight exchange challenges specific to trillion-parameter models. Leveraging our self-developed unified memory pool technology for training and inference, it achieves transparent memory offloading, efficiently releases memory fragmentation, and reduces the risk of insufficient memory. Through techniques such as direct P2P communication between GPUs and in-place updates, it enables second-level, zero-redundant model weight exchange.
 
-For the RL training framework, we built a hybrid reward system based on large-scale Serverless Sandbox technology. This system can start up in milliseconds, supports execution environments for over 10 programming languages, and handles request throughput of up to 10K/s. We have open-sourced AReal and hope to accelerate RL training and research in the open-source community through technological openness.
+For the RL training framework, we built a hybrid reward system based on large-scale Serverless Sandbox technology. This system can start up in milliseconds, supports execution environments for over 10 programming languages, and handles request throughput of up to 10K/s. We have open-sourced AReaL and hope to accelerate RL training and research in the open-source community through technological openness.
 
 
 ## Quickstart
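The hunk above describes a verifiable-reward pipeline in which model-generated code is executed inside sandboxes and scored. A minimal single-process sketch of such a pass/fail code reward; `code_reward` is a hypothetical helper for illustration only, and the real ASystem/AReaL sandbox is a millisecond-startup serverless service, not a local subprocess call.

```python
import os
import subprocess
import sys
import tempfile

# Hedged sketch of a verifiable code reward: run the candidate solution
# together with its tests in a separate process, reward 1.0 on a clean
# exit and 0.0 otherwise. A real RLVR sandbox would isolate filesystem,
# network, and resources; this sketch only isolates the process and
# bounds its runtime.
def code_reward(solution_src: str, test_src: str, timeout_s: float = 5.0) -> float:
    """Return 1.0 if the candidate code passes its tests, else 0.0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution_src + "\n" + test_src + "\n")
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            timeout=timeout_s,
        )
        return 1.0 if proc.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0  # non-terminating programs earn zero reward
    finally:
        os.unlink(path)

print(code_reward("def add(a, b):\n    return a + b",
                  "assert add(2, 3) == 5"))  # 1.0
```

The binary pass/fail signal is what makes the reward "verifiable": no learned judge is needed for code tasks, which is why throughput and startup latency of the sandbox, rather than reward-model quality, become the bottleneck at RL scale.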
@@ -182,7 +182,7 @@ We recommend you to use [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory
 
 ## Limitations and Future Plans
 
-Ring-1T represents the Bailing team’s first attempt at developing a trillion-scale deep reasoning model. The current version may occasionally exhibit issues such as identity recognition bias, language mixing, and repetitive generation. Additionally, since its attention architecture still adopts the GQA approach from Ling 2.0, there remains room for improvement in reasoning efficiency under long-context scenarios.
+Ring-1T represents the Bailing team’s first attempt at developing a trillion-scale deep thinking model. The current version may occasionally exhibit issues such as identity recognition bias, language mixing, and repetitive generation. Additionally, since its attention architecture still adopts the GQA approach from Ling 2.0, there remains room for improvement in inference efficiency under long-context scenarios.
 
 We will continue to optimize these aspects in future releases and highly welcome feedback from the community. Furthermore, training for Ring-1T is still ongoing. We are committed to further unlocking the reasoning potential of this trillion-parameter foundation model and look forward to sharing more mature upgraded versions with everyone as soon as possible.
 
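The long-context limitation noted above (GQA attention inherited from Ling 2.0) comes down largely to KV-cache size, which grows with sequence length and with the number of key/value heads. A back-of-envelope sketch of why grouping KV heads helps; all shapes below are hypothetical, not Ring-1T's real configuration.

```python
# Illustrative KV-cache arithmetic for long-context inference.
# GQA shares each key/value head across a group of query heads, so the
# cache shrinks by the grouping factor relative to full MHA.
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_el: int = 2) -> int:
    """Total KV-cache size: 2 (keys + values) per layer, per head,
    per position, at bytes_per_el bytes (2 for fp16/bf16)."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_el

# Hypothetical shapes at a 128K-token context (figure from the README).
layers, head_dim, seq_len = 64, 128, 128_000
mha = kv_cache_bytes(layers, 64, head_dim, seq_len)  # one KV head per query head
gqa = kv_cache_bytes(layers, 8, head_dim, seq_len)   # 8 query heads share a KV head
print(f"MHA: {mha / 2**30:.1f} GiB, GQA: {gqa / 2**30:.1f} GiB")
```

Even with the grouping factor, the cache still scales linearly in sequence length, which is the efficiency ceiling the paragraph alludes to and a likely target for the architectural changes promised in future releases.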
 
 