Add pipeline tag, library name, license and link to Github repository
#1
by nielsr HF Staff - opened
README.md CHANGED
@@ -1,6 +1,9 @@
 ---
-
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
+
 <h1 align="center">
 <em>AReaL</em>: Ant Reasoning Reinforcement Learning for LLMs
 </h1>
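The added front matter tells the Hub how to serve this checkpoint: `library_name: transformers` indicates the weights load through the standard `transformers` API, and `pipeline_tag: text-generation` maps it to the text-generation pipeline. A minimal usage sketch under those assumptions; the repo id below is a placeholder, not this model's actual id:

```python
from transformers import pipeline

# Placeholder repo id: replace with the actual id of this checkpoint on the Hub.
model_id = "inclusionAI/<this-checkpoint>"

# "pipeline_tag: text-generation" maps to the text-generation pipeline;
# "library_name: transformers" means the weights load via the standard transformers API.
generator = pipeline("text-generation", model=model_id, torch_dtype="auto", device_map="auto")

print(generator("Prove that the sum of two even numbers is even.", max_new_tokens=256)[0]["generated_text"])
```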
@@ -9,7 +12,6 @@
 | <a href="https://arxiv.org/pdf/2505.24298"><b>Paper</b></a> | <a href="https://inclusionai.github.io/AReaL/"><b>Documentation</b></a> | <a href="https://deepwiki.com/inclusionAI/AReaL"><b>Ask DeepWiki</b></a> | <a href="https://huggingface.co/collections/inclusionAI/areal-boba-2-683f0e819ccb7bb2e1b2f2d5"><b>🤗 Models & Data</b></a> |
 </p>
 
-
 AReaL (Ant Reasoning RL) is an open-source **fully asynchronous reinforcement learning training system** for large reasoning models developed at **the RL Lab, Ant Research**. Built upon the open-source project [RealHF](https://github.com/openpsi-project/ReaLHF), we are fully committed to open-source by providing training details, data, and infrastructure required to reproduce results along with the model itself. AReaL aims to help everyone build their own AI agents easily and affordably. Our team loves milk tea because it's delicious, customizable, and affordable. We hope you enjoy our project just like how you enjoy real-world milk tea (cheers).
 
 **AReaL Highlights**
@@ -23,7 +25,7 @@ AReaL (Ant Reasoning RL) is an open-source **fully asynchronous reinforcement le
 
 **[2025/06/03] (v0.3, boba²)** We release **boba²** (double-boba) for fully asynchronous RL training, which achieves a **2.77x speedup while obtaining on-par or even better training performance** compared to synchronous systems. Moreover, asynchronous RL makes it extremely easy to set up multi-turn agentic RL training! Check out [our v0.3 overview blog](/blog/AReaL_v0_3.md) and the [research paper](https://arxiv.org/pdf/2505.24298).
 
-**[2025/03/31] (v0.2, Boba)** Here comes our next milestone release - Boba! Please call it A-ReaL-
+**[2025/03/31] (v0.2, Boba)** Here comes our next milestone release - Boba! Please call it A-ReaL-boba! This release includes much faster training with SGLang support and SOTA 7B and 32B models on math reasoning. Check our [v0.2 technical blog](/blog/AReaL_v0_2.md).
 
 **[2025/02/24] (v0.1)** Our initial release includes reproducible results for 1.5B and 7B LRMs. Check our [v0.1 technical blog](/blog/AReaL_v0_1.md).
 
@@ -37,7 +39,7 @@ In our AReaL-boba² (A-ReaL-double-boba) release, we highlight the top 3 most im
 
 + Experimental support for **multi-turn** agentic RL training. Check our [complete example](https://inclusionai.github.io/AReaL/customization/agent.html).
 
-For the complete system design and more training details, please check [our v0.3 blog](/blog/AReaL_v0_3.md) and our [research paper](
+For the complete system design and more training details, please check [our v0.3 blog](/blog/AReaL_v0_3.md) and our [research paper](https://arxiv.org/pdf/2505.24298).
 
 ### Overview of Asynchronous RL Training
 
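As context for the asynchronous training overview referenced in this hunk: generation and training run concurrently, with rollouts streamed into a buffer and a staleness bound limiting how far the trainer may lag behind the generator. The toy producer/consumer sketch below only illustrates that idea; it is not AReaL's implementation, and every name in it is made up:

```python
import queue
import threading
import time

MAX_STALENESS = 4                        # accept rollouts generated at most this many policy versions ago
rollout_buffer = queue.Queue(maxsize=64)
policy_version = 0

def generation_worker():
    """Produce rollouts continuously, tagging each with the policy version that generated it."""
    while True:
        rollout = f"rollout@v{policy_version}"   # stand-in for actual LLM generation
        rollout_buffer.put((policy_version, rollout))
        time.sleep(0.01)

def trainer(num_steps=100):
    """Consume rollouts as they arrive, skipping any that exceed the staleness bound."""
    global policy_version
    for _ in range(num_steps):
        version, rollout = rollout_buffer.get()
        if policy_version - version > MAX_STALENESS:
            continue                              # data staleness control
        # ... compute the RL loss on `rollout` and update the policy here ...
        policy_version += 1

threading.Thread(target=generation_worker, daemon=True).start()
trainer()
```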
@@ -92,7 +94,7 @@ We highlight the [tutorials](https://inclusionai.github.io/AReaL/customization/d
 + [Streaming generation and reward computation](https://inclusionai.github.io/AReaL/developer/rollout/rollout_worker.html)
 + [Interruptible rollout](https://inclusionai.github.io/AReaL/developer/rollout/gserver.html)
 + [Data staleness control with the rollout controller](https://inclusionai.github.io/AReaL/developer/rollout/gserver.html)
-+ [The adoption of decoupled PPO loss](https://inclusionai.github.io/AReaL/customization/algorithm.html)
++ [The adoption of decoupled PPO loss](https://inclusionai.github.io/AReaL/customization/algorithm.html#grouped-advantage-normalization)
 
 ### RL Training for Multi-turn Agent
 
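Regarding the decoupled PPO loss item in the hunk above: the idea, as described in the AReaL paper, is to separate the behavior policy that generated a possibly stale rollout from a proximal policy that anchors the trust region, so the clipping ratio is taken against the proximal policy while staleness is handled by an importance weight. The sketch below is a schematic PyTorch-style rendering of that idea under those assumptions, not AReaL's actual code:

```python
import torch

def decoupled_ppo_loss(logp_new, logp_prox, logp_behav, advantages, clip_eps=0.2):
    """Schematic decoupled PPO-style loss (all tensors are per-token, same shape).

    logp_new:   log-probs under the policy currently being updated
    logp_prox:  log-probs under the proximal (recent) policy anchoring the trust region
    logp_behav: log-probs under the behavior policy that actually generated the rollout
    """
    # The clipped trust-region ratio is taken against the proximal policy, not the behavior policy.
    ratio = torch.exp(logp_new - logp_prox)
    # Staleness between the behavior and proximal policies is absorbed by an importance weight.
    behav_weight = torch.exp(logp_prox - logp_behav).detach()
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -(behav_weight * torch.minimum(unclipped, clipped)).mean()
```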
@@ -100,12 +102,8 @@ AReaL-boba² allows you to independently customize the [dataset](https://inclusi
 
 In particular, we show a simple example to develop a multi-turn math agent for RL training. Please see the learning curve below and reference the [step-by-step guide](https://inclusionai.github.io/AReaL/customization/agent.html) if you want to implement your own agentic RL project.
 
-**Multi-turn Agent Learning Curve**
-
 ## Getting Started
 
-### Quick Start
-
 Train Qwen3 1.7B locally:
 
 ```bash
@@ -215,3 +213,5 @@ We also appreciate all the pioneering works from the community, particularly the
 url={https://arxiv.org/abs/2505.24298},
 }
 ```
+
+[Github Repository](https://github.com/inclusionAI/AReaL)
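For readers following the multi-turn math-agent example referenced in the README changes above, the sketch below shows the general shape of a multi-turn rollout loop in which an agent may revise its answer across turns and receives a reward from an answer check. It is only an illustration: none of the names or message formats come from AReaL's API, and the `policy` callable and reward check are placeholders.

```python
import re

def run_multi_turn_episode(policy, problem, max_turns=4):
    """Hypothetical multi-turn math-agent rollout (not AReaL's API).

    `policy` is any callable mapping a list of chat messages to a text reply;
    `problem` is a dict with "question" and "answer" fields.
    """
    messages = [{"role": "user", "content": problem["question"]}]
    trajectory = []
    for _ in range(max_turns):
        reply = policy(messages)
        messages.append({"role": "assistant", "content": reply})
        match = re.search(r"\\boxed\{(.+?)\}", reply)   # look for a \boxed{...} answer
        answer = match.group(1) if match else None
        reward = 1.0 if answer == problem["answer"] else 0.0
        trajectory.append((list(messages), reward))
        if reward == 1.0:
            break
        # Feedback prompt for the next turn; the wording is purely illustrative.
        messages.append({"role": "user", "content": "Your answer seems wrong. Please re-check and try again."})
    return trajectory
```

Each collected (messages, reward) pair would then be turned into training samples by whatever dataset, rollout, and trainer components the step-by-step guide describes.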