Add pipeline tag, library name, license and link to Github repository

#1 opened by nielsr (HF Staff)

Files changed (1):
  1. README.md (+9 -9)

README.md CHANGED
@@ -1,6 +1,9 @@
  ---
- {}
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-generation
  ---
+
  <h1 align="center">
  <em>AReaL</em>: Ant Reasoning Reinforcement Learning for LLMs
  </h1>
@@ -9,7 +12,6 @@
  | <a href="https://arxiv.org/pdf/2505.24298"><b>Paper</b></a> | <a href="https://inclusionai.github.io/AReaL/"><b>Documentation</b></a> | <a href="https://deepwiki.com/inclusionAI/AReaL"><b>Ask DeepWiki</b></a> | <a href="https://huggingface.co/collections/inclusionAI/areal-boba-2-683f0e819ccb7bb2e1b2f2d5"><b>🤗 Models & Data</b></a> |
  </p>

-
  AReaL (Ant Reasoning RL) is an open-source **fully asynchronous reinforcement learning training system** for large reasoning models developed at **the RL Lab, Ant Research**. Built upon the open-source project [RealHF](https://github.com/openpsi-project/ReaLHF), we are fully committed to open-source by providing training details, data, and infrastructure required to reproduce results along with the model itself. AReaL aims to help everyone build their own AI agents easily and affordably. Our team loves milk tea because it's delicious, customizable, and affordable. We hope you enjoy our project just like how you enjoy real-world milk tea (cheers).

  **AReaL Highlights**
@@ -23,7 +25,7 @@ AReaL (Ant Reasoning RL) is an open-source **fully asynchronous reinforcement le

  **[2025/06/03] (v0.3, boba²)** We release **boba²** (double-boba) for fully asynchronous RL training, which achieves a **2.77x speedup while obtaining on-par or even better training performance** compared to synchronous systems. Moreover, asynchronous RL makes it extremely easy to set up multi-turn agentic RL training! Check out [our v0.3 overview blog](/blog/AReaL_v0_3.md) and the [research paper](https://arxiv.org/pdf/2505.24298).

- **[2025/03/31] (v0.2, Boba)** Here comes our next milestone release - Boba! Please call it A-ReaL-Boba! This release includes much faster training with SGLang support and SOTA 7B and 32B models on math reasoning. Check our [v0.2 technical blog](/blog/AReaL_v0_2.md).
+ **[2025/03/31] (v0.2, Boba)** Here comes our next milestone release - Boba! Please call it A-ReaL-boba! This release includes much faster training with SGLang support and SOTA 7B and 32B models on math reasoning. Check our [v0.2 technical blog](/blog/AReaL_v0_2.md).

  **[2025/02/24] (v0.1)** Our initial release includes reproducible results for 1.5B and 7B LRMs. Check our [v0.1 technical blog](/blog/AReaL_v0_1.md).

@@ -37,7 +39,7 @@ In our AReaL-boba² (A-ReaL-double-boba) release, we highlight the top 3 most im

  + Experimental support for **multi-turn** agentic RL training. Check our [complete example](https://inclusionai.github.io/AReaL/customization/agent.html).

- For the complete system design and more training details, please check [our v0.3 blog](/blog/AReaL_v0_3.md) and our [research paper](about:blank) for a more comprehensive presentation of our system design.
+ For the complete system design and more training details, please check [our v0.3 blog](/blog/AReaL_v0_3.md) and our [research paper](https://arxiv.org/pdf/2505.24298).

  ### Overview of Asynchronous RL Training

@@ -92,7 +94,7 @@ We highlight the [tutorials](https://inclusionai.github.io/AReaL/customization/d
  + [Streaming generation and reward computation](https://inclusionai.github.io/AReaL/developer/rollout/rollout_worker.html)
  + [Interruptible rollout](https://inclusionai.github.io/AReaL/developer/rollout/gserver.html)
  + [Data staleness control with the rollout controller](https://inclusionai.github.io/AReaL/developer/rollout/gserver.html)
- + [The adoption of decoupled PPO loss](https://inclusionai.github.io/AReaL/customization/algorithm.html)
+ + [The adoption of decoupled PPO loss](https://inclusionai.github.io/AReaL/customization/algorithm.html#grouped-advantage-normalization)

  ### RL Training for Multi-turn Agent

@@ -100,12 +102,8 @@ AReaL-boba² allows you to independently customize the [dataset](https://inclusi

  In particular, we show a simple example to develop a multi-turn math agent for RL training. Please see the learning curve below and reference the [step-by-step guide](https://inclusionai.github.io/AReaL/customization/agent.html) if you want to implement your own agentic RL project.

- **Multi-turn Agent Learning Curve**
-
  ## Getting Started

- ### Quick Start
-
  Train Qwen3 1.7B locally:

  ```bash
@@ -215,3 +213,5 @@ We also appreciate all the pioneering works from the community, particularly the
  url={https://arxiv.org/abs/2505.24298},
  }
  ```
+
+ [Github Repository](https://github.com/inclusionAI/AReaL)
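
With the `pipeline_tag: text-generation` and `library_name: transformers` metadata added in this diff, the checkpoint can be loaded through the standard `transformers` text-generation pipeline once the change is merged. A minimal sketch is below; the model id is a placeholder and should be replaced with whichever checkpoint from the linked AReaL-boba-2 collection this card belongs to.

```python
# Minimal sketch of what the new metadata enables. The model id below is a
# placeholder, not verified here -- substitute the checkpoint this model card
# belongs to (see the AReaL-boba-2 collection linked in the README).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="inclusionAI/AReaL-boba-2-8B",  # placeholder id
)

prompt = "Prove that the sum of two even integers is even."
output = generator(prompt, max_new_tokens=256, do_sample=False)
print(output[0]["generated_text"])
```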
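The tutorial list in the README points to the adoption of a decoupled PPO loss. For readers landing on this card, the sketch below shows the general shape of such an objective: the importance ratio is split between the stale behavior policy that produced the rollouts and a recent proximal policy used as the clipping anchor. This is a generic illustration under those assumptions, not the exact loss implemented in AReaL; see the linked algorithm documentation and the paper for the authoritative version.

```python
# Generic sketch of a decoupled PPO-style policy loss (NOT AReaL's exact code).
# Rollouts come from a stale behavior policy, so the importance ratio is split
# into (proximal / behavior) * (current / proximal), and PPO clipping is applied
# only to the (current / proximal) factor.
import torch


def decoupled_ppo_loss(
    logp_current: torch.Tensor,   # log-probs under the policy being optimized
    logp_proximal: torch.Tensor,  # log-probs under a recent "proximal" policy
    logp_behavior: torch.Tensor,  # log-probs under the policy that sampled the data
    advantages: torch.Tensor,
    clip_eps: float = 0.2,
) -> torch.Tensor:
    # Correction for staleness of the data-generating policy (no gradient).
    behav_ratio = torch.exp(logp_proximal - logp_behavior).detach()
    # Standard PPO ratio, taken against the proximal policy rather than the behavior policy.
    ratio = torch.exp(logp_current - logp_proximal)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -(behav_ratio * torch.minimum(unclipped, clipped)).mean()


# Toy usage with random token-level statistics.
T = 8
logp_cur = torch.randn(T, requires_grad=True)
loss = decoupled_ppo_loss(logp_cur, torch.randn(T), torch.randn(T), torch.randn(T))
loss.backward()
```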
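Similarly, the rollout-controller tutorial covers data staleness control for asynchronous training. A toy sketch of the underlying idea is given below: admit a trajectory only if the policy version that generated it lags the current trainer version by at most a bounded number of updates. The class and field names are illustrative assumptions, not AReaL's actual API.

```python
# Illustrative staleness gate for asynchronous rollouts; names are assumptions.
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    policy_version: int          # version of the policy that generated this rollout
    tokens: list = field(default_factory=list)


class StalenessGate:
    def __init__(self, max_staleness: int = 4):
        self.max_staleness = max_staleness
        self.trainer_version = 0

    def on_weights_updated(self) -> None:
        # Called each time the trainer publishes new policy weights.
        self.trainer_version += 1

    def admit(self, traj: Trajectory) -> bool:
        # Drop rollouts that are too stale relative to the current trainer version.
        return self.trainer_version - traj.policy_version <= self.max_staleness


gate = StalenessGate(max_staleness=2)
gate.on_weights_updated()  # trainer is now at version 1
print(gate.admit(Trajectory(policy_version=0)))  # True: within the staleness budget
```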