Robotics
Safetensors
nielsr HF Staff committed on
Commit
c2e6696
·
verified ·
1 Parent(s): 085e5c6

Add model card for $\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models


This PR adds a comprehensive model card for the $\pi_\texttt{RL}$ model, presented in the paper "[$\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models](https://huggingface.co/papers/2510.25889)".

It includes:
* The `robotics` pipeline tag for improved discoverability.
* The `apache-2.0` license, consistent with the `RLinf` project.
* A link to the official GitHub repository: `https://github.com/RLinf/RLinf`.
* A link to the RLinf project documentation: `https://rlinf.readthedocs.io/en/latest/`.
* The paper's abstract and relevant citations.

Please review and merge if everything looks good!

Files changed (1)
  1. README.md +45 -0
README.md ADDED
@@ -0,0 +1,45 @@
+ ---
+ pipeline_tag: robotics
+ license: apache-2.0
+ ---
+
+ # $\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models
+
+ This repository contains artifacts related to the $\pi_\texttt{RL}$ framework, introduced in the paper [$\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models](https://huggingface.co/papers/2510.25889).
+
+ $\pi_\texttt{RL}$ is an open-source framework for training flow-based Vision-Language-Action (VLA) models in parallel simulation. It addresses a key obstacle to applying large-scale reinforcement learning to these models: the action log-likelihoods produced by iterative denoising are intractable.
+
+ ## Abstract
+ Vision-Language-Action (VLA) models enable robots to understand and perform complex tasks from multimodal input. Although recent work explores using reinforcement learning (RL) to automate the laborious data collection process in scaling supervised fine-tuning (SFT), applying large-scale RL to flow-based VLAs (e.g., $\pi_0$, $\pi_{0.5}$) remains challenging due to intractable action log-likelihoods from iterative denoising. We address this challenge with $\pi_{\text{RL}}$, an open-source framework for training flow-based VLAs in parallel simulation. $\pi_{\text{RL}}$ implements two RL algorithms: (1) **Flow-Noise** models the denoising process as a discrete-time MDP with a learnable noise network for exact log-likelihood computation. (2) **Flow-SDE** integrates denoising with agent-environment interaction, formulating a two-layer MDP that employs ODE-to-SDE conversion for efficient RL exploration. We evaluate $\pi_{\text{RL}}$ on LIBERO and ManiSkill benchmarks. On LIBERO, $\pi_{\text{RL}}$ boosts few-shot SFT models $\pi_0$ and $\pi_{0.5}$ from 57.6% to 97.6% and from 77.1% to 98.3%, respectively. In ManiSkill, we train $\pi_{\text{RL}}$ in 320 parallel environments, improving $\pi_0$ from 41.6% to 85.7% and $\pi_{0.5}$ from 40.0% to 84.8% across 4352 pick-and-place tasks, demonstrating scalable multitask RL under heterogeneous simulation. Overall, $\pi_{\text{RL}}$ achieves significant performance gains and stronger generalization over SFT models, validating the effectiveness of online RL for flow-based VLAs.
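The tractability issue the abstract centers on can be illustrated with a minimal sketch. This is not the paper's implementation; the toy velocity field `v`, the fixed noise scale `sigma`, and the Euler/Euler-Maruyama discretization are illustrative assumptions, and a faithful ODE-to-SDE conversion would also adjust the drift with a score term to preserve the marginals, which is omitted here. The point is only that a deterministic ODE denoising step has no usable per-step transition density, while the stochastic version yields a Gaussian whose log-likelihood is exactly computable, which is the quantity an RL objective needs.

```python
import numpy as np

def ode_step(x, t, v, dt):
    """Deterministic Euler step of the flow ODE.

    The transition x -> x_next is a point mass, so there is no
    tractable log-likelihood for RL policy-gradient updates.
    """
    return x + v(x, t) * dt

def sde_step(x, t, v, dt, sigma, rng):
    """Euler-Maruyama step of an SDE sharing the same drift.

    Injecting Gaussian noise turns the transition into a Gaussian
    whose log-density is exact and cheap to evaluate per step.
    """
    mean = x + v(x, t) * dt
    var = sigma**2 * dt
    x_next = mean + np.sqrt(var) * rng.normal(size=x.shape)
    # log N(x_next; mean, var * I), computable in closed form
    logp = -0.5 * np.sum((x_next - mean) ** 2 / var + np.log(2 * np.pi * var))
    return x_next, logp
```

In a sketch like this, chaining `sde_step` over the denoising trajectory and summing the per-step `logp` values gives the action log-likelihood that standard policy-gradient estimators require.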
+
+ ## Further Resources
+ * **Paper**: [$\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models](https://huggingface.co/papers/2510.25889)
+ * **Code**: https://github.com/RLinf/RLinf
+ * **RLinf Project Documentation**: https://rlinf.readthedocs.io/en/latest/
+
+ ## Citation
+ If you find this work helpful, please cite the following papers:
+
+ ```bibtex
+ @misc{chen2025pitextttrlonlinerlfinetuning,
+   title={$\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models},
+   author={Kang Chen and Zhihao Liu and Tonghe Zhang and Zhen Guo and Si Xu and Hao Lin and Hongzhi Zang and Quanlu Zhang and Zhaofei Yu and Guoliang Fan and Tiejun Huang and Yu Wang and Chao Yu},
+   year={2025},
+   eprint={2510.25889},
+   archivePrefix={arXiv},
+   primaryClass={cs.LG},
+   url={https://arxiv.org/abs/2510.25889},
+ }
+ ```
+
+ If you use the broader RLinf framework, please also cite its main paper:
+
+ ```bibtex
+ @misc{yu2025rlinfflexibleefficientlargescale,
+   title={RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation},
+   author={Chao Yu and Yuanqing Wang and Zhen Guo and Hao Lin and Si Xu and Hongzhi Zang and Quanlu Zhang and Yongji Wu and Chunyang Zhu and Junhao Hu and Zixiao Huang and Mingjie Wei and Yuqing Xie and Ke Yang and Bo Dai and Zhexuan Xu and Xiangyuan Wang and Xu Fu and Zhihao Liu and Kang Chen and Weilin Liu and Gang Liu and Boxun Li and Jianlei Yang and Zhi Yang and Guohao Dai and Yu Wang},
+   year={2025},
+   eprint={2509.15965},
+   archivePrefix={arXiv},
+   primaryClass={cs.LG},
+   url={https://arxiv.org/abs/2509.15965},
+ }
+ ```