Improve model card: Add `text-generation` pipeline tag, `transformers` library, and paper/citation info
#1 opened by nielsr (HF Staff)

README.md CHANGED
@@ -1,55 +1,55 @@ (old version)
 ---
-- RLinf
 language:
 - en
 metrics:
 - accuracy
 model-index:
 - name: RLinf-math-1.5B
   results:
   - task:
-      type: math
     dataset:
     metrics:
   - task:
-      type: math
     dataset:
     metrics:
   - task:
-      type: stem
     dataset:
     metrics:
 ---

 <div align="center">
 <img src="logo.svg" alt="RLinf-logo" width="500"/>
 </div>

 <div align="center">
-<!-- <a href="TODO"><img src="https://img.shields.io/badge/HuggingFace-yellow?logo=huggingface&logoColor=white" alt="Hugging Face"></a> -->
 <a href="https://github.com/RLinf/RLinf"><img src="https://img.shields.io/badge/Github-blue"></a>
 <a href="https://rlinf.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/Documentation-Purple?color=8A2BE2&logo=readthedocs"></a>
-<!-- <a href="TODO"><img src="https://devin.ai/assets/deepwiki-badge.png" alt="Ask DeepWiki.com" style="height:20px;"></a>
-<a href="TODO"><img src="https://img.shields.io/badge/微信-green?logo=wechat&"></a> -->
 </div>

 <h1 align="center">RLinf: Reinforcement Learning Infrastructure for Agentic AI</h1>
@@ -100,7 +100,7 @@ We trained and evaluated two models using RLinf: (old version)
 | [Skywork-OR1-7B](https://huggingface.co/Skywork/Skywork-OR1-7B) | 66.87 | 52.49 | 44.43 | 54.60 |
 | [Polaris-7B-Preview](https://huggingface.co/POLARIS-Project/Polaris-7B-Preview) | **68.55** | 51.24 | 43.88 | 54.56 |
 | [AceMath-RL-Nemotron-7B](https://huggingface.co/nvidia/AceMath-RL-Nemotron-7B) | 67.30 | **55.00** | 45.57 | 55.96 |
-| [RLinf-math-7B](https://huggingface.co/RLinf/RLinf-
@@ -129,3 +129,54 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True)) (old version)

 ## License
 This code repository and the model weights are licensed under the MIT License.
@@ -1,55 +1,55 @@ (new version)
 ---
+base_model:
+- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
 language:
 - en
+license: mit
 metrics:
 - accuracy
+pipeline_tag: text-generation
+tags:
+- RLinf
+- reinforcement-learning
+library_name: transformers
 model-index:
 - name: RLinf-math-1.5B
   results:
   - task:
+      type: math
     dataset:
+      name: AIME24
+      type: aime_2024
     metrics:
+    - type: accuracy
+      value: 48.03125
   - task:
+      type: math
     dataset:
+      name: AIME25
+      type: aime_2025
     metrics:
+    - type: accuracy
+      value: 35.10625
   - task:
+      type: stem
     dataset:
+      name: GPQA-diamond
+      type: gpqa_diamond
     metrics:
+    - type: accuracy
+      value: 37.509375
 ---

+This repository contains the **RLinf-math-1.5B** model, which is part of the work presented in the paper [RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training](https://huggingface.co/papers/2510.06710).
+
 <div align="center">
 <img src="logo.svg" alt="RLinf-logo" width="500"/>
 </div>

 <div align="center">
+<a href="https://huggingface.co/papers/2510.06710"><img src="https://img.shields.io/badge/HuggingFace-Paper-blue?logo=huggingface"></a>
 <a href="https://github.com/RLinf/RLinf"><img src="https://img.shields.io/badge/Github-blue"></a>
 <a href="https://rlinf.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/Documentation-Purple?color=8A2BE2&logo=readthedocs"></a>
 </div>

 <h1 align="center">RLinf: Reinforcement Learning Infrastructure for Agentic AI</h1>
@@ -100,7 +100,7 @@ We trained and evaluated two models using RLinf: (new version)
 | [Skywork-OR1-7B](https://huggingface.co/Skywork/Skywork-OR1-7B) | 66.87 | 52.49 | 44.43 | 54.60 |
 | [Polaris-7B-Preview](https://huggingface.co/POLARIS-Project/Polaris-7B-Preview) | **68.55** | 51.24 | 43.88 | 54.56 |
 | [AceMath-RL-Nemotron-7B](https://huggingface.co/nvidia/AceMath-RL-Nemotron-7B) | 67.30 | **55.00** | 45.57 | 55.96 |
+| [RLinf-math-7B](https://huggingface.co/RLinf/RLinf-7B) | 68.33 | 52.19 | **48.18** | **56.23** |
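The last column of each row above appears to be the simple mean of the three preceding benchmark scores, rounded to two decimals. A minimal sketch to check that (the row data is copied from the table; nothing else is assumed):

```python
# Verify that the final column equals the mean of the three benchmark scores,
# rounded to two decimal places, for every row shown in the hunk above.
rows = {
    "Skywork-OR1-7B": ([66.87, 52.49, 44.43], 54.60),
    "Polaris-7B-Preview": ([68.55, 51.24, 43.88], 54.56),
    "AceMath-RL-Nemotron-7B": ([67.30, 55.00, 45.57], 55.96),
    "RLinf-math-7B": ([68.33, 52.19, 48.18], 56.23),
}
for name, (scores, reported_avg) in rows.items():
    mean = round(sum(scores) / len(scores), 2)
    assert mean == reported_avg, (name, mean, reported_avg)
    print(f"{name}: mean {mean} matches reported {reported_avg}")
```

All four rows check out, so the added RLinf-math-7B row is internally consistent with the rest of the table.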
@@ -129,3 +129,54 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True)) (new version)

 ## License
 This code repository and the model weights are licensed under the MIT License.
+
+## Citation and Acknowledgement
+
+If you find **RLinf** helpful, please cite the paper:
+
+```bibtex
+@misc{yu2025rlinfflexibleefficientlargescale,
+  title={RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation},
+  author={Chao Yu and Yuanqing Wang and Zhen Guo and Hao Lin and Si Xu and Hongzhi Zang and Quanlu Zhang and Yongji Wu and Chunyang Zhu and Junhao Hu and Zixiao Huang and Mingjie Wei and Yuqing Xie and Ke Yang and Bo Dai and Zhexuan Xu and Xiangyuan Wang and Xu Fu and Zhihao Liu and Kang Chen and Weilin Liu and Gang Liu and Boxun Li and Jianlei Yang and Zhi Yang and Guohao Dai and Yu Wang},
+  year={2025},
+  eprint={2509.15965},
+  archivePrefix={arXiv},
+  primaryClass={cs.LG},
+  url={https://arxiv.org/abs/2509.15965},
+}
+```
+
+If you use RL+VLA in RLinf, you can also cite our technical report and empirical study papers:
+
+```bibtex
+@misc{zang2025rlinfvlaunifiedefficientframework,
+  title={RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training},
+  author={Hongzhi Zang and Mingjie Wei and Si Xu and Yongji Wu and Zhen Guo and Yuanqing Wang and Hao Lin and Liangzhi Shi and Yuqing Xie and Zhexuan Xu and Zhihao Liu and Kang Chen and Wenhao Tang and Quanlu Zhang and Weinan Zhang and Chao Yu and Yu Wang},
+  year={2025},
+  eprint={2510.06710},
+  archivePrefix={arXiv},
+  primaryClass={cs.RO},
+  url={https://arxiv.org/abs/2510.06710},
+}
+```
+
+```bibtex
+@misc{liu2025rlbringvlageneralization,
+  title={What Can RL Bring to VLA Generalization? An Empirical Study},
+  author={Jijia Liu and Feng Gao and Bingwen Wei and Xinlei Chen and Qingmin Liao and Yi Wu and Chao Yu and Yu Wang},
+  year={2025},
+  eprint={2505.19789},
+  archivePrefix={arXiv},
+  primaryClass={cs.LG},
+  url={https://arxiv.org/abs/2505.19789},
+}
+```
+
+**Acknowledgements**
+RLinf has been inspired by, and benefits from, the ideas and tooling of the broader open-source community.
+In particular, we would like to thank the teams and contributors behind VeRL, AReaL, Megatron-LM, SGLang, and PyTorch Fully Sharded Data Parallel (FSDP). If we have inadvertently missed your project or contribution, please open an issue or a pull request so we can credit you properly.
+
+**Contact:**
+We welcome applications from Postdocs, PhD/Master's students, and interns. Join us in shaping the future of RL infrastructure and embodied AI!
+- Chao Yu: zoeyuchao@gmail.com
+- Yu Wang: yu-wang@tsinghua.edu.cn