Improve model card: Add `text-generation` pipeline tag, `transformers` library, and paper/citation info
This PR improves the model card for `RLinf-math-1.5B` by:
* **Refining the `pipeline_tag`**: Changed from `reinforcement-learning` to `text-generation` to accurately reflect the model's primary function of generating text for mathematical reasoning tasks.
* **Adding `library_name`**: Included `transformers` to enable the automated "Use in Transformers" widget, as evidenced by the provided sample usage.
* **Updating `tags`**: Added `reinforcement-learning` to the `tags` list to ensure discoverability based on the model's training methodology, complementing the `RLinf` tag.
* **Adding Paper Link**: Included a direct link to the associated paper, [RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training](https://huggingface.co/papers/2510.06710), at the top of the card.
* **Adding Citation Section**: Appended the "Citation and Acknowledgement" section from the GitHub README for proper attribution and academic visibility.
These changes will enhance the model's discoverability and provide clearer, more comprehensive information for users.
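
With `pipeline_tag: text-generation` and `library_name: transformers` in place, the model should be loadable through the standard `transformers` pipeline API. The sketch below is illustrative only: the repo id `RLinf/RLinf-math-1.5B` and the prompt format are assumptions, not copied from the card's own sample usage.

```python
def build_prompt(problem: str) -> str:
    # Minimal instruction-style prompt; the card's own sample usage may differ.
    return (
        "Solve the following problem. Reason step by step.\n\n"
        f"Problem: {problem}\nSolution:"
    )


def generate_solution(problem: str, model_id: str = "RLinf/RLinf-math-1.5B") -> str:
    # Lazy import so the snippet can be read without transformers installed;
    # the first call downloads the checkpoint from the Hub.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    out = generator(build_prompt(problem), max_new_tokens=512, do_sample=False)
    return out[0]["generated_text"]
```

Calling `generate_solution("What is the sum of the first 100 positive integers?")` fetches the weights on first use and returns the prompt plus the model's step-by-step continuation.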
````diff
@@ -1,55 +1,55 @@
 ---
-pipeline_tag: reinforcement-learning
-tags:
-- RLinf
+base_model:
+- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
 language:
 - en
+license: mit
 metrics:
 - accuracy
+pipeline_tag: text-generation
+tags:
+- RLinf
+- reinforcement-learning
+library_name: transformers
 model-index:
 - name: RLinf-math-1.5B
   results:
   - task:
       type: math
     dataset:
+      name: AIME24
+      type: aime_2024
     metrics:
+    - type: accuracy
+      value: 48.03125
   - task:
       type: math
     dataset:
+      name: AIME25
+      type: aime_2025
     metrics:
+    - type: accuracy
+      value: 35.10625
   - task:
       type: stem
     dataset:
+      name: GPQA-diamond
+      type: gpqa_diamond
     metrics:
+    - type: accuracy
+      value: 37.509375
 ---
 
+This repository contains the **RLinf-math-1.5B** model, which is part of the work presented in the paper [RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training](https://huggingface.co/papers/2510.06710).
+
 <div align="center">
 <img src="logo.svg" alt="RLinf-logo" width="500"/>
 </div>
 
 <div align="center">
-<!-- <a href="TODO"><img src="https://img.shields.io/badge/HuggingFace-yellow?logo=huggingface&logoColor=white" alt="Hugging Face"></a> -->
+<a href="https://huggingface.co/papers/2510.06710"><img src="https://img.shields.io/badge/HuggingFace-Paper-blue?logo=huggingface"></a>
 <a href="https://github.com/RLinf/RLinf"><img src="https://img.shields.io/badge/Github-blue"></a>
 <a href="https://rlinf.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/Documentation-Purple?color=8A2BE2&logo=readthedocs"></a>
-<!-- <a href="TODO"><img src="https://devin.ai/assets/deepwiki-badge.png" alt="Ask DeepWiki.com" style="height:20px;"></a>
-<a href="TODO"><img src="https://img.shields.io/badge/微信-green?logo=wechat&"></a> -->
 </div>
 
 <h1 align="center">RLinf: Reinforcement Learning Infrastructure for Agentic AI</h1>
@@ -100,7 +100,7 @@ We trained and evaluated two models using RLinf:
 | [Skywork-OR1-7B](https://huggingface.co/Skywork/Skywork-OR1-7B) | 66.87 | 52.49 | 44.43 | 54.60 |
 | [Polaris-7B-Preview](https://huggingface.co/POLARIS-Project/Polaris-7B-Preview) | **68.55** | 51.24 | 43.88 | 54.56 |
 | [AceMath-RL-Nemotron-7B](https://huggingface.co/nvidia/AceMath-RL-Nemotron-7B) | 67.30 | **55.00** | 45.57 | 55.96 |
-| [RLinf-math-7B](https://huggingface.co/RLinf/RLinf-
+| [RLinf-math-7B](https://huggingface.co/RLinf/RLinf-7B) | 68.33 | 52.19 | **48.18** | **56.23** |
 
 
 
@@ -129,3 +129,54 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 
 ## License
 This code repository and the model weights are licensed under the MIT License.
+
+## Citation and Acknowledgement
+
+If you find **RLinf** helpful, please cite the paper:
+
+```bibtex
+@misc{yu2025rlinfflexibleefficientlargescale,
+      title={RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation},
+      author={Chao Yu and Yuanqing Wang and Zhen Guo and Hao Lin and Si Xu and Hongzhi Zang and Quanlu Zhang and Yongji Wu and Chunyang Zhu and Junhao Hu and Zixiao Huang and Mingjie Wei and Yuqing Xie and Ke Yang and Bo Dai and Zhexuan Xu and Xiangyuan Wang and Xu Fu and Zhihao Liu and Kang Chen and Weilin Liu and Gang Liu and Boxun Li and Jianlei Yang and Zhi Yang and Guohao Dai and Yu Wang},
+      year={2025},
+      eprint={2509.15965},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG},
+      url={https://arxiv.org/abs/2509.15965},
+}
+```
+
+If you use RL+VLA in RLinf, you can also cite our technical report and empirical study paper:
+
+```bibtex
+@misc{zang2025rlinfvlaunifiedefficientframework,
+      title={RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training},
+      author={Hongzhi Zang and Mingjie Wei and Si Xu and Yongji Wu and Zhen Guo and Yuanqing Wang and Hao Lin and Liangzhi Shi and Yuqing Xie and Zhexuan Xu and Zhihao Liu and Kang Chen and Wenhao Tang and Quanlu Zhang and Weinan Zhang and Chao Yu and Yu Wang},
+      year={2025},
+      eprint={2510.06710},
+      archivePrefix={arXiv},
+      primaryClass={cs.RO},
+      url={https://arxiv.org/abs/2510.06710},
+}
+```
+
+```bibtex
+@misc{liu2025rlbringvlageneralization,
+      title={What Can RL Bring to VLA Generalization? An Empirical Study},
+      author={Jijia Liu and Feng Gao and Bingwen Wei and Xinlei Chen and Qingmin Liao and Yi Wu and Chao Yu and Yu Wang},
+      year={2025},
+      eprint={2505.19789},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG},
+      url={https://arxiv.org/abs/2505.19789},
+}
+```
+
+**Acknowledgements**
+RLinf has been inspired by, and benefits from, the ideas and tooling of the broader open-source community.
+In particular, we would like to thank the teams and contributors behind VeRL, AReaL, Megatron-LM, SGLang, and PyTorch Fully Sharded Data Parallel (FSDP). If we have inadvertently missed your project or contribution, please open an issue or a pull request so we can properly credit you.
+
+**Contact:**
+We welcome applications from Postdocs, PhD/Master's students, and interns. Join us in shaping the future of RL infrastructure and embodied AI!
+- Chao Yu: zoeyuchao@gmail.com
+- Yu Wang: yu-wang@tsinghua.edu.cn
````