Improve model card: Add `text-generation` pipeline tag, `transformers` library, and paper/citation info

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +78 -27
README.md CHANGED
@@ -1,55 +1,55 @@
  ---
- license: mit
- tags:
- - RLinf
  language:
  - en
  metrics:
  - accuracy
- base_model:
- - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- pipeline_tag: reinforcement-learning
  model-index:
  - name: RLinf-math-1.5B
    results:
    - task:
-       type: math # Required. Example: automatic-speech-recognition
      dataset:
-       type: aime_2024 # Required. Example: common_voice. Use dataset id from https://hf.co/datasets
-       name: AIME24 # Required. A pretty name for the dataset. Example: Common Voice (French)
      metrics:
-     - type: accuracy # Required. Example: wer. Use metric id from https://hf.co/metrics
-       value: 48.03125 # Required. Example: 20.90
    - task:
-       type: math # Required. Example: automatic-speech-recognition
      dataset:
-       type: aime_2025 # Required. Example: common_voice. Use dataset id from https://hf.co/datasets
-       name: AIME25 # Required. A pretty name for the dataset. Example: Common Voice (French)
      metrics:
-     - type: accuracy # Required. Example: wer. Use metric id from https://hf.co/metrics
-       value: 35.10625 # Required. Example: 20.90
    - task:
-       type: stem # Required. Example: automatic-speech-recognition
      dataset:
-       type: gpqa_diamond # Required. Example: common_voice. Use dataset id from https://hf.co/datasets
-       name: GPQA-diamond # Required. A pretty name for the dataset. Example: Common Voice (French)
      metrics:
-     - type: accuracy # Required. Example: wer. Use metric id from https://hf.co/metrics
-       value: 37.509375 # Required. Example: 20.90
  ---

  <div align="center">
  <img src="logo.svg" alt="RLinf-logo" width="500"/>
  </div>

-
  <div align="center">
- <!-- <a href="TODO"><img src="https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv"></a> -->
- <!-- <a href="TODO"><img src="https://img.shields.io/badge/HuggingFace-yellow?logo=huggingface&logoColor=white" alt="Hugging Face"></a> -->
  <a href="https://github.com/RLinf/RLinf"><img src="https://img.shields.io/badge/Github-blue"></a>
  <a href="https://rlinf.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/Documentation-Purple?color=8A2BE2&logo=readthedocs"></a>
- <!-- <a href="TODO"><img src="https://devin.ai/assets/deepwiki-badge.png" alt="Ask DeepWiki.com" style="height:20px;"></a>
- <a href="TODO"><img src="https://img.shields.io/badge/微信-green?logo=wechat&amp"></a> -->
  </div>

  <h1 align="center">RLinf: Reinforcement Learning Infrastructure for Agentic AI</h1>
@@ -100,7 +100,7 @@ We trained and evaluated two models using RLinf:
  | [Skywork-OR1-7B](https://huggingface.co/Skywork/Skywork-OR1-7B) | 66.87 | 52.49 | 44.43 | 54.60 |
  | [Polaris-7B-Preview](https://huggingface.co/POLARIS-Project/Polaris-7B-Preview) | **68.55** | 51.24 | 43.88 | 54.56 |
  | [AceMath-RL-Nemotron-7B](https://huggingface.co/nvidia/AceMath-RL-Nemotron-7B) | 67.30 | **55.00** | 45.57 | 55.96 |
- | [RLinf-math-7B](https://huggingface.co/RLinf/RLinf-math-7B) | 68.33 | 52.19 | **48.18** | **56.23** |
@@ -129,3 +129,54 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))

  ## License
  This code repository and the model weights are licensed under the MIT License.
  ---
+ base_model:
+ - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  language:
  - en
+ license: mit
  metrics:
  - accuracy
+ pipeline_tag: text-generation
+ tags:
+ - RLinf
+ - reinforcement-learning
+ library_name: transformers
  model-index:
  - name: RLinf-math-1.5B
    results:
    - task:
+       type: math
      dataset:
+       name: AIME24
+       type: aime_2024
      metrics:
+     - type: accuracy
+       value: 48.03125
    - task:
+       type: math
      dataset:
+       name: AIME25
+       type: aime_2025
      metrics:
+     - type: accuracy
+       value: 35.10625
    - task:
+       type: stem
      dataset:
+       name: GPQA-diamond
+       type: gpqa_diamond
      metrics:
+     - type: accuracy
+       value: 37.509375
  ---

+ This repository contains the **RLinf-math-1.5B** model, trained and released as part of the work presented in the paper [RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation](https://huggingface.co/papers/2509.15965).
+
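With `pipeline_tag: text-generation` and `library_name: transformers` now declared in the front matter, the checkpoint can be loaded through the standard `transformers` causal-LM API. A minimal editorial sketch (not part of the PR; the prompt and generation settings are illustrative, and it assumes the repository ships a standard tokenizer and causal-LM head):

```python
# Illustrative sketch: loading RLinf-math-1.5B for text generation.
MODEL_ID = "RLinf/RLinf-math-1.5B"  # repo id from this model card

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the sketch can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Triggers a multi-GB download; guarded so importing this module stays cheap.
    print(generate("Compute 12 * 13 and explain your steps."))
```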
  <div align="center">
  <img src="logo.svg" alt="RLinf-logo" width="500"/>
  </div>

  <div align="center">
+ <a href="https://huggingface.co/papers/2509.15965"><img src="https://img.shields.io/badge/HuggingFace-Paper-blue?logo=huggingface"></a>
  <a href="https://github.com/RLinf/RLinf"><img src="https://img.shields.io/badge/Github-blue"></a>
  <a href="https://rlinf.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/Documentation-Purple?color=8A2BE2&logo=readthedocs"></a>
  </div>

  <h1 align="center">RLinf: Reinforcement Learning Infrastructure for Agentic AI</h1>

  | [Skywork-OR1-7B](https://huggingface.co/Skywork/Skywork-OR1-7B) | 66.87 | 52.49 | 44.43 | 54.60 |
  | [Polaris-7B-Preview](https://huggingface.co/POLARIS-Project/Polaris-7B-Preview) | **68.55** | 51.24 | 43.88 | 54.56 |
  | [AceMath-RL-Nemotron-7B](https://huggingface.co/nvidia/AceMath-RL-Nemotron-7B) | 67.30 | **55.00** | 45.57 | 55.96 |
+ | [RLinf-math-7B](https://huggingface.co/RLinf/RLinf-7B) | 68.33 | 52.19 | **48.18** | **56.23** |

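As an editorial sanity check on the table's Average column (assuming it is the unweighted mean of the three benchmark scores, which matches every row shown):

```python
# Recompute the Average column as the unweighted mean of AIME24, AIME25,
# and GPQA-diamond scores for each model in the table.
rows = {
    "Skywork-OR1-7B":         [66.87, 52.49, 44.43],
    "Polaris-7B-Preview":     [68.55, 51.24, 43.88],
    "AceMath-RL-Nemotron-7B": [67.30, 55.00, 45.57],
    "RLinf-math-7B":          [68.33, 52.19, 48.18],
}
averages = {name: round(sum(s) / len(s), 2) for name, s in rows.items()}
print(averages)  # RLinf-math-7B -> 56.23, matching the table
```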
  ## License
  This code repository and the model weights are licensed under the MIT License.
+
+ ## Citation and Acknowledgement
+
+ If you find **RLinf** helpful, please cite the paper:
+
+ ```bibtex
+ @misc{yu2025rlinfflexibleefficientlargescale,
+   title={RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation},
+   author={Chao Yu and Yuanqing Wang and Zhen Guo and Hao Lin and Si Xu and Hongzhi Zang and Quanlu Zhang and Yongji Wu and Chunyang Zhu and Junhao Hu and Zixiao Huang and Mingjie Wei and Yuqing Xie and Ke Yang and Bo Dai and Zhexuan Xu and Xiangyuan Wang and Xu Fu and Zhihao Liu and Kang Chen and Weilin Liu and Gang Liu and Boxun Li and Jianlei Yang and Zhi Yang and Guohao Dai and Yu Wang},
+   year={2025},
+   eprint={2509.15965},
+   archivePrefix={arXiv},
+   primaryClass={cs.LG},
+   url={https://arxiv.org/abs/2509.15965},
+ }
+ ```
+
+ If you use the RL+VLA support in RLinf, please also cite our technical report and our empirical study:
+
+ ```bibtex
+ @misc{zang2025rlinfvlaunifiedefficientframework,
+   title={RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training},
+   author={Hongzhi Zang and Mingjie Wei and Si Xu and Yongji Wu and Zhen Guo and Yuanqing Wang and Hao Lin and Liangzhi Shi and Yuqing Xie and Zhexuan Xu and Zhihao Liu and Kang Chen and Wenhao Tang and Quanlu Zhang and Weinan Zhang and Chao Yu and Yu Wang},
+   year={2025},
+   eprint={2510.06710},
+   archivePrefix={arXiv},
+   primaryClass={cs.RO},
+   url={https://arxiv.org/abs/2510.06710},
+ }
+ ```
+
+ ```bibtex
+ @misc{liu2025rlbringvlageneralization,
+   title={What Can RL Bring to VLA Generalization? An Empirical Study},
+   author={Jijia Liu and Feng Gao and Bingwen Wei and Xinlei Chen and Qingmin Liao and Yi Wu and Chao Yu and Yu Wang},
+   year={2025},
+   eprint={2505.19789},
+   archivePrefix={arXiv},
+   primaryClass={cs.LG},
+   url={https://arxiv.org/abs/2505.19789},
+ }
+ ```
+
+ **Acknowledgements**
+ RLinf has been inspired by, and benefits from, the ideas and tooling of the broader open-source community. In particular, we thank the teams and contributors behind VeRL, AReaL, Megatron-LM, SGLang, and PyTorch Fully Sharded Data Parallel (FSDP). If we have inadvertently missed your project or contribution, please open an issue or a pull request so we can credit you properly.
+
+ **Contact:**
+ We welcome applications from postdocs, PhD and Master's students, and interns. Join us in shaping the future of RL infrastructure and embodied AI!
+ - Chao Yu: zoeyuchao@gmail.com
+ - Yu Wang: yu-wang@tsinghua.edu.cn