Improve model card: Add `text-generation` pipeline tag, `transformers` library, and paper/citation info

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +78 -27
README.md CHANGED
@@ -1,55 +1,55 @@
  ---
- license: mit
- tags:
- - RLinf
  language:
  - en
  metrics:
  - accuracy
- base_model:
- - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- pipeline_tag: reinforcement-learning
  model-index:
  - name: RLinf-math-1.5B
    results:
    - task:
-       type: math # Required. Example: automatic-speech-recognition
      dataset:
-       type: aime_2024 # Required. Example: common_voice. Use dataset id from https://hf.co/datasets
-       name: AIME24 # Required. A pretty name for the dataset. Example: Common Voice (French)
      metrics:
-     - type: accuracy # Required. Example: wer. Use metric id from https://hf.co/metrics
-       value: 48.03125 # Required. Example: 20.90
    - task:
-       type: math # Required. Example: automatic-speech-recognition
      dataset:
-       type: aime_2025 # Required. Example: common_voice. Use dataset id from https://hf.co/datasets
-       name: AIME25 # Required. A pretty name for the dataset. Example: Common Voice (French)
      metrics:
-     - type: accuracy # Required. Example: wer. Use metric id from https://hf.co/metrics
-       value: 35.10625 # Required. Example: 20.90
    - task:
-       type: stem # Required. Example: automatic-speech-recognition
      dataset:
-       type: gpqa_diamond # Required. Example: common_voice. Use dataset id from https://hf.co/datasets
-       name: GPQA-diamond # Required. A pretty name for the dataset. Example: Common Voice (French)
      metrics:
-     - type: accuracy # Required. Example: wer. Use metric id from https://hf.co/metrics
-       value: 37.509375 # Required. Example: 20.90
  ---

  <div align="center">
  <img src="logo.svg" alt="RLinf-logo" width="500"/>
  </div>

-
  <div align="center">
- <!-- <a href="TODO"><img src="https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv"></a> -->
- <!-- <a href="TODO"><img src="https://img.shields.io/badge/HuggingFace-yellow?logo=huggingface&logoColor=white" alt="Hugging Face"></a> -->
  <a href="https://github.com/RLinf/RLinf"><img src="https://img.shields.io/badge/Github-blue"></a>
  <a href="https://rlinf.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/Documentation-Purple?color=8A2BE2&logo=readthedocs"></a>
- <!-- <a href="TODO"><img src="https://devin.ai/assets/deepwiki-badge.png" alt="Ask DeepWiki.com" style="height:20px;"></a>
- <a href="TODO"><img src="https://img.shields.io/badge/微信-green?logo=wechat&amp"></a> -->
  </div>

  <h1 align="center">RLinf: Reinforcement Learning Infrastructure for Agentic AI</h1>
@@ -100,7 +100,7 @@ We trained and evaluated two models using RLinf:
  | [Skywork-OR1-7B](https://huggingface.co/Skywork/Skywork-OR1-7B) | 66.87 | 52.49 | 44.43 | 54.60 |
  | [Polaris-7B-Preview](https://huggingface.co/POLARIS-Project/Polaris-7B-Preview) | **68.55** | 51.24 | 43.88 | 54.56 |
  | [AceMath-RL-Nemotron-7B](https://huggingface.co/nvidia/AceMath-RL-Nemotron-7B) | 67.30 | **55.00** | 45.57 | 55.96 |
- | [RLinf-math-7B](https://huggingface.co/RLinf/RLinf-math-7B) | 68.33 | 52.19 | **48.18** | **56.23** |
@@ -129,3 +129,54 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))

  ## License
  This code repository and the model weights are licensed under the MIT License.
  ---
+ base_model:
+ - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  language:
  - en
+ license: mit
  metrics:
  - accuracy
+ pipeline_tag: text-generation
+ tags:
+ - RLinf
+ - reinforcement-learning
+ library_name: transformers
  model-index:
  - name: RLinf-math-1.5B
    results:
    - task:
+       type: math
      dataset:
+       name: AIME24
+       type: aime_2024
      metrics:
+     - type: accuracy
+       value: 48.03125
    - task:
+       type: math
      dataset:
+       name: AIME25
+       type: aime_2025
      metrics:
+     - type: accuracy
+       value: 35.10625
    - task:
+       type: stem
      dataset:
+       name: GPQA-diamond
+       type: gpqa_diamond
      metrics:
+     - type: accuracy
+       value: 37.509375
  ---

+ This repository contains the **RLinf-math-1.5B** model, trained and released as part of the work presented in the paper [RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation](https://huggingface.co/papers/2509.15965).
+
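With `pipeline_tag: text-generation` and `library_name: transformers` now declared in the front matter, the checkpoint can be loaded through the standard `transformers` causal-LM API. A minimal editorial sketch (not part of the PR; the prompt and generation settings are illustrative, and it assumes the repository ships a standard tokenizer and causal-LM head):

```python
# Illustrative sketch: loading RLinf-math-1.5B for text generation.
MODEL_ID = "RLinf/RLinf-math-1.5B"  # repo id from this model card

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the sketch can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Triggers a multi-GB download; guarded so importing this module stays cheap.
    print(generate("Compute 12 * 13 and explain your steps."))
```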
  <div align="center">
  <img src="logo.svg" alt="RLinf-logo" width="500"/>
  </div>

  <div align="center">
+ <a href="https://huggingface.co/papers/2509.15965"><img src="https://img.shields.io/badge/HuggingFace-Paper-blue?logo=huggingface"></a>
  <a href="https://github.com/RLinf/RLinf"><img src="https://img.shields.io/badge/Github-blue"></a>
  <a href="https://rlinf.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/Documentation-Purple?color=8A2BE2&logo=readthedocs"></a>
  </div>

  <h1 align="center">RLinf: Reinforcement Learning Infrastructure for Agentic AI</h1>

  | [Skywork-OR1-7B](https://huggingface.co/Skywork/Skywork-OR1-7B) | 66.87 | 52.49 | 44.43 | 54.60 |
  | [Polaris-7B-Preview](https://huggingface.co/POLARIS-Project/Polaris-7B-Preview) | **68.55** | 51.24 | 43.88 | 54.56 |
  | [AceMath-RL-Nemotron-7B](https://huggingface.co/nvidia/AceMath-RL-Nemotron-7B) | 67.30 | **55.00** | 45.57 | 55.96 |
+ | [RLinf-math-7B](https://huggingface.co/RLinf/RLinf-7B) | 68.33 | 52.19 | **48.18** | **56.23** |

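As an editorial sanity check on the table's Average column (assuming it is the unweighted mean of the three benchmark scores, which matches every row shown):

```python
# Recompute the Average column as the unweighted mean of AIME24, AIME25,
# and GPQA-diamond scores for each model in the table.
rows = {
    "Skywork-OR1-7B":         [66.87, 52.49, 44.43],
    "Polaris-7B-Preview":     [68.55, 51.24, 43.88],
    "AceMath-RL-Nemotron-7B": [67.30, 55.00, 45.57],
    "RLinf-math-7B":          [68.33, 52.19, 48.18],
}
averages = {name: round(sum(s) / len(s), 2) for name, s in rows.items()}
print(averages)  # RLinf-math-7B -> 56.23, matching the table
```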
  ## License
  This code repository and the model weights are licensed under the MIT License.
+
+ ## Citation and Acknowledgement
+
+ If you find **RLinf** helpful, please cite the paper:
+
+ ```bibtex
+ @misc{yu2025rlinfflexibleefficientlargescale,
+   title={RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation},
+   author={Chao Yu and Yuanqing Wang and Zhen Guo and Hao Lin and Si Xu and Hongzhi Zang and Quanlu Zhang and Yongji Wu and Chunyang Zhu and Junhao Hu and Zixiao Huang and Mingjie Wei and Yuqing Xie and Ke Yang and Bo Dai and Zhexuan Xu and Xiangyuan Wang and Xu Fu and Zhihao Liu and Kang Chen and Weilin Liu and Gang Liu and Boxun Li and Jianlei Yang and Zhi Yang and Guohao Dai and Yu Wang},
+   year={2025},
+   eprint={2509.15965},
+   archivePrefix={arXiv},
+   primaryClass={cs.LG},
+   url={https://arxiv.org/abs/2509.15965},
+ }
+ ```
+
+ If you use the RL+VLA support in RLinf, please also cite our technical report and our empirical study:
+
+ ```bibtex
+ @misc{zang2025rlinfvlaunifiedefficientframework,
+   title={RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training},
+   author={Hongzhi Zang and Mingjie Wei and Si Xu and Yongji Wu and Zhen Guo and Yuanqing Wang and Hao Lin and Liangzhi Shi and Yuqing Xie and Zhexuan Xu and Zhihao Liu and Kang Chen and Wenhao Tang and Quanlu Zhang and Weinan Zhang and Chao Yu and Yu Wang},
+   year={2025},
+   eprint={2510.06710},
+   archivePrefix={arXiv},
+   primaryClass={cs.RO},
+   url={https://arxiv.org/abs/2510.06710},
+ }
+ ```
+
+ ```bibtex
+ @misc{liu2025rlbringvlageneralization,
+   title={What Can RL Bring to VLA Generalization? An Empirical Study},
+   author={Jijia Liu and Feng Gao and Bingwen Wei and Xinlei Chen and Qingmin Liao and Yi Wu and Chao Yu and Yu Wang},
+   year={2025},
+   eprint={2505.19789},
+   archivePrefix={arXiv},
+   primaryClass={cs.LG},
+   url={https://arxiv.org/abs/2505.19789},
+ }
+ ```
+
+ **Acknowledgements**
+ RLinf has been inspired by, and benefits from, the ideas and tooling of the broader open-source community. In particular, we thank the teams and contributors behind VeRL, AReaL, Megatron-LM, SGLang, and PyTorch Fully Sharded Data Parallel (FSDP). If we have inadvertently missed your project or contribution, please open an issue or a pull request so we can credit you properly.
+
+ **Contact:**
+ We welcome applications from postdocs, PhD and Master's students, and interns. Join us in shaping the future of RL infrastructure and embodied AI!
+ - Chao Yu: zoeyuchao@gmail.com
+ - Yu Wang: yu-wang@tsinghua.edu.cn