Improve model card: Add pipeline tag, library name, update title and synchronize content

#1
by nielsr HF Staff - opened
Files changed (1) hide show
  1. README.md +33 -17
README.md CHANGED
@@ -1,39 +1,41 @@
1
  ---
2
- license: apache-2.0
3
- language:
4
- - en
5
  datasets:
6
  - tanhuajie2001/Reason-RFT-CoT-Dataset
 
 
 
7
  metrics:
8
  - accuracy
9
- base_model:
10
- - Qwen/Qwen2-VL-2B-Instruct
11
  ---
12
 
13
  <div align="center">
14
  <img src="https://github.com/tanhuajie/Reason-RFT/raw/main/assets/logo.png" width="500"/>
15
  </div>
16
 
17
- # 🤗 Reason-RFT CoT Dateset
18
- *The model checkpoints in our project "Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning"*.
19
 
 
20
 
21
  <p align="center">
22
  </a>&nbsp&nbsp⭐️ <a href="https://tanhuajie.github.io/ReasonRFT/">Project</a></a>&nbsp&nbsp │ &nbsp&nbsp🌎 <a href="https://github.com/tanhuajie/Reason-RFT">Github</a>&nbsp&nbsp │ &nbsp&nbsp🔥 <a href="https://huggingface.co/datasets/tanhuajie2001/Reason-RFT-CoT-Dataset">Dataset</a>&nbsp&nbsp │ &nbsp&nbsp📑 <a href="https://arxiv.org/abs/2503.20752">ArXiv</a>&nbsp&nbsp │ &nbsp&nbsp💬 <a href="https://github.com/tanhuajie/Reason-RFT/raw/main/assets/wechat.png">WeChat</a>
23
  </p>
24
 
25
  <p align="center">
26
- </a>&nbsp&nbsp🤖 <a href="https://github.com/FlagOpen/RoboBrain/">RoboBrain</a>: Aim to Explore ReasonRFT Paradigm to Enhance RoboBrain's Embodied Reasoning Capabilities.
27
  </p>
28
 
29
- ## ♣️ Model List
30
 
31
  | Tasks | Reason-RFT-Zero-2B | Reason-RFT-Zero-7B | Reason-RFT-2B | Reason-RFT-7B |
32
  |------------------------|---------------------------|---------------------|---------------------------|---------------------------|
33
  | Visual Counting | [🤗VC-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Visual-Counting-Qwen2-VL-2B) | [🤗VC-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Visual-Counting-Qwen2-VL-7B) | [🤗VC-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Visual-Counting-Qwen2-VL-2B) | [🤗VC-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Visual-Counting-Qwen2-VL-7B) |
34
  | Structure Perception | [🤗SP-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Structure-Perception-Qwen2-VL-2B) | [🤗SP-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Structure-Perception-Qwen2-VL-7B) | [🤗SP-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Structure-Perception-Qwen2-VL-2B) | [🤗SP-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Structure-Perception-Qwen2-VL-7B) |
35
- | Spatial Transformation | [🤗ST-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Spatial-Transformation-Qwen2-VL-2B) | [🤗ST-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Spatial-Transformation-Qwen2-VL-7B) | [🤗ST-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Spatial-Transformation-Qwen2-VL-2B) | [🤗ST-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Spatial-Transformation-Qwen2-VL-7B) |
36
- | ***Embodied Tasks*** | 🤖 *Stay Turned* | 🤖 *Stay Turned* | 🤖 *Stay Turned* | 🤖 *Stay Turned* |
37
 
38
 
39
  ## 🔥 Overview
@@ -43,9 +45,9 @@ However, this training paradigm may lead to overfitting and cognitive rigidity,
43
  To address these limitations, we propose **Reason-RFT**, a novel reinforcement fine-tuning framework that significantly enhances generalization capabilities in visual reasoning tasks.
44
  **Reason-RFT** introduces a two-phase training framework for visual reasoning: (1) Supervised Fine-Tuning (SFT) with curated Chain-of-Thought (CoT) data activates the reasoning potential of Vision-Language Models (VLMs), followed by (2) Group Relative Policy Optimization (GRPO)-based reinforcement learning that generates multiple reasoning-response pairs, significantly enhancing generalization in visual reasoning tasks.
45
  To evaluate **Reason-RFT**'s visual reasoning capabilities, we reconstructed a comprehensive dataset spanning visual counting, structure perception, and spatial transformation, serving as a benchmark to systematically assess visual cognition, geometric understanding, and spatial generalization.
46
- Experimental results demonstrate Reasoning-RFT's three key advantages: **(1) Performance Enhancement**: achieving state-of-the-art results across multiple tasks, outperforming most mainstream open-source and proprietary models;
47
- **(2) Generalization Superiority**: consistently maintaining robust performance across diverse tasks and domains, outperforming alternative training paradigms;
48
- **(3) Data Efficiency**: excelling in few-shot learning scenarios while surpassing full-dataset SFT baselines;
49
  **Reason-RFT** introduces a novel paradigm in visual reasoning, significantly advancing multimodal research.
50
 
51
  <div align="center">
@@ -53,8 +55,9 @@ Experimental results demonstrate Reasoning-RFT's three key advantages: **(1) Per
53
  </div>
54
 
55
  ## 🗞️ News
56
-
57
- - **`2025-04-12`**: ⭐️ We released our [Models](https://huggingface.co/tanhuajie2001/Reason-RFT-Spatial-Transformation-Qwen2-VL-2B) to huggingface for [General Visual Reasoning Tasks](#GeneralVisualTasks).
 
58
  - **`2025-04-04`**: 🤗 We released our [datasets](https://huggingface.co/datasets/tanhuajie2001/Reason-RFT-CoT-Dataset/) to huggingface for [General Visual Reasoning Tasks](#GeneralVisualTasks).
59
  - **`2025-04-02`**: 🔥 We released codes and scripts for training/evaluation on [General Visual Reasoning Tasks](#GeneralVisualTasks).
60
  - **`2025-03-29`**: 🌍 We released the [repository](https://github.com/tanhuajie/Reason-RFT/) and [roadmap](#RoadMap) for **Reason-RFT**.
@@ -74,4 +77,17 @@ If you find this project useful, welcome to cite us.
74
  journal={arXiv preprint arXiv:2503.20752},
75
  year={2025}
76
  }
77
- ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ base_model:
3
+ - Qwen/Qwen2-VL-2B-Instruct
 
4
  datasets:
5
  - tanhuajie2001/Reason-RFT-CoT-Dataset
6
+ language:
7
+ - en
8
+ license: apache-2.0
9
  metrics:
10
  - accuracy
11
+ pipeline_tag: image-text-to-text
12
+ library_name: transformers
13
  ---
14
 
15
  <div align="center">
16
  <img src="https://github.com/tanhuajie/Reason-RFT/raw/main/assets/logo.png" width="500"/>
17
  </div>
18
 
19
+ # Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning
 
20
 
21
+ This model was presented in the paper [Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models](https://huggingface.co/papers/2503.20752).
22
 
23
  <p align="center">
24
  </a>&nbsp&nbsp⭐️ <a href="https://tanhuajie.github.io/ReasonRFT/">Project</a></a>&nbsp&nbsp │ &nbsp&nbsp🌎 <a href="https://github.com/tanhuajie/Reason-RFT">Github</a>&nbsp&nbsp │ &nbsp&nbsp🔥 <a href="https://huggingface.co/datasets/tanhuajie2001/Reason-RFT-CoT-Dataset">Dataset</a>&nbsp&nbsp │ &nbsp&nbsp📑 <a href="https://arxiv.org/abs/2503.20752">ArXiv</a>&nbsp&nbsp │ &nbsp&nbsp💬 <a href="https://github.com/tanhuajie/Reason-RFT/raw/main/assets/wechat.png">WeChat</a>
25
  </p>
26
 
27
  <p align="center">
28
+ </a>&nbsp&nbsp🤖 <a href="https://github.com/FlagOpen/RoboBrain2.0/">RoboBrain 2.0</a>: Explore Embodied Visual Reasoning Capabilities with **Reason-RFT**.
29
  </p>
30
 
31
+ ## 🤗 Model Zoo
32
 
33
  | Tasks | Reason-RFT-Zero-2B | Reason-RFT-Zero-7B | Reason-RFT-2B | Reason-RFT-7B |
34
  |------------------------|---------------------------|---------------------|---------------------------|---------------------------|
35
  | Visual Counting | [🤗VC-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Visual-Counting-Qwen2-VL-2B) | [🤗VC-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Visual-Counting-Qwen2-VL-7B) | [🤗VC-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Visual-Counting-Qwen2-VL-2B) | [🤗VC-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Visual-Counting-Qwen2-VL-7B) |
36
  | Structure Perception | [🤗SP-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Structure-Perception-Qwen2-VL-2B) | [🤗SP-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Structure-Perception-Qwen2-VL-7B) | [🤗SP-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Structure-Perception-Qwen2-VL-2B) | [🤗SP-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Structure-Perception-Qwen2-VL-7B) |
37
+ | Spatial Transformation | [🤗ST-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Spatial-Transformation-Qwen2-VL-2B) | [🤗ST-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Spatial-Transformation-Qwen2-VL-7B) | [🤗ST-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Spatial-Transformation-Qwen2-VL-2B) | [🤗ST-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Spatial-Transformation-Qwen2-VL-7B) |
38
+ | ***Embodied Tasks*** | 🤖 *Stay Turned* | 🤖 *Stay Turned* | 🤖 *Stay Turned* | 🤖 *Stay Turned* |
39
 
40
 
41
  ## 🔥 Overview
 
45
  To address these limitations, we propose **Reason-RFT**, a novel reinforcement fine-tuning framework that significantly enhances generalization capabilities in visual reasoning tasks.
46
  **Reason-RFT** introduces a two-phase training framework for visual reasoning: (1) Supervised Fine-Tuning (SFT) with curated Chain-of-Thought (CoT) data activates the reasoning potential of Vision-Language Models (VLMs), followed by (2) Group Relative Policy Optimization (GRPO)-based reinforcement learning that generates multiple reasoning-response pairs, significantly enhancing generalization in visual reasoning tasks.
47
  To evaluate **Reason-RFT**'s visual reasoning capabilities, we reconstructed a comprehensive dataset spanning visual counting, structure perception, and spatial transformation, serving as a benchmark to systematically assess visual cognition, geometric understanding, and spatial generalization.
48
+ Experimental results demonstrate Reasoning-RFT's three key advantages: **(1) Performance Enhancement**: achieving state-of-the-art results across multiple tasks, outperforming most mainstream open-source and proprietary models;
49
+ **(2) Generalization Superiority**: consistently maintaining robust performance across diverse tasks and domains, outperforming alternative training paradigms;
50
+ **(3) Data Efficiency**: excelling in few-shot learning scenarios while surpassing full-dataset SFT baselines;
51
  **Reason-RFT** introduces a novel paradigm in visual reasoning, significantly advancing multimodal research.
52
 
53
  <div align="center">
 
55
  </div>
56
 
57
  ## 🗞️ News
58
+ - **`2025-09-18`**: 🔥🔥🔥 **Reason-RFT** gets accepted to NeurIPS 2025! See you in Mexico City and San Diego, USA!
59
+ - **`2025-06-06`**: 🤖 We're excited to announce the release of our more powerful [RoboBrain 2.0](https://github.com/FlagOpen/RoboBrain2.0) using Reason-RFT.
60
+ - **`2025-04-13`**: ✨ We released our [model zoo](https://github.com/tanhuajie/Reason-RFT?tab=readme-ov-file#--model-zoo) to huggingface.
61
  - **`2025-04-04`**: 🤗 We released our [datasets](https://huggingface.co/datasets/tanhuajie2001/Reason-RFT-CoT-Dataset/) to huggingface for [General Visual Reasoning Tasks](#GeneralVisualTasks).
62
  - **`2025-04-02`**: 🔥 We released codes and scripts for training/evaluation on [General Visual Reasoning Tasks](#GeneralVisualTasks).
63
  - **`2025-03-29`**: 🌍 We released the [repository](https://github.com/tanhuajie/Reason-RFT/) and [roadmap](#RoadMap) for **Reason-RFT**.
 
77
  journal={arXiv preprint arXiv:2503.20752},
78
  year={2025}
79
  }
80
+
81
+ @article{team2025robobrain,
82
+ title={Robobrain 2.0 technical report},
83
+ author={Team, BAAI RoboBrain and Cao, Mingyu and Tan, Huajie and Ji, Yuheng and Lin, Minglan and Li, Zhiyu and Cao, Zhou and Wang, Pengwei and Zhou, Enshen and Han, Yi and others},
84
+ journal={arXiv preprint arXiv:2507.02029},
85
+ year={2025}
86
+ }
87
+
88
+ @article{ji2025robobrain,
89
+ title={RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete},
90
+ author={Ji, Yuheng and Tan, Huajie and Shi, Jiayu and Hao, Xiaoshuai and Zhang, Yuan and Zhang, Hengyuan and Wang, Pengwei and Zhao, Mengdi and Mu, Yao and An, Pengju and others},
91
+ journal={arXiv preprint arXiv:2502.21257},
92
+ year={2025}
93
+ }