Add pipeline_tag, library_name, paper link, and sample usage

#1
by nielsr HF Staff - opened
Files changed (1) hide show
  1. README.md +58 -14
README.md CHANGED
@@ -1,13 +1,15 @@
1
  ---
2
- license: apache-2.0
3
- language:
4
- - en
5
  datasets:
6
  - tanhuajie2001/Reason-RFT-CoT-Dataset
 
 
 
7
  metrics:
8
  - accuracy
9
- base_model:
10
- - Qwen/Qwen2-VL-2B-Instruct
11
  ---
12
 
13
  <div align="center">
@@ -17,6 +19,7 @@ base_model:
17
  # 🤗 Reason-RFT CoT Dateset
18
  *The model checkpoints in our project "Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning"*.
19
 
 
20
 
21
  <p align="center">
22
  </a>&nbsp&nbsp⭐️ <a href="https://tanhuajie.github.io/ReasonRFT/">Project</a></a>&nbsp&nbsp │ &nbsp&nbsp🌎 <a href="https://github.com/tanhuajie/Reason-RFT">Github</a>&nbsp&nbsp │ &nbsp&nbsp🔥 <a href="https://huggingface.co/datasets/tanhuajie2001/Reason-RFT-CoT-Dataset">Dataset</a>&nbsp&nbsp │ &nbsp&nbsp📑 <a href="https://arxiv.org/abs/2503.20752">ArXiv</a>&nbsp&nbsp │ &nbsp&nbsp💬 <a href="https://github.com/tanhuajie/Reason-RFT/raw/main/assets/wechat.png">WeChat</a>
@@ -32,8 +35,8 @@ base_model:
32
  |------------------------|---------------------------|---------------------|---------------------------|---------------------------|
33
  | Visual Counting | [🤗VC-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Visual-Counting-Qwen2-VL-2B) | [🤗VC-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Visual-Counting-Qwen2-VL-7B) | [🤗VC-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Visual-Counting-Qwen2-VL-2B) | [🤗VC-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Visual-Counting-Qwen2-VL-7B) |
34
  | Structure Perception | [🤗SP-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Structure-Perception-Qwen2-VL-2B) | [🤗SP-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Structure-Perception-Qwen2-VL-7B) | [🤗SP-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Structure-Perception-Qwen2-VL-2B) | [🤗SP-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Structure-Perception-Qwen2-VL-7B) |
35
- | Spatial Transformation | [🤗ST-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Spatial-Transformation-Qwen2-VL-2B) | [🤗ST-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Spatial-Transformation-Qwen2-VL-7B) | [🤗ST-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Spatial-Transformation-Qwen2-VL-2B) | [🤗ST-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Spatial-Transformation-Qwen2-VL-7B) |
36
- | ***Embodied Tasks*** | 🤖 *Stay Turned* | 🤖 *Stay Turned* | 🤖 *Stay Turned* | 🤖 *Stay Turned* |
37
 
38
 
39
  ## 🔥 Overview
@@ -43,9 +46,9 @@ However, this training paradigm may lead to overfitting and cognitive rigidity,
43
  To address these limitations, we propose **Reason-RFT**, a novel reinforcement fine-tuning framework that significantly enhances generalization capabilities in visual reasoning tasks.
44
  **Reason-RFT** introduces a two-phase training framework for visual reasoning: (1) Supervised Fine-Tuning (SFT) with curated Chain-of-Thought (CoT) data activates the reasoning potential of Vision-Language Models (VLMs), followed by (2) Group Relative Policy Optimization (GRPO)-based reinforcement learning that generates multiple reasoning-response pairs, significantly enhancing generalization in visual reasoning tasks.
45
  To evaluate **Reason-RFT**'s visual reasoning capabilities, we reconstructed a comprehensive dataset spanning visual counting, structure perception, and spatial transformation, serving as a benchmark to systematically assess visual cognition, geometric understanding, and spatial generalization.
46
- Experimental results demonstrate Reasoning-RFT's three key advantages: **(1) Performance Enhancement**: achieving state-of-the-art results across multiple tasks, outperforming most mainstream open-source and proprietary models;
47
- **(2) Generalization Superiority**: consistently maintaining robust performance across diverse tasks and domains, outperforming alternative training paradigms;
48
- **(3) Data Efficiency**: excelling in few-shot learning scenarios while surpassing full-dataset SFT baselines;
49
  **Reason-RFT** introduces a novel paradigm in visual reasoning, significantly advancing multimodal research.
50
 
51
  <div align="center">
@@ -53,17 +56,58 @@ Experimental results demonstrate Reasoning-RFT's three key advantages: **(1) Per
53
  </div>
54
 
55
  ## 🗞️ News
56
-
57
- - **`2025-04-12`**: ⭐️ We released our [Models](https://huggingface.co/tanhuajie2001/Reason-RFT-Spatial-Transformation-Qwen2-VL-2B) to huggingface for [General Visual Reasoning Tasks](#GeneralVisualTasks).
 
58
  - **`2025-04-04`**: 🤗 We released our [datasets](https://huggingface.co/datasets/tanhuajie2001/Reason-RFT-CoT-Dataset/) to huggingface for [General Visual Reasoning Tasks](#GeneralVisualTasks).
59
  - **`2025-04-02`**: 🔥 We released codes and scripts for training/evaluation on [General Visual Reasoning Tasks](#GeneralVisualTasks).
60
  - **`2025-03-29`**: 🌍 We released the [repository](https://github.com/tanhuajie/Reason-RFT/) and [roadmap](#RoadMap) for **Reason-RFT**.
61
  - **`2025-03-26`**: 📑 We released our initial [ArXiv paper](https://arxiv.org/abs/2503.20752/) of **Reason-RFT**.
62
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
63
 
64
- ## ⭐️ Usage
65
 
66
- *Please refer to [Reason-RFT](https://github.com/tanhuajie/Reason-RFT) for more details.*
 
 
 
 
 
67
 
68
  ## 📑 Citation
69
  If you find this project useful, welcome to cite us.
 
1
  ---
2
+ base_model:
3
+ - Qwen/Qwen2-VL-2B-Instruct
 
4
  datasets:
5
  - tanhuajie2001/Reason-RFT-CoT-Dataset
6
+ language:
7
+ - en
8
+ license: apache-2.0
9
  metrics:
10
  - accuracy
11
+ pipeline_tag: image-text-to-text
12
+ library_name: transformers
13
  ---
14
 
15
  <div align="center">
 
19
  # 🤗 Reason-RFT CoT Dateset
20
  *The model checkpoints in our project "Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning"*.
21
 
22
+ This model is described in the paper [Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models](https://huggingface.co/papers/2503.20752).
23
 
24
  <p align="center">
25
  </a>&nbsp&nbsp⭐️ <a href="https://tanhuajie.github.io/ReasonRFT/">Project</a></a>&nbsp&nbsp │ &nbsp&nbsp🌎 <a href="https://github.com/tanhuajie/Reason-RFT">Github</a>&nbsp&nbsp │ &nbsp&nbsp🔥 <a href="https://huggingface.co/datasets/tanhuajie2001/Reason-RFT-CoT-Dataset">Dataset</a>&nbsp&nbsp │ &nbsp&nbsp📑 <a href="https://arxiv.org/abs/2503.20752">ArXiv</a>&nbsp&nbsp │ &nbsp&nbsp💬 <a href="https://github.com/tanhuajie/Reason-RFT/raw/main/assets/wechat.png">WeChat</a>
 
35
  |------------------------|---------------------------|---------------------|---------------------------|---------------------------|
36
  | Visual Counting | [🤗VC-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Visual-Counting-Qwen2-VL-2B) | [🤗VC-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Visual-Counting-Qwen2-VL-7B) | [🤗VC-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Visual-Counting-Qwen2-VL-2B) | [🤗VC-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Visual-Counting-Qwen2-VL-7B) |
37
  | Structure Perception | [🤗SP-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Structure-Perception-Qwen2-VL-2B) | [🤗SP-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Structure-Perception-Qwen2-VL-7B) | [🤗SP-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Structure-Perception-Qwen2-VL-2B) | [🤗SP-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Structure-Perception-Qwen2-VL-7B) |
38
+ | Spatial Transformation | [🤗ST-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Spatial-Transformation-Qwen2-VL-2B) | [🤗ST-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Spatial-Transformation-Qwen2-VL-7B) | [🤗ST-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Spatial-Transformation-Qwen2-VL-2B) | [🤗ST-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Spatial-Transformation-Qwen2-VL-7B) |
39
+ | ***Embodied Tasks*** | 🤖 *Stay Turned* | 🤖 *Stay Turned* | 🤖 *Stay Turned* | 🤖 *Stay Turned* |
40
 
41
 
42
  ## 🔥 Overview
 
46
  To address these limitations, we propose **Reason-RFT**, a novel reinforcement fine-tuning framework that significantly enhances generalization capabilities in visual reasoning tasks.
47
  **Reason-RFT** introduces a two-phase training framework for visual reasoning: (1) Supervised Fine-Tuning (SFT) with curated Chain-of-Thought (CoT) data activates the reasoning potential of Vision-Language Models (VLMs), followed by (2) Group Relative Policy Optimization (GRPO)-based reinforcement learning that generates multiple reasoning-response pairs, significantly enhancing generalization in visual reasoning tasks.
48
  To evaluate **Reason-RFT**'s visual reasoning capabilities, we reconstructed a comprehensive dataset spanning visual counting, structure perception, and spatial transformation, serving as a benchmark to systematically assess visual cognition, geometric understanding, and spatial generalization.
49
+ Experimental results demonstrate Reasoning-RFT's three key advantages: **(1) Performance Enhancement**: achieving state-of-the-art results across multiple tasks, outperforming most mainstream open-source and proprietary models;
50
+ **(2) Generalization Superiority**: consistently maintaining robust performance across diverse tasks and domains, outperforming alternative training paradigms;
51
+ **(3) Data Efficiency**: excelling in few-shot learning scenarios while surpassing full-dataset SFT baselines;
52
  **Reason-RFT** introduces a novel paradigm in visual reasoning, significantly advancing multimodal research.
53
 
54
  <div align="center">
 
56
  </div>
57
 
58
  ## 🗞️ News
59
+ - **`2025-09-18`**: 🔥🔥🔥 **Reason-RFT** gets accepted to NeurIPS 2025! See you in Mexico City and San Diego, USA!
60
+ - **`2025-06-06`**: 🤖 We're excited to announce the release of our more powerful [RoboBrain 2.0](https://github.com/FlagOpen/RoboBrain2.0) using Reason-RFT.
61
+ - **`2025-04-13`**: ✨ We released our [model zoo](https://github.com/tanhuajie/Reason-RFT?tab=readme-ov-file#--model-zoo) to huggingface.
62
  - **`2025-04-04`**: 🤗 We released our [datasets](https://huggingface.co/datasets/tanhuajie2001/Reason-RFT-CoT-Dataset/) to huggingface for [General Visual Reasoning Tasks](#GeneralVisualTasks).
63
  - **`2025-04-02`**: 🔥 We released codes and scripts for training/evaluation on [General Visual Reasoning Tasks](#GeneralVisualTasks).
64
  - **`2025-03-29`**: 🌍 We released the [repository](https://github.com/tanhuajie/Reason-RFT/) and [roadmap](#RoadMap) for **Reason-RFT**.
65
  - **`2025-03-26`**: 📑 We released our initial [ArXiv paper](https://arxiv.org/abs/2503.20752/) of **Reason-RFT**.
66
 
67
+ ## ⭐️ Sample Usage
68
+
69
+ To get started with Reason-RFT, please follow these steps for setting up the environment and training:
70
+
71
+ ### 🛠️ Setup
72
+
73
+ ```bash
74
+ # clone repo.
75
+ git clone https://github.com/tanhuajie/Reason-RFT.git
76
+ cd Reason-RFT
77
+
78
+ # build conda env. for stage_rl
79
+ conda create -n reasonrft_rl python=3.10
80
+ conda activate reasonrft_rl
81
+ pip install -r requirements_rl.txt
82
+
83
+ # build conda env. for stage_sft
84
+ conda create -n reasonrft_sft python=3.10
85
+ conda activate reasonrft_sft
86
+ pip install -r requirements_sft.txt
87
+ ```
88
+
89
+ ### ♣️ Dataset Preparation
90
+
91
+ ```bash
92
+ # SFT Training:
93
+ change dataset paths defined in './train/stage_sft/dataset_info.json' file.
94
+
95
+ # RL Training:
96
+ change dataset paths defined in './scripts/train/reason_rft/stage_rl/xxx.bash' file.
97
+ change dataset paths defined in './scripts/train/reason_rft_zero/xxx.bash' file.
98
+
99
+ # Evaluation:
100
+ change dataset paths defined in './eval/eval_by_vllm_for_open_source.py' file.
101
+ ```
102
 
103
+ ### 📚 Training Example
104
 
105
+ ```bash
106
+ # Reason-RFT, Task1 (Visual-Counting), Qwen2-vl-2b, STAGE1 + STAGE2
107
+ bash scripts/train/reason_rft/stage_sft/resume_finetune_qwen2vl_2b_task1_stage1_sft.sh
108
+ bash scripts/train/reason_rft/stage_rl/resume_finetune_qwen2vl_2b_task1_stage2_rl.sh
109
+ ```
110
+ **Note:** Please change the dataset, pre-trained model and image path in the scripts above.
111
 
112
  ## 📑 Citation
113
  If you find this project useful, welcome to cite us.