Improve model card: add pipeline tag, library name, and descriptive title

#1
by nielsr HF Staff - opened
Files changed (1) hide show
  1. README.md +14 -12
README.md CHANGED
@@ -1,22 +1,24 @@
1
  ---
2
- license: apache-2.0
3
- language:
4
- - en
5
  datasets:
6
  - tanhuajie2001/Reason-RFT-CoT-Dataset
 
 
 
7
  metrics:
8
  - accuracy
9
- base_model:
10
- - Qwen/Qwen2-VL-2B-Instruct
11
  ---
12
 
13
  <div align="center">
14
  <img src="https://github.com/tanhuajie/Reason-RFT/raw/main/assets/logo.png" width="500"/>
15
  </div>
16
 
17
- # 🤗 Reason-RFT CoT Dateset
18
- *The model checkpoints in our project "Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning"*.
19
 
 
20
 
21
  <p align="center">
22
  </a>&nbsp&nbsp⭐️ <a href="https://tanhuajie.github.io/ReasonRFT/">Project</a></a>&nbsp&nbsp │ &nbsp&nbsp🌎 <a href="https://github.com/tanhuajie/Reason-RFT">Github</a>&nbsp&nbsp │ &nbsp&nbsp🔥 <a href="https://huggingface.co/datasets/tanhuajie2001/Reason-RFT-CoT-Dataset">Dataset</a>&nbsp&nbsp │ &nbsp&nbsp📑 <a href="https://arxiv.org/abs/2503.20752">ArXiv</a>&nbsp&nbsp │ &nbsp&nbsp💬 <a href="https://github.com/tanhuajie/Reason-RFT/raw/main/assets/wechat.png">WeChat</a>
@@ -32,8 +34,8 @@ base_model:
32
  |------------------------|---------------------------|---------------------|---------------------------|---------------------------|
33
  | Visual Counting | [🤗VC-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Visual-Counting-Qwen2-VL-2B) | [🤗VC-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Visual-Counting-Qwen2-VL-7B) | [🤗VC-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Visual-Counting-Qwen2-VL-2B) | [🤗VC-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Visual-Counting-Qwen2-VL-7B) |
34
  | Structure Perception | [🤗SP-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Structure-Perception-Qwen2-VL-2B) | [🤗SP-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Structure-Perception-Qwen2-VL-7B) | [🤗SP-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Structure-Perception-Qwen2-VL-2B) | [🤗SP-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Structure-Perception-Qwen2-VL-7B) |
35
- | Spatial Transformation | [🤗ST-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Spatial-Transformation-Qwen2-VL-2B) | [🤗ST-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Spatial-Transformation-Qwen2-VL-7B) | [🤗ST-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Spatial-Transformation-Qwen2-VL-2B) | [🤗ST-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Spatial-Transformation-Qwen2-VL-7B) |
36
- | ***Embodied Tasks*** | 🤖 *Stay Turned* | 🤖 *Stay Turned* | 🤖 *Stay Turned* | 🤖 *Stay Turned* |
37
 
38
 
39
  ## 🔥 Overview
@@ -43,9 +45,9 @@ However, this training paradigm may lead to overfitting and cognitive rigidity,
43
  To address these limitations, we propose **Reason-RFT**, a novel reinforcement fine-tuning framework that significantly enhances generalization capabilities in visual reasoning tasks.
44
  **Reason-RFT** introduces a two-phase training framework for visual reasoning: (1) Supervised Fine-Tuning (SFT) with curated Chain-of-Thought (CoT) data activates the reasoning potential of Vision-Language Models (VLMs), followed by (2) Group Relative Policy Optimization (GRPO)-based reinforcement learning that generates multiple reasoning-response pairs, significantly enhancing generalization in visual reasoning tasks.
45
  To evaluate **Reason-RFT**'s visual reasoning capabilities, we reconstructed a comprehensive dataset spanning visual counting, structure perception, and spatial transformation, serving as a benchmark to systematically assess visual cognition, geometric understanding, and spatial generalization.
46
- Experimental results demonstrate Reasoning-RFT's three key advantages: **(1) Performance Enhancement**: achieving state-of-the-art results across multiple tasks, outperforming most mainstream open-source and proprietary models;
47
- **(2) Generalization Superiority**: consistently maintaining robust performance across diverse tasks and domains, outperforming alternative training paradigms;
48
- **(3) Data Efficiency**: excelling in few-shot learning scenarios while surpassing full-dataset SFT baselines;
49
  **Reason-RFT** introduces a novel paradigm in visual reasoning, significantly advancing multimodal research.
50
 
51
  <div align="center">
 
1
  ---
2
+ base_model:
3
+ - Qwen/Qwen2-VL-2B-Instruct
 
4
  datasets:
5
  - tanhuajie2001/Reason-RFT-CoT-Dataset
6
+ language:
7
+ - en
8
+ license: apache-2.0
9
  metrics:
10
  - accuracy
11
+ pipeline_tag: image-text-to-text
12
+ library_name: transformers
13
  ---
14
 
15
  <div align="center">
16
  <img src="https://github.com/tanhuajie/Reason-RFT/raw/main/assets/logo.png" width="500"/>
17
  </div>
18
 
19
+ # Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models
 
20
 
21
+ This repository contains the model checkpoints for **Reason-RFT**, a model presented in the paper [Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models](https://huggingface.co/papers/2503.20752).
22
 
23
  <p align="center">
24
  </a>&nbsp&nbsp⭐️ <a href="https://tanhuajie.github.io/ReasonRFT/">Project</a></a>&nbsp&nbsp │ &nbsp&nbsp🌎 <a href="https://github.com/tanhuajie/Reason-RFT">Github</a>&nbsp&nbsp │ &nbsp&nbsp🔥 <a href="https://huggingface.co/datasets/tanhuajie2001/Reason-RFT-CoT-Dataset">Dataset</a>&nbsp&nbsp │ &nbsp&nbsp📑 <a href="https://arxiv.org/abs/2503.20752">ArXiv</a>&nbsp&nbsp │ &nbsp&nbsp💬 <a href="https://github.com/tanhuajie/Reason-RFT/raw/main/assets/wechat.png">WeChat</a>
 
34
  |------------------------|---------------------------|---------------------|---------------------------|---------------------------|
35
  | Visual Counting | [🤗VC-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Visual-Counting-Qwen2-VL-2B) | [🤗VC-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Visual-Counting-Qwen2-VL-7B) | [🤗VC-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Visual-Counting-Qwen2-VL-2B) | [🤗VC-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Visual-Counting-Qwen2-VL-7B) |
36
  | Structure Perception | [🤗SP-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Structure-Perception-Qwen2-VL-2B) | [🤗SP-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Structure-Perception-Qwen2-VL-7B) | [🤗SP-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Structure-Perception-Qwen2-VL-2B) | [🤗SP-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Structure-Perception-Qwen2-VL-7B) |
37
+ | Spatial Transformation | [🤗ST-GRPO-Zero-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Spatial-Transformation-Qwen2-VL-2B) | [🤗ST-GRPO-Zero-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Zero-Spatial-Transformation-Qwen2-VL-7B) | [🤗ST-GRPO-2B](https://huggingface.co/tanhuajie2001/Reason-RFT-Spatial-Transformation-Qwen2-VL-2B) | [🤗ST-GRPO-7B](https://huggingface.co/tanhuajie2001/Reason-RFT-Spatial-Transformation-Qwen2-VL-7B) |
38
+ | ***Embodied Tasks*** | 🤖 *Stay Turned* | 🤖 *Stay Turned* | 🤖 *Stay Turned* | 🤖 *Stay Turned* |
39
 
40
 
41
  ## 🔥 Overview
 
45
  To address these limitations, we propose **Reason-RFT**, a novel reinforcement fine-tuning framework that significantly enhances generalization capabilities in visual reasoning tasks.
46
  **Reason-RFT** introduces a two-phase training framework for visual reasoning: (1) Supervised Fine-Tuning (SFT) with curated Chain-of-Thought (CoT) data activates the reasoning potential of Vision-Language Models (VLMs), followed by (2) Group Relative Policy Optimization (GRPO)-based reinforcement learning that generates multiple reasoning-response pairs, significantly enhancing generalization in visual reasoning tasks.
47
  To evaluate **Reason-RFT**'s visual reasoning capabilities, we reconstructed a comprehensive dataset spanning visual counting, structure perception, and spatial transformation, serving as a benchmark to systematically assess visual cognition, geometric understanding, and spatial generalization.
48
+ Experimental results demonstrate Reasoning-RFT's three key advantages: **(1) Performance Enhancement**: achieving state-of-the-art results across multiple tasks, outperforming most mainstream open-source and proprietary models;
49
+ **(2) Generalization Superiority**: consistently maintaining robust performance across diverse tasks and domains, outperforming alternative training paradigms;
50
+ **(3) Data Efficiency**: excelling in few-shot learning scenarios while surpassing full-dataset SFT baselines;
51
  **Reason-RFT** introduces a novel paradigm in visual reasoning, significantly advancing multimodal research.
52
 
53
  <div align="center">