Add pipeline tag, library name and paper information (#1)
opened by nielsr (HF Staff)

README.md (changed):

---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

<div align="center">
  <img alt="MM-Eureka logo" src="./docs/logo.png" style="height: 200px;" />
</div>

<div align="center">

# MM-EUREKA

</div>

<div align="center">
<p align="center">
  📖 <a href="https://github.com/ModalMinds/MM-EUREKA/blob/qwen/MM_EUREKA_Tech_Report.pdf">Report</a> |
  📊 <a href="https://huggingface.co/datasets/FanqingM/MMK12">MMK12 Datasets & Benchmark</a> |
  🤗 <a href="https://huggingface.co/FanqingM/MM-Eureka-Qwen-7B">MM-Eureka-Qwen-7B</a> |
  🤗 <a href="https://huggingface.co/FanqingM/MM-Eureka-Qwen-32B">MM-Eureka-Qwen-32B</a>
</p>
</div>

<hr>
<div align="center">
<p style="text-align: center;">MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning</p>
</div>
<hr>
<div align="center">
  <a href="https://github.com/ModalMinds/MM-EUREKA/blob/qwen/MM_EUREKA_Tech_Report.pdf">[MM-Eureka Report Link]</a>, <a href="https://arxiv.org/abs/2503.07365">[MM-Eureka arXiv Link]</a>
</div>

<hr>
<div align="center">
<p style="text-align: center;">CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models</p>
</div>
<hr>
<div align="center">
  <a href="https://github.com/ModalMinds/MM-EUREKA/blob/qwen/CPGD_Tech_Report.pdf">[CPGD Report Link]</a>, <a href="https://arxiv.org/abs/2505.12504">[CPGD arXiv Link]</a>
</div>

## 🎯 Overview

We present **MM-Eureka-Qwen-7B** and **MM-Eureka-Qwen-32B**, two powerful multimodal reasoning models that successfully extend large-scale rule-based reinforcement learning (RL) to multimodal reasoning. Compared to the previous InternVL-based version of MM-EUREKA, we have improved the model architecture, algorithms, and data. For instance, MM-Eureka-Qwen-7B achieves **66.1** on the MMK12 evaluation sets, only 0.2 points below InternVL-2.5-78B. On MathVista (testmini) it reaches **73.0**, even surpassing InternVL-2.5-78B. MM-Eureka-Qwen-32B is stronger still, scoring **72.3** on the MMK12 evaluation sets, exceeding both Qwen2.5-VL-72B (**70.3**) and closed-source models such as Gemini2-Flash, ranking second only to o1 (**73.9**). On commonly used multimodal mathematical reasoning benchmarks, MM-Eureka-Qwen-32B achieves **73.4** on WeMath, outperforming all open-source models and most closed-source models, including Claude 3.7 Sonnet. On MathVista it reaches **74.8**, surpassing all open-source and closed-source models. Both variants deliver significant improvements on multidisciplinary K12 and mathematical reasoning tasks, outperforming most open-source models of similar size.

**Core Improvements:**

1. We further iterated the codebase to support algorithms including Online Filter, [ADORA](https://github.com/ShadeCloak/ADORA?tab=readme-ov-file), and [DAPO](https://arxiv.org/abs/2503.14476).
2. We open-source the self-collected MMK12 dataset, which contains 15k diverse, high-quality training samples, plus 2k multiple-choice questions covering Math, Physics, Chemistry, and Biology for evaluation.
3. We trained MM-Eureka-Qwen-7B and MM-Eureka-Qwen-32B, which are among the top performers in multimodal reasoning across open-source models of similar size, especially on multidisciplinary K12 tasks.

🔥 We open-source our complete pipeline to foster further research in this area. We release all our code, models, data, etc. at [MM-EUREKA-Qwen](https://github.com/ModalMinds/MM-EUREKA/tree/qwen).

## 🗞️ News

- **[2025/05/19]** We proposed a novel RL algorithm called `Clipped Policy Gradient Optimization with Policy Drift (CPGD)`, which is based on a policy gradient loss with a clipping mechanism and a policy drift regularizer. In our experiments, we found it to be more stable and better-performing than GRPO.
  - 📖 Report: [CPGD-Report](https://github.com/ModalMinds/MM-EUREKA/blob/qwen/CPGD_Tech_Report.pdf), [CPGD-arXiv](https://arxiv.org/abs/2505.12504)
  - 🤗 Model: [MM-Eureka-CPGD-Qwen-7B](https://huggingface.co/Zkkkai/CPGD-7B)
  - 💻 Code: [MM-Eureka-Qwen-Code](https://github.com/ModalMinds/MM-EUREKA/tree/qwen)

- **[2025/04/15]** We released `MM-Eureka-Qwen-7B`, `MM-Eureka-Qwen-32B`, and `MMK12`.
  - 📖 Report: [MM-Eureka-Qwen-Report](https://github.com/ModalMinds/MM-EUREKA/blob/qwen/MM_EUREKA_Tech_Report.pdf), [MM-Eureka-Qwen-arXiv](https://arxiv.org/abs/2503.07365)
  - 🤗 Model: [MM-Eureka-Qwen-7B](https://huggingface.co/FanqingM/MM-Eureka-Qwen-7B)
  - 🤗 Model: [MM-Eureka-Qwen-32B](https://huggingface.co/FanqingM/MM-Eureka-Qwen-32B)
  - 📊 Dataset: [MMK12](https://huggingface.co/datasets/FanqingM/MMK12)
  - 💻 Code: [MM-Eureka-Qwen-Code](https://github.com/ModalMinds/MM-EUREKA/tree/qwen)

- **[2025/03/27]** We released `MM-Eureka-Qwen`.
  - 📖 Report: [MM-Eureka-Qwen-Report](https://jagged-court-d9d.notion.site/MM-Eureka-Qwen-1c13cc5a384880ffbd2de24e1dee052d)
  - 🤗 Model: [MM-Eureka-Qwen-7B](https://huggingface.co/FanqingM/MM-Eureka-Qwen-7B)
  - 📊 Dataset: [MM-Eureka-Dataset](https://huggingface.co/datasets/FanqingM/MM-Eureka-Dataset)
  - 💻 Code: [MM-Eureka-Qwen-Code](https://github.com/ModalMinds/MM-EUREKA/tree/qwen)

- **[2025/03/07]** We released `MM-Eureka`.
  - 📄 Paper: [MM-Eureka-paper](https://github.com/ModalMinds/MM-EUREKA/blob/main/MM_Eureka_paper.pdf)
  - 🤗 Model: [MM-Eureka-8B](https://huggingface.co/FanqingM/MM-Eureka-8B) & [MM-Eureka-Zero-38B](https://huggingface.co/FanqingM/MM-Eureka-Zero-38B)
  - 📊 Dataset: [MM-Eureka-Dataset](https://huggingface.co/datasets/FanqingM/MM-Eureka-Dataset)
  - 💻 Code: [MM-Eureka-Code](https://github.com/ModalMinds/MM-EUREKA/tree/internvl)

## 🌟 Features

This repository is built upon [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), introducing several key enhancements:

- **Multimodal RFT Support**: Extends OpenRLHF to incorporate **vision-language models (VLMs)**, currently supporting **InternVL**, enabling multimodal reasoning capabilities.
  - Currently supports **RLOO**, **REINFORCE++**, and **GRPO** training using Ray.
  - vLLM integration and distributed training.
  - Supports the hybrid engine (`--colocate_all_models`, `--vllm_enable_sleep`).
- **Better Rule-based Reward Support**: Improved training visualization for rule-based rewards (e.g., Format Reward, Accuracy Reward, Repetition Penalty).
- **Enhanced Online Filtering**: Filters out experiences based on the Accuracy Reward during training, as in [PRIME](https://github.com/PRIME-RL/PRIME).
  - Use `--enable_accuracy_filter`, `--freezing_filter_steps`, `--accuracy_lower_bound`, and `--accuracy_upper_bound` to control the behavior of the online accuracy filter.
- **ADORA**: Enable Adaptive Online Rollout Adjustment with `--use_adora` and `--adora_lamda`, as in [ADORA](https://five-stetson-b51.notion.site/Training-Reasoning-Model-with-Dynamic-Advantage-Estimation-on-Reinforcement-Learning-1a830cc0904681fa9df3e076b6557a3e).
- **DAPO**: Use `--use_dapo` to enable the DAPO loss during training, as in [DAPO](https://arxiv.org/abs/2503.14476).
- **CPGD**: Use `--use_cpg_loss` and `--use_policy_drift` to enable the CPGD loss during training, as in [CPGD](https://github.com/ModalMinds/MM-EUREKA/blob/qwen/CPGD_Tech_Report.pdf). Additionally:
  - `--policy_drift_coef` controls the weight of the policy drift regularizer, and `--policy_drift_clip_eps` controls the clipping range in policy drift.
  - `--use_clip_filter_like_weight` enables the clip-filter-like weight proposed in [CPGD](https://github.com/ModalMinds/MM-EUREKA/blob/qwen/CPGD_Tech_Report.pdf), and `--clip_filter_like_weight_clip_eps` controls the clipping range in the clip-filter-like weight.
  - Example scripts are provided in `MM-EUREKA/examples/scripts/train_cpgd_qwen_7b_single_node.sh` and `MM-EUREKA/examples/scripts/train_cpgd_qwen_7b_multi_node.sh`.
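To make the CPGD-related options above concrete, here is a minimal per-token sketch of the kind of objective they describe: a policy-gradient term with a clipped log-ratio plus a drift regularizer that penalizes moving far from the old policy. This is our own illustrative reconstruction from the description above, not the repository's actual implementation; the function name, the quadratic drift form, and the default coefficients are all assumptions (see the CPGD report for the exact objective).

```python
def cpgd_loss_sketch(logp_new, logp_old, advantage,
                     clip_eps=0.2, drift_coef=0.04):
    """Illustrative CPGD-style loss for a single token (an assumption,
    not the repository's code).

    Combines a policy-gradient term whose log-ratio is clipped with a
    policy drift regularizer that discourages the new policy from
    drifting far from the old one.
    """
    log_ratio = logp_new - logp_old
    # Clipping mechanism: bound the log-ratio used in the PG term.
    clipped = max(-clip_eps, min(clip_eps, log_ratio))
    pg_loss = -advantage * clipped
    # Policy drift regularizer (quadratic penalty on the log-ratio here;
    # the report's exact regularizer may differ).
    drift = drift_coef * log_ratio ** 2
    return pg_loss + drift
```

With a positive advantage, increasing the token's log-probability lowers the loss, but only up to the clip range, and large policy moves are additionally penalized by the drift term.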

## 🤖 Models

Based on the key factors identified by [MM-EUREKA](https://github.com/ModalMinds/MM-EUREKA) for achieving stable training, we enhanced the model, dataset, and algorithmic modules. Specifically, we kept the strategy of omitting the KL divergence term and applying data filtering, while making the following key modifications:

- The base model was upgraded from InternVL2.5-8B-Instruct to the more powerful Qwen2.5-VL-7B-Instruct.
- The Vision Transformer (ViT) module was frozen during training.
- The underlying RL algorithm was replaced with [GRPO](https://arxiv.org/pdf/2402.03300) instead of the previously used RLOO.
- The data filtering strategy was transitioned from an offline to an online approach.
- Additional K12 data was collected, expanding the total dataset size to 15,000 samples.
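At its core, the switch to GRPO means each rollout's reward is standardized against the mean and standard deviation of its own sampling group rather than a learned baseline. A minimal sketch of that group-normalized advantage (the helper name and epsilon are our own; see the GRPO paper for the full algorithm):

```python
from statistics import mean, pstdev

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-normalized advantages as in GRPO: standardize each
    rollout's reward against its own sampling group."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    # eps guards against a zero std when all rewards in the group agree.
    return [(r - mu) / (sigma + eps) for r in group_rewards]
```

Note that a group whose rollouts all receive the same reward yields zero advantage everywhere, which is also why online accuracy filtering (dropping all-correct or all-wrong groups) pairs naturally with GRPO.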

| Model | MathVista | MathVerse | MathVision | OlympiadBench | WeMath | MMK12 |
|------------------------|-----------|-----------|------------|---------------|--------|-------|
| Claude3.7-Sonnet | 66.8 | 52.0 | 41.3 | 48.9 | 72.6 | 55.3 |
| GPT-4o | 63.8 | 50.2 | 30.4 | 35.0 | 68.8 | 49.9 |
| o1 | 73.9 | 57.0 | 60.3 | 68.0 | 98.7 | 73.9 |
| Gemini2-flash | 70.4 | 59.3 | 41.3 | 51.0 | 71.4 | 65.2 |
| Qwen-2.5-VL-7B | 68.2 | 47.9 | 25.4 | 20.2 | 62.1 | 53.6 |
| Qwen-2.5-VL-32B | 74.7/71.7 | 49.9 | **40.1** | 30.0 | 69.1 | 66.8 |
| Qwen-2.5-VL-72B | **74.8** | **57.6** | 38.1 | **40.4** | 72.4 | 70.5 |
| InternVL2.5-VL-78B | 72.3 | 51.7 | 32.2 | 31.1 | 66.3 | 61.6 |
| QVQ-72B-Preview | 71.4 | 48.2 | 35.9 | 33.2 | 65.4 | 61.5 |
| Adora-7B | 73.5 | 50.1 | 23.0 | 20.1 | 64.2 | 58.1 |
| R1-Onevision-7B | 64.1 | 47.1 | 29.9/23.5 | 17.3 | 61.8 | 39.8 |
| **MM-Eureka-Qwen-7B** | 73.0 | 50.3 | 26.9 | 20.1 | 66.1 | 64.5 |
| **MM-Eureka-Qwen-32B** | **74.8** | 56.5 | 34.4 | 35.9 | **73.4** | **72.2** |
| **MM-Eureka-CPGD-Qwen-7B** | 74.0 | 50.6 | 28.3 | 21.4 | 68.3 | 65.3 |

- 🤗 [MM-Eureka-Qwen-7B](https://huggingface.co/FanqingM/MM-Eureka-Qwen-7B)
- 🤗 [MM-Eureka-Qwen-32B](https://huggingface.co/FanqingM/MM-Eureka-Qwen-32B)
- 🤗 [MM-Eureka-CPGD-Qwen-7B](https://huggingface.co/Zkkkai/CPGD-7B)

## 🚀 Getting Started

### 📦 Installation

```shell
git clone https://github.com/ModalMinds/MM-EUREKA.git
cd MM-EUREKA
git checkout qwen
pip install -e .[vllm]
pip install flash_attn --no-build-isolation
```

### 📂 Data Preparation

You can download our training data from [MMK12](https://huggingface.co/datasets/FanqingM/MMK12).

Once downloaded, refer to the section below for additional data formatting.

#### Custom dataset

For a custom dataset, format your data into a JSONL file, where each entry is a dictionary organized in the following format.

```json
{
  "id": "0",
  "message": "[{\"role\": \"user\", \"content\": [{\"type\": \"image\", \"image\": \"file:///path/to/your/image.jpg\"}, {\"type\": \"text\", \"text\": \"How many cats in the image?\"}]}]",
  "answer": "gt that could be parsed and verified by math_verify"
}
```
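Entries like the one above can be generated programmatically. A minimal sketch, noting that `message` is a JSON-encoded *string* (not a nested object); the helper name, file name, and the sample question/answer values are placeholders of our own:

```python
import json

def make_entry(idx, image_path, question, answer):
    """Build one training record in the JSONL schema shown above.

    The "message" field stores the chat messages serialized as a JSON
    string, matching the expected format.
    """
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": f"file://{image_path}"},
            {"type": "text", "text": question},
        ],
    }]
    return {"id": str(idx), "message": json.dumps(messages), "answer": answer}

# Write a one-record JSONL file (placeholder path and answer).
with open("custom_data.jsonl", "w") as f:
    entry = make_entry(0, "/path/to/your/image.jpg",
                       "How many cats in the image?", "3")
    f.write(json.dumps(entry) + "\n")
```

Remember that `answer` must be a ground truth that `math_verify` can parse and check.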

### 🚀 Start Training

Before starting your own training, ensure that the paths in the provided training scripts are correctly set and that environment variables like `$MASTER_ADDR` and `$NODE_RANK` are properly configured.

**Start MM-Eureka-Qwen-7B training**

- for a single node

  ```shell
  sh examples/scripts/train_mm_eureka_qwen_7b_single_node.sh
  ```

- for multiple nodes

  ```shell
  sh examples/scripts/train_mm_eureka_qwen_7b_multi_node.sh
  ```

## ⭐ Starchart

[![Star History Chart](https://api.star-history.com/svg?repos=ModalMinds/MM-EUREKA&type=Date)](https://star-history.com/#ModalMinds/MM-EUREKA&Date)

## 🤝 Contribution

MM-Eureka is still under active development. If you want to contribute, feel free to open a pull request or create an issue.

Please refer to `CONTRIBUTING.md` before you dive in!

## 📬 Contact

If you have any questions or would like to engage with our community, feel free to scan the QR code below to join our WeChat group.

<div align="center">
  <img alt="MM-Eureka WeChat group QR code" src="docs/wechat_qr.png" style="height: 400px;" />
</div>

## 🙏 Acknowledgements

We acknowledge the outstanding open-source contributions from [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [LMM-R1](https://github.com/TideDra/lmm-r1), and [vLLM](https://github.com/vllm-project/vllm). We also extend our gratitude to [DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1), [InternVL](https://github.com/OpenGVLab/InternVL), and [QwenVL](https://github.com/QwenLM/Qwen2.5-VL) for their open-source techniques and base models, which have enabled us to further our exploration.

## 📜 Citation

```
@article{meng2025mm,
  title={MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning},
  author={Meng, Fanqing and Du, Lingxiao and Liu, Zongkai and Zhou, Zhixiang and Lu, Quanfeng and Fu, Daocheng and Shi, Botian and Wang, Wenhai and He, Junjun and Zhang, Kaipeng and others},
  journal={arXiv preprint arXiv:2503.07365},
  year={2025}
}
```