---
license: apache-2.0
datasets:
- OpenDataArena/MMFineReason-1.8M
language:
- en
pipeline_tag: visual-question-answering
base_model:
- Qwen/Qwen3-VL-4B-Instruct
---
<div align="center">
<h1>MMFineReason</h1>
<p><strong>Closing the Multimodal Reasoning Gap via Open Data-Centric Methods</strong></p>
</div>

<div align="center">

[Paper](https://arxiv.org/abs/2601.21821)
[Project Page](https://mmfinereason.github.io/)
[Model Collection](https://huggingface.co/collections/OpenDataArena/mmfinereason)

</div>

<figure align="center">
<img src="https://raw.githubusercontent.com/mmfinereason/mmfinereason.github.io/main/static/images/model_compare.png" width="100%" alt="Model Performance Comparison">
<figcaption><em>Average score across mathematical reasoning and multimodal understanding benchmarks.</em></figcaption>
</figure>
---

This repository provides **MMFineReason-4B**; detailed dataset information is available at https://huggingface.co/datasets/OpenDataArena/MMFineReason-1.8M.

## Overview

**MMFineReason** is a large-scale, high-quality multimodal reasoning dataset comprising **1.8M samples** and **5.1B solution tokens**, featuring detailed reasoning annotations distilled from **Qwen3-VL-235B-A22B-Thinking**.

### Key Highlights

- **1.8M High-Quality Samples** with **5.1B Solution Tokens**
- **Long-Form CoT**: Average reasoning length of **2,910 tokens** (2.7× HoneyBee, 4.3× OpenMMReasoner)
- **100% Caption Coverage**: Dense visual descriptions averaging 609 tokens
- **Multi-Domain**: Mathematics (79.4%), Science (13.8%), Puzzle/Game (4.6%), General/OCR (2.2%)
- **State-of-the-Art**: Models trained on this dataset achieve SOTA performance in their size class
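The domain mix above translates into approximate per-domain sample counts. A back-of-envelope sketch (percentages and the 1.8M total are from the list above; exact counts live in the dataset card):

```python
# Approximate per-domain sample counts implied by the reported domain mix.
# Back-of-envelope only; consult the official dataset card for exact counts.
TOTAL_SAMPLES = 1_800_000
DOMAIN_MIX = {
    "Mathematics": 0.794,
    "Science": 0.138,
    "Puzzle/Game": 0.046,
    "General/OCR": 0.022,
}

# The reported shares should cover the whole dataset.
assert abs(sum(DOMAIN_MIX.values()) - 1.0) < 1e-9

counts = {name: round(TOTAL_SAMPLES * share) for name, share in DOMAIN_MIX.items()}
for name, n in counts.items():
    print(f"{name:12s} ~{n:,} samples")
```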
## Model Training

Based on the MMFineReason dataset, we train a family of multimodal reasoning models at 2B / 4B / 8B scales, all initialized from the corresponding Qwen3-VL-Instruct backbones and fine-tuned using a unified data-centric training recipe.

Each MMFineReason model is trained in two stages:

- **Supervised Fine-Tuning (SFT)** on MMFineReason-1.8M-SFT, leveraging long-form, visually grounded Chain-of-Thought (CoT) annotations with an average length of 2,910 tokens.
- **Reinforcement Learning (RL)** using GSPO, applied on MMFineReason-1.8M-RL to further improve reasoning reliability and generalization.
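The recipe is described at a high level only. As an illustration of how a caption-plus-CoT annotation could be serialized into an SFT target for a thinking-style model (a hypothetical sketch, not the authors' exact pipeline; the field names `caption`, `cot`, and `answer` are assumptions):

```python
def build_sft_target(caption: str, cot: str, answer: str) -> str:
    """Serialize one annotated record into a thinking-style assistant reply.

    Hypothetical sketch: the dense image caption and the long-form CoT are
    placed inside a <think> block, followed by the final answer, mirroring
    the output format of Qwen3-VL-Thinking-style models.
    """
    reasoning = f"{caption.strip()}\n\n{cot.strip()}"
    return f"<think>\n{reasoning}\n</think>\n\n{answer.strip()}"


example = build_sft_target(
    caption="The image shows a right triangle with legs of length 3 and 4.",
    cot="By the Pythagorean theorem, the hypotenuse is sqrt(9 + 16) = 5.",
    answer="5",
)
print(example)
```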
---

## Model Performance

### Main Results

<figure align="center">
<img src="https://raw.githubusercontent.com/mmfinereason/mmfinereason.github.io/main/static/images/table_main_results.png" width="100%" alt="Main Benchmark Results">
<figcaption><em>Comparison of MMFineReason models with state-of-the-art models.</em></figcaption>
</figure>

MMFineReason-4B surpasses Qwen3-VL-8B-Thinking (73.9 vs 72.5), while MMFineReason-8B outperforms the larger Qwen3-VL-30B-A3B-Thinking (75.7 vs 74.5) and exceeds Gemini-2.5-Flash. On mathematical benchmarks, MFR-8B achieves 83.4% on DynaMath (vs Qwen3-VL-32B-Thinking's 82.0%) and 67.1% on MathVision, outperforming HoneyBee-8B and OMR-7B by 23-30 points. Despite minimal chart training data, MFR-8B generalizes well to CharXiv (90.8%) and RealWorldQA (75.6%).

### SFT vs RL Training Analysis

<figure align="center">
<img src="https://raw.githubusercontent.com/mmfinereason/mmfinereason.github.io/main/static/images/table_sft_rl_results.png" width="100%" alt="SFT vs RL Results">
<figcaption><em>Results comparing MFR-SFT and MFR-Thinking models against base Qwen3-VL variants.</em></figcaption>
</figure>

SFT drives major gains in mathematical reasoning (e.g., MathVision: 53.9% → 67.6% for 8B). RL enhances generalization on understanding benchmarks (e.g., AI2D: 78.5% → 82.5% for 2B) while showing variance on math benchmarks.
## Model Zoo

| Model | Parameters | Avg Score | HuggingFace |
|-------|------------|-----------|-------------|
| MMFineReason-2B | 2B | 65.3 | [🤗 Link](https://huggingface.co/OpenDataArena/MMFineReason-2B) |
| MMFineReason-4B | 4B | 73.9 | [🤗 Link](https://huggingface.co/OpenDataArena/MMFineReason-4B) |
| MMFineReason-8B | 8B | 75.7 | [🤗 Link](https://huggingface.co/OpenDataArena/MMFineReason-8B) |
---

## Citation

```bibtex
@article{lin2026mmfinereason,
  title={MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods},
  author={Lin, Honglin and Liu, Zheng and Zhu, Yun and Qin, Chonghan and Lin, Juekai and Shang, Xiaoran and He, Conghui and Zhang, Wentao and Wu, Lijun},
  journal={arXiv preprint arXiv:2601.21821},
  year={2026},
  url={https://mmfinereason.github.io/}
}
```
---

## License

This model is released under the [Apache 2.0 License](https://opensource.org/licenses/Apache-2.0). Individual source datasets may have their own licenses.

---

## Acknowledgments

We thank the creators of FineVision, MMR1, BMMR, Euclid30K, GameQA-140K, LLaVA-CoT, WeMath, ViRL39K, and others. We also thank the Qwen team for the powerful Qwen3-VL series models.