From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models
Paper: arXiv 2602.22859
Qwen3_VL_8B_Instruct_DPE_v2 is the second-iteration model produced by the DPE framework. It builds on the gains of v1 and further refines multimodal reasoning.
DPE draws on educational psychology: it diagnoses the model's "blind spots" and then applies targeted corrections. This version is the result of the second full cycle of that evolutionary pipeline.
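A diagnose-then-correct cycle can be illustrated with a minimal sketch. It assumes, hypothetically, that "diagnosis" means measuring per-category accuracy on a probe set and that "correction" means allocating more targeted training samples to the weakest categories. The function names `diagnose_blind_spots`, `allocate_samples` and `run_iteration`, the 0.6 threshold and the sample budget are all illustrative and are not taken from the DPE paper.

```python
from collections import defaultdict


def diagnose_blind_spots(results, threshold=0.6):
    """Hypothetical diagnosis step: per-category accuracy on a probe set.

    `results` is a list of (category, is_correct) pairs. Returns the
    accuracy of every category and the subset whose accuracy is below
    `threshold` (the "blind spots").
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, is_correct in results:
        totals[category] += 1
        correct[category] += int(is_correct)
    accuracy = {c: correct[c] / totals[c] for c in totals}
    blind_spots = {c: a for c, a in accuracy.items() if a < threshold}
    return accuracy, blind_spots


def allocate_samples(blind_spots, budget):
    """Hypothetical correction step: split a sample budget by error rate."""
    errors = {c: 1.0 - a for c, a in blind_spots.items()}
    total_error = sum(errors.values())
    if total_error == 0:
        return {}
    # Weaker categories (higher error) receive proportionally more samples.
    return {c: round(budget * e / total_error) for c, e in errors.items()}


def run_iteration(results, budget=1000, threshold=0.6):
    """One illustrative diagnose-and-correct cycle; returns the data plan."""
    accuracy, blind_spots = diagnose_blind_spots(results, threshold)
    plan = allocate_samples(blind_spots, budget)
    return accuracy, plan


if __name__ == "__main__":
    # Toy probe results, invented purely for illustration.
    probe = (
        [("geometry", True)] * 4 + [("geometry", False)] * 6
        + [("charts", True)] * 9 + [("charts", False)] * 1
        + [("counting", True)] * 5 + [("counting", False)] * 5
    )
    accuracy, plan = run_iteration(probe, budget=1000)
    print("accuracy:", accuracy)
    print("targeted samples per blind spot:", plan)
```

In this toy run, "charts" is above the threshold and receives no extra data, while "geometry" (the weakest category) receives the largest share. Repeating the cycle on the retrained model mirrors the v1 → v2 progression only in spirit; the actual DPE diagnosis and correction procedures are described in the paper.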
v2 key results:
| Category | Benchmark | Base Model | DPE_v2 (Ours) | Improvement |
|---|---|---|---|---|
| STEM | MMMU | 65.44 | 69.11 | +3.67 |
| | MMStar | 61.27 | 71.67 | +10.40 |
| Visual Math | MathVision | 51.97 | 55.03 | +3.06 |
| | MathVista | 76.20 | 78.00 | +1.80 |
| Overall | Average | 65.64 | 67.72 | +2.08 |
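The Improvement column is the difference in points between DPE_v2 and the base model. The short Python check below, using scores copied from the table, recomputes each delta. It also shows that the reported averages (65.64 and 67.72) are not the plain mean of the four benchmarks listed; the table does not say which benchmarks the overall average covers. The helper names are illustrative only.

```python
# Scores copied from the table: (base, dpe_v2, reported_improvement).
SCORES = {
    "MMMU": (65.44, 69.11, 3.67),
    "MMStar": (61.27, 71.67, 10.40),
    "MathVision": (51.97, 55.03, 3.06),
    "MathVista": (76.20, 78.00, 1.80),
    "Average": (65.64, 67.72, 2.08),
}


def improvement(base, ours):
    """Absolute gain in points, rounded to two decimals as in the table."""
    return round(ours - base, 2)


def check_table(scores):
    """Return rows whose reported improvement disagrees with the recomputed one."""
    return [
        name
        for name, (base, ours, reported) in scores.items()
        if abs(improvement(base, ours) - reported) > 1e-9
    ]


def plain_mean(scores, names, column):
    """Unweighted mean of one column (0 = base, 1 = DPE_v2) over `names`."""
    return sum(scores[n][column] for n in names) / len(names)


if __name__ == "__main__":
    print("Mismatched rows:", check_table(SCORES))
    listed = ["MMMU", "MMStar", "MathVision", "MathVista"]
    print("Mean of listed rows (base):", plain_mean(SCORES, listed, 0))
    print("Mean of listed rows (DPE_v2):", plain_mean(SCORES, listed, 1))
```

Every reported improvement matches the recomputed difference, so `check_table` returns an empty list.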
Citation:

```bibtex
@misc{jia2026blindspotsgainsdiagnosticdriven,
  title={From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models},
  author={Hongrui Jia and Chaoya Jiang and Shikun Zhang and Wei Ye},
  year={2026},
  eprint={2602.22859},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.22859},
}
```
This model is released under the Qwen Research License.