MJ1: Multimodal Judgment via Grounded Verification

📦 Data | 📄 Paper

Haize Labs 2026


Main Results on MMRB2

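MJ1 achieves the best average score on MMRB2 (77.0) with only 3B active parameters, ahead of every API-based and open-source judge listed. It leads on T2I and Editing, while Gemini 3 Pro scores higher on Interleaved and Reasoning. Best results in bold, second-best in italics.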

| Judge | T2I | Editing | Interleaved | Reasoning | Avg. |
|---|---|---|---|---|---|
| Open-source multimodal LLMs | | | | | |
| Gemma 3 4B | 51.7 | 51.0 | 51.3 | 48.8 | 50.7 |
| Gemma 3 12B | 56.0 | 58.0 | 58.0 | 49.3 | 55.3 |
| Gemma 3 27B | 58.3 | 60.2 | 61.1 | 49.4 | 57.3 |
| Qwen2.5-VL-7B | 50.4 | 57.1 | 48.4 | 47.5 | 50.9 |
| Qwen2.5-VL-72B | 59.1 | 64.6 | 62.3 | 50.0 | 59.0 |
| Qwen3-VL-8B | 59.4 | 61.7 | 61.5 | 54.6 | 59.3 |
| Qwen3-VL-32B | 64.1 | 67.3 | 70.5 | 56.6 | 64.6 |
| Qwen3-VL-30B-A3B | 60.0 | 59.5 | 57.3 | 57.3 | 58.5 |
| Qwen3-VL-235B-A22B | 62.0 | 64.8 | 69.0 | 55.9 | 62.9 |
| API-based Models | | | | | |
| GPT-4o | 60.3 | 65.0 | 61.5 | 51.9 | 59.7 |
| GPT-4.1 | 65.8 | 68.2 | 67.0 | 53.0 | 63.5 |
| GPT-5 | 70.5 | 73.8 | 74.4 | 70.2 | 72.2 |
| Gemini 2.5 Flash | 63.1 | 66.5 | 69.4 | 57.5 | 64.1 |
| Gemini 2.5 Pro | 70.5 | 71.3 | *75.1* | 66.6 | 70.9 |
| Gemini 3 Pro | *74.4* | *74.9* | **76.4** | **79.5** | *76.3* |
| MJ1 (Qwen3-VL-30B-A3B + LoRA) | **80.2** | **78.1** | 73.5 | *76.4* | **77.0** |
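
The card does not say how the Avg. column is computed. The reported values look consistent with an unweighted mean of the four task scores, and the sketch below checks that assumption for a few rows copied from the table. The helper names `macro_average` and `check_rows` are hypothetical, and the tolerance only allows for the one-decimal rounding of the published numbers.

```python
# Scores copied from the results table:
# (T2I, Editing, Interleaved, Reasoning, reported Avg.)
rows = {
    "Qwen3-VL-30B-A3B": (60.0, 59.5, 57.3, 57.3, 58.5),
    "GPT-5": (70.5, 73.8, 74.4, 70.2, 72.2),
    "Gemini 3 Pro": (74.4, 74.9, 76.4, 79.5, 76.3),
    "MJ1": (80.2, 78.1, 73.5, 76.4, 77.0),
}


def macro_average(scores):
    """Unweighted mean over task scores (assumed, not confirmed by the card)."""
    return sum(scores) / len(scores)


def check_rows(table, tol=0.06):
    """Map each judge to (recomputed mean, reported Avg., within tolerance?)."""
    out = {}
    for name, (*tasks, reported) in table.items():
        mean = macro_average(tasks)
        out[name] = (mean, reported, abs(mean - reported) <= tol)
    return out


results = check_rows(rows)
for name, (mean, reported, ok) in results.items():
    print(f"{name}: recomputed {mean:.3f}, reported {reported}, consistent={ok}")

# Difference in reported Avg. between MJ1 and its base model row.
gain = rows["MJ1"][-1] - rows["Qwen3-VL-30B-A3B"][-1]
print(f"MJ1 vs. Qwen3-VL-30B-A3B (Avg.): {gain:+.1f}")
```

If the benchmark instead averages over individual examples, the recomputed means could drift from the reported ones, so treat a match here as a sanity check and not as the official definition.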

Citation

@misc{kumar2026mj1multimodaljudgmentgrounded,
      title={MJ1: Multimodal Judgment via Grounded Verification}, 
      author={Bhavesh Kumar and Dylan Feng and Leonard Tang},
      year={2026},
      eprint={2603.07990},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2603.07990}, 
}