MJ1

MJ1: Multimodal Judgment via Grounded Verification

Haize Labs 2026

Main Results on MMRB2

MJ1 achieves the best average score on MMRB2 (77.0) with only 3B active parameters, ahead of every API-based and open-source model listed. It leads on T2I and Editing, while Gemini 3 Pro leads on Interleaved and Reasoning.

Judge	T2I	Editing	Interleaved	Reasoning	Avg.

Open-source multimodal LLMs
Gemma 3 4B	51.7	51.0	51.3	48.8	50.7
Gemma 3 12B	56.0	58.0	58.0	49.3	55.3
Gemma 3 27B	58.3	60.2	61.1	49.4	57.3
Qwen2.5-VL-7B	50.4	57.1	48.4	47.5	50.9
Qwen2.5-VL-72B	59.1	64.6	62.3	50.0	59.0
Qwen3-VL-8B	59.4	61.7	61.5	54.6	59.3
Qwen3-VL-32B	64.1	67.3	70.5	56.6	64.6
Qwen3-VL-30B-A3B	60.0	59.5	57.3	57.3	58.5
Qwen3-VL-235B-A22B	62.0	64.8	69.0	55.9	62.9

API-based Models
GPT-4o	60.3	65.0	61.5	51.9	59.7
GPT-4.1	65.8	68.2	67.0	53.0	63.5
GPT-5	70.5	73.8	74.4	70.2	72.2
Gemini 2.5 Flash	63.1	66.5	69.4	57.5	64.1
Gemini 2.5 Pro	70.5	71.3	75.1	66.6	70.9
Gemini 3 Pro	74.4	74.9	76.4	79.5	76.3

MJ1 (Qwen3-VL-30B-A3B + LoRA)	80.2	78.1	73.5	76.4	77.0
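
The card does not state how the Avg. column is computed. The reported values are consistent with an unweighted mean of the four task scores, to within rounding. The sketch below checks that assumption on a handful of rows copied from the table and finds the leader in each column. The helper names (`mean_of_tasks`, `avg_is_consistent`, `best_per_column`) and the 0.1 tolerance are illustrative choices, not part of the MJ1 release.

```python
# Rows copied from the results table above:
# (T2I, Editing, Interleaved, Reasoning, reported Avg.)
ROWS = {
    "Qwen3-VL-32B": (64.1, 67.3, 70.5, 56.6, 64.6),
    "GPT-5": (70.5, 73.8, 74.4, 70.2, 72.2),
    "Gemini 2.5 Pro": (70.5, 71.3, 75.1, 66.6, 70.9),
    "Gemini 3 Pro": (74.4, 74.9, 76.4, 79.5, 76.3),
    "MJ1": (80.2, 78.1, 73.5, 76.4, 77.0),
}
COLUMNS = ("T2I", "Editing", "Interleaved", "Reasoning", "Avg.")


def mean_of_tasks(row):
    """Unweighted mean of the four task scores (assumed definition of Avg.)."""
    return sum(row[:4]) / 4


def avg_is_consistent(row, tol=0.1):
    """True if the reported Avg. is within `tol` of the unweighted mean.

    The tolerance absorbs one-decimal rounding in the published numbers.
    """
    return abs(mean_of_tasks(row) - row[4]) <= tol


def best_per_column(rows):
    """Map each column name to the judge with the highest score in it."""
    return {
        col: max(rows, key=lambda name: rows[name][i])
        for i, col in enumerate(COLUMNS)
    }


for name, row in ROWS.items():
    print(name, round(mean_of_tasks(row), 2), avg_is_consistent(row))
print(best_per_column(ROWS))
```

A tolerance is used rather than exact equality because the published scores are rounded to one decimal, so the recomputed mean can differ from the reported Avg. in the last digit.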

Citation

@misc{kumar2026mj1multimodaljudgmentgrounded,
      title={MJ1: Multimodal Judgment via Grounded Verification}, 
      author={Bhavesh Kumar and Dylan Feng and Leonard Tang},
      year={2026},
      eprint={2603.07990},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2603.07990}, 
}


Paper for haizelabs/mj1: MJ1: Multimodal Judgment via Grounded Verification (arXiv:2603.07990, published Mar 9)