SWE-Universe: Scale Real-World Verifiable Environments to Millions • Paper • 2602.02361 • Published 4 days ago
Robust Multimodal Large Language Models Against Modality Conflict • Paper • 2507.07151 • Published Jul 9, 2025
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models • Paper • 2504.15279 • Published Apr 21, 2025