Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark Paper • 2511.13853 • Published Nov 17 • 34
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought Paper • 2511.02779 • Published Nov 4 • 58
RewardBench: Evaluating Reward Models for Language Modeling Paper • 2403.13787 • Published Mar 20, 2024 • 22