Many-Shot CoT-ICL: Making In-Context Learning Truly Learn Paper • 2605.13511 • Published 3 days ago • 28
MathReal: We Keep It Real! A Real Scene Benchmark for Evaluating Math Reasoning in Multimodal Large Language Models Paper • 2508.06009 • Published Aug 8, 2025 • 16
AWorld: Dynamic Multi-Agent System with Stable Maneuvering for Robust GAIA Problem Solving Paper • 2508.09889 • Published Aug 13, 2025 • 32
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens Paper • 2508.01191 • Published Aug 2, 2025 • 240
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts Paper • 2508.09848 • Published Aug 13, 2025 • 71
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning Paper • 2508.10433 • Published Aug 14, 2025 • 146