Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do • Paper • 2606.22565 • Published 5 days ago • 7
Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application • Paper • 2606.12191 • Published 16 days ago • 67
Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models • Paper • 2603.11896 • Published Mar 12 • 10
MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning • Paper • 2603.02024 • Published Mar 2 • 47
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration • Paper • 2602.05400 • Published Feb 5 • 356
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle • Paper • 2512.04324 • Published Dec 3, 2025 • 159
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models • Paper • 2511.11134 • Published Nov 14, 2025 • 33
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences • Paper • 2510.23451 • Published Oct 27, 2025 • 28
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos • Paper • 2506.04141 • Published Jun 4, 2025 • 31
Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis • Paper • 2506.04142 • Published Jun 4, 2025 • 28