When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning Paper • 2602.08236 • Published Feb 9 • 9
Reliable and Responsible Foundation Models: A Comprehensive Survey Paper • 2602.08145 • Published Feb 4 • 8
AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories Paper • 2602.14941 • Published Feb 16 • 6
EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding Paper • 2605.09874 • Published May 11 • 2
PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation Paper • 2605.14269 • Published May 14 • 9
VGGT-Edit: Feed-forward Native 3D Scene Editing with Residual Field Prediction Paper • 2605.15186 • Published May 14 • 26
Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models Paper • 2605.12227 • Published May 12 • 1
Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos Paper • 2603.22529 • Published Mar 23 • 7
VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting Paper • 2603.14659 • Published Mar 15 • 6
Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding Paper • 2506.06275 • Published Jun 6, 2025
Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution Paper • 2602.16154 • Published Feb 18 • 1
Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning Paper • 2507.06485 • Published Jul 9, 2025 • 5