The Trinity of Consistency as a Defining Principle for General World Models Paper • 2602.23152 • Published 1 day ago • 180
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models Paper • 2602.22859 • Published 2 days ago • 143
Imagination Helps Visual Reasoning, But Not Yet in Latent Space Paper • 2602.22766 • Published 2 days ago • 34
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization Paper • 2602.23008 • Published 2 days ago • 27
AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning Paper • 2602.23258 • Published 1 day ago • 24
Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization Paper • 2602.22675 • Published 2 days ago • 17
AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games Paper • 2602.17594 • Published 9 days ago • 8
Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published 19 days ago • 211
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Paper • 2602.10693 • Published 17 days ago • 185
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published 23 days ago • 341
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots Paper • 2602.00919 • Published 27 days ago • 305
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger Paper • 2602.08222 • Published 19 days ago • 272
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs Paper • 2602.10388 • Published 17 days ago • 234
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 228