Learning Smooth Time-Varying Linear Policies with an Action Jacobian Penalty Paper • 2602.18312 • Published 5 days ago • 1
DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning Paper • 2602.16742 • Published 8 days ago • 6
Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control Paper • 2602.18422 • Published 5 days ago • 24
Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published 16 days ago • 201
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Paper • 2602.10693 • Published 14 days ago • 183