The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook Paper • 2604.02029 • Published 6 days ago • 131
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos Paper • 2602.06949 • Published Feb 6 • 36
Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR Paper • 2602.05261 • Published Feb 5 • 52
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss Paper • 2602.02493 • Published Feb 2 • 46
TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers Paper • 2601.14133 • Published Jan 20 • 61
ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models Paper • 2601.11404 • Published Jan 16 • 26
Transition Matching Distillation for Fast Video Generation Paper • 2601.09881 • Published Jan 14 • 33
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning Paper • 2601.09708 • Published Jan 14 • 54
Masking Teacher and Reinforcing Student for Distilling Vision-Language Models Paper • 2512.22238 • Published Dec 23, 2025 • 30
OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding Paper • 2601.09575 • Published Jan 14 • 26
3AM: Segment Anything with Geometric Consistency in Videos Paper • 2601.08831 • Published Jan 13 • 34
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 230
NitroGen: An Open Foundation Model for Generalist Gaming Agents Paper • 2601.02427 • Published Jan 4 • 46