Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Paper • 2601.16163 • Published 7 days ago • 13
FrankenMotion: Part-level Human Motion Generation and Composition Paper • 2601.10909 • Published 14 days ago • 18
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text Paper • 2601.10355 • Published 15 days ago • 39
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding Paper • 2601.10611 • Published 15 days ago • 26
DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset Paper • 2601.10305 • Published 15 days ago • 36
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 21 days ago • 211
NitroGen: An Open Foundation Model for Generalist Gaming Agents Paper • 2601.02427 • Published 26 days ago • 44
LTX-2: Efficient Joint Audio-Visual Foundation Model Paper • 2601.03233 • Published 23 days ago • 141
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos Paper • 2601.00393 • Published 29 days ago • 130
PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation Paper • 2512.24551 • Published about 1 month ago • 19
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published Dec 18, 2025 • 95
DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation Paper • 2512.21252 • Published Dec 24, 2025 • 35
StoryMem: Multi-shot Long Video Storytelling with Memory Paper • 2512.19539 • Published Dec 22, 2025 • 18
LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry Paper • 2512.19629 • Published Dec 22, 2025 • 26
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges Paper • 2512.11362 • Published Dec 12, 2025 • 22