MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction Paper • 2606.18558 • Published 13 days ago • 53
Qwen-AgentWorld: Language World Models for General Agents Paper • 2606.24597 • Published 7 days ago • 139
The Verification Horizon: No Silver Bullet for Coding Agent Rewards Paper • 2606.26300 • Published 6 days ago • 42
Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models Paper • 2606.25041 • Published 7 days ago • 102
ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing? Paper • 2606.19531 • Published 13 days ago • 21
Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning Paper • 2606.24133 • Published 7 days ago • 11
Autodata: An agentic data scientist to create high quality synthetic data Paper • 2606.25996 • Published 6 days ago • 15
UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating Paper • 2606.21661 • Published 11 days ago • 24
WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction Paper • 2605.29341 • Published May 28 • 18
Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models Paper • 2606.03988 • Published 27 days ago • 126
WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes Paper • 2605.15843 • Published May 15 • 6
TideGS: Scalable Training of Over One Billion 3D Gaussian Splatting Primitives via Out-of-Core Optimization Paper • 2605.20150 • Published May 19 • 7
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation Paper • 2605.18739 • Published May 18 • 116
MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory Paper • 2605.15128 • Published May 14 • 64
Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video Paper • 2605.15182 • Published May 14 • 39