MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents Paper • 2603.09827 • Published 3 days ago • 26
Helios: Real Real-Time Long Video Generation Model Paper • 2603.04379 • Published 9 days ago • 161
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 10 days ago • 88
Mode Seeking meets Mean Seeking for Fast Long Video Generation Paper • 2602.24289 • Published 14 days ago • 39
MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models Paper • 2602.17602 • Published 22 days ago • 55
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos Paper • 2602.06949 • Published Feb 6 • 35
THINKSAFE: Self-Generated Safety Alignment for Reasoning Models Paper • 2601.23143 • Published Jan 30 • 39
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published Jan 22 • 54
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Paper • 2601.16163 • Published Jan 22 • 14
Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals Paper • 2601.05848 • Published Jan 9 • 16
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation Paper • 2601.00664 • Published Jan 2 • 57
PAI-Bench: A Comprehensive Benchmark For Physical AI Paper • 2512.01989 • Published Dec 1, 2025 • 6
WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning Paper • 2512.02425 • Published Dec 2, 2025 • 25
Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising Paper • 2511.08633 • Published Nov 9, 2025 • 55
Adaptive Multi-Agent Response Refinement in Conversational Systems Paper • 2511.08319 • Published Nov 11, 2025 • 42
Latent Diffusion Model without Variational Autoencoder Paper • 2510.15301 • Published Oct 17, 2025 • 49
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs Paper • 2510.09201 • Published Oct 10, 2025 • 50
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs Paper • 2510.07499 • Published Oct 8, 2025 • 48