Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models Paper • 2602.07026 • Published 8 days ago • 120
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos Paper • 2602.06949 • Published 4 days ago • 26
RISE-Video: Can Video Generators Decode Implicit World Rules? Paper • 2602.05986 • Published 5 days ago • 26
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published 23 days ago • 49
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published 23 days ago • 49
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published 23 days ago • 49
OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer Paper • 2601.14250 • Published 21 days ago • 47
CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation Paper • 2601.10061 • Published 27 days ago • 30
VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction Paper • 2601.05966 • Published Jan 9 • 23
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published about 1 month ago • 211
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation Paper • 2601.00664 • Published Jan 2 • 56
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time Paper • 2512.25075 • Published Dec 31, 2025 • 15
Pretraining Frame Preservation in Autoregressive Video Memory Compression Paper • 2512.23851 • Published Dec 29, 2025 • 25
SceneDecorator: Towards Scene-Oriented Story Generation with Scene Planning and Scene Consistency Paper • 2510.22994 • Published Oct 27, 2025
StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors Paper • 2512.16915 • Published Dec 18, 2025 • 38
StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors Paper • 2512.16915 • Published Dec 18, 2025 • 38