Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration? Paper • 2606.01247 • Published 4 days ago • 25
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published Apr 27 • 118
Exploring Spatial Intelligence from a Generative Perspective Paper • 2604.20570 • Published Apr 22 • 23
OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering Paper • 2604.08209 • Published Apr 9 • 26
Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality Paper • 2512.07951 • Published Dec 8, 2025 • 51
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs Paper • 2510.13795 • Published Oct 15, 2025 • 60
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks Paper • 2502.17157 • Published Feb 24, 2025 • 52