ProEdit: Inversion-based Editing From Prompts Done Right Paper • 2512.22118 • Published Dec 26, 2025 • 18
LongVie 2: Multimodal Controllable Ultra-Long Video World Model Paper • 2512.13604 • Published Dec 15, 2025 • 74
OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation Paper • 2512.08294 • Published Dec 9, 2025 • 18
EditThinker: Unlocking Iterative Reasoning for Any Image Editor Paper • 2512.05965 • Published Dec 5, 2025 • 38
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling Paper • 2512.03000 • Published Dec 2, 2025 • 37
OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published Dec 2, 2025 • 33
Architecture Decoupling Is Not All You Need For Unified Multimodal Model Paper • 2511.22663 • Published Nov 27, 2025 • 29
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views Paper • 2510.18632 • Published Oct 21, 2025 • 22
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark Paper • 2510.13759 • Published Oct 15, 2025 • 11
LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation Paper • 2508.03694 • Published Aug 5, 2025 • 52
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness Paper • 2503.21755 • Published Mar 27, 2025 • 33