World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published 11 days ago
FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing Paper • 2604.22586 • Published 14 days ago
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published 23 days ago
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory Paper • 2604.08995 • Published 28 days ago
Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation Paper • 2604.10030 • Published 27 days ago
Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator Paper • 2604.08121 • Published 29 days ago
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published 25 days ago
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation Paper • 2604.10098 • Published 27 days ago
VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization Paper • 2604.12887 • Published 24 days ago
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision Paper • 2601.03193 • Published Jan 6
Generative Neural Video Compression via Video Diffusion Prior Paper • 2512.05016 • Published Dec 4, 2025
IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning Paper • 2512.15635 • Published Dec 17, 2025
UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation Paper • 2512.07831 • Published Dec 8, 2025
BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation Paper • 2511.22973 • Published Nov 28, 2025
TV2TV: A Unified Framework for Interleaved Language and Video Generation Paper • 2512.05103 • Published Dec 4, 2025
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published Dec 4, 2025
Architecture Decoupling Is Not All You Need For Unified Multimodal Model Paper • 2511.22663 • Published Nov 27, 2025
Video Generation Models Are Good Latent Reward Models Paper • 2511.21541 • Published Nov 26, 2025
Plan-X: Instruct Video Generation via Semantic Planning Paper • 2511.17986 • Published Nov 22, 2025
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation Paper • 2511.19320 • Published Nov 24, 2025