Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 2 days ago • 92
Versatile Editing of Video Content, Actions, and Dynamics without Training Paper • 2603.17989 • Published 7 days ago • 13
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing Paper • 2603.19228 • Published 6 days ago • 64
WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation Paper • 2603.16871 • Published 8 days ago • 58
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published 9 days ago • 145
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation Paper • 2603.11647 • Published 13 days ago • 31
ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation Paper • 2603.11421 • Published 13 days ago • 34
EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation Paper • 2603.06014 • Published 19 days ago • 9
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published 19 days ago • 115
Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control Paper • 2602.18422 • Published Feb 20 • 30
TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions Paper • 2602.08711 • Published Feb 9 • 28
Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers Paper • 2602.03510 • Published Feb 3 • 27
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation Paper • 2602.03796 • Published Feb 3 • 64
OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer Paper • 2601.14250 • Published Jan 20 • 48
FrankenMotion: Part-level Human Motion Generation and Composition Paper • 2601.10909 • Published Jan 15 • 18