AVControl: Efficient Framework for Training Audio-Visual Controls Paper • 2603.24793 • Published 2 days ago • 10
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting Paper • 2603.25745 • Published 1 day ago • 5
RealMaster: Lifting Rendered Scenes into Photorealistic Video Paper • 2603.23462 • Published 3 days ago • 26
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 4 days ago • 111
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published 10 days ago • 104
Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer Paper • 2603.19227 • Published 8 days ago • 41
Coherent Human-Scene Reconstruction from Multi-Person Multi-View Video in a Single Pass Paper • 2603.12789 • Published 15 days ago • 3
Learning Latent Proxies for Controllable Single-Image Relighting Paper • 2603.15555 • Published 11 days ago • 8
Riemannian Motion Generation: A Unified Framework for Human Motion Representation and Generation via Riemannian Flow Matching Paper • 2603.15016 • Published 12 days ago • 10
WildActor: Unconstrained Identity-Preserving Video Generation Paper • 2603.00586 • Published 28 days ago • 37
Enhancing Spatial Understanding in Image Generation via Reward Modeling Paper • 2602.24233 • Published 28 days ago • 58
EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents Paper • 2602.23205 • Published 29 days ago • 11
Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control Paper • 2602.18422 • Published Feb 20 • 30
UniWeTok: An Unified Binary Tokenizer with Codebook Size 2^{128} for Unified Multimodal Large Language Model Paper • 2602.14178 • Published Feb 15 • 14
SemanticMoments: Training-Free Motion Similarity via Third Moment Features Paper • 2602.09146 • Published Feb 9 • 21
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence Paper • 2602.08683 • Published Feb 9 • 52
Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models Paper • 2602.07106 • Published Feb 6 • 11