SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation Paper • 2507.09862 • Published Jul 14, 2025 • 51
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions Paper • 2511.03334 • Published Nov 5, 2025 • 53
Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy Paper • 2511.21579 • Published Nov 26, 2025 • 23
StreamAvatar: Streaming Diffusion Models for Real-Time Interactive Human Avatars Paper • 2512.22065 • Published Dec 26, 2025
Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars Paper • 2602.01538 • Published 11 days ago • 15