ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling Paper • 2604.15086 • Published Apr 16
MultiSoundGen: Video-to-Audio Generation for Multi-Event Scenarios via SlowFast Contrastive Audio-Visual Pretraining and Direct Preference Optimization Paper • 2509.19999 • Published Sep 24, 2025