MuLan: A Joint Embedding of Music Audio and Natural Language Paper • 2208.12415 • Published Aug 26, 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models Paper • 2205.01917 • Published May 4, 2022
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music Paper • 2604.10905 • Published 2 days ago