Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis • Paper • 2601.14417 • Published 3 days ago • 5
HeartMuLa: A Family of Open Sourced Music Foundation Models • Paper • 2601.10547 • Published 8 days ago • 35
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision • Paper • 2601.03193 • Published 17 days ago • 46
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning • Paper • 2601.06943 • Published 12 days ago • 206
Rethinking Video Generation Model for the Embodied World • Paper • 2601.15282 • Published 2 days ago • 40
Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe • Paper • 2508.01691 • Published Aug 3, 2025 • 10
tiantiaf/whisper-large-v3-msp-podcast-emotion • Audio Classification • 2B • Updated Aug 10, 2025 • 3.07k • 5