EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection Paper • 2506.09827 • Published Jun 11, 2025 • 23
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 133
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model Paper • 2506.08967 • Published Jun 10, 2025 • 2
Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub +2 jsulz, yuchenglow, znation, saba9 • Feb 12, 2025 • 81
Cosmos Collection ⚠️ This collection is archived. 👉 https://huggingface.co/collections/nvidia/nvidia-cosmos-2 • 14 items • Updated 8 days ago • 303
Step-Audio Collection Step-Audio model family, including Audio-Tokenizer, Audio-Chat and TTS • 4 items • Updated Jul 31, 2025 • 32
Article Open-source DeepResearch – Freeing our search agents +3 m-ric, albertvillanova, merve, thomwolf, clefourrier • Feb 4, 2025 • 1.32k