Nemotron Speech Collection Open, state-of-the-art, production‑ready enterprise speech models from the NVIDIA Speech research team for ASR, TTS, Speaker Diarization and S2S • 12 items • Updated 18 days ago • 51
MOSS-Audio Collection An open-source audio understanding model supporting speech recognition, environmental sound analysis, music understanding, time-aware QA, and complex • 7 items • Updated 6 days ago • 55
Canary ASR/AST Collection A collection of multilingual and multitask speech to text models from NVIDIA NeMo 🐤 • 6 items • Updated 18 days ago • 34
Parakeet ASR Collection NeMo Parakeet ASR Models attain strong speech recognition accuracy while being efficient for inference. Available in CTC and RNN-Transducer variants. • 16 items • Updated 18 days ago • 70
Article How I contributed a new model to the Transformers library using Codex Mar 30 • 51
Article Raw Robot Video to VLA-Ready Training Data: Annotating LeRobot Datasets with Nomadic and HuggingFace Buckets Mar 21 • 17
ALARM Collection Official checkpoints and data for "ALARM: Audio–Language Alignment for Reasoning Models" • 8 items • Updated Mar 9 • 1
Music Flamingo: Scaling Music Understanding in Audio Language Models Paper • 2511.10289 • Published Nov 13, 2025 • 19
Article Introducing Modular Diffusers - Composable Building Blocks for Diffusion Pipelines +2 Mar 5 • 51