RASMUS
/

Finnish-ASR-Canary-v2

Automatic Speech Recognition

speech-recognition

Eval Results (legacy)

Model card Files Files and versions

Finnish-ASR-Canary-v2 / NeMo /docs /source /index.rst

RASMUS's picture

Upload folder using huggingface_hub

cde7fe4 verified 4 months ago

history blame contribute delete

3.35 kB

	NVIDIA NeMo Toolkit Developer Docs
	===================================

	`NVIDIA NeMo Toolkit <https://github.com/NVIDIA/NeMo>`_ is an open-source toolkit for speech, audio, and multimodal language model research, with a clear path from experimentation to production deployment.

	Models
	------

	- ASR: `Parakeet <https://huggingface.co/collections/nvidia/parakeet>`_, `Canary <https://huggingface.co/collections/nvidia/canary>`_, FastConformer -- with CTC, Transducer, TDT, and hybrid decoders
	- TTS: `MagpieTTS <https://huggingface.co/nvidia/magpie_tts_multilingual_357m>`_, `FastPitch <https://huggingface.co/nvidia/tts_en_fastpitch>`_ + `HiFi-GAN <https://huggingface.co/nvidia/tts_hifigan>`_ -- multi-language, multi-speaker
	- Speaker: `Sortformer <https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2.1>`_ streaming diarization, `TitaNet <https://huggingface.co/nvidia/speakerverification_en_titanet_large>`_ speaker recognition, `MarbleNet <https://huggingface.co/nvidia/Frame_VAD_Multilingual_MarbleNet_v2.0>`_ VAD
	- Audio: `Speech enhancement <https://huggingface.co/nvidia/sr_ssl_flowmatching_16k_430m>`_, source separation, neural audio codecs
	- SpeechLM2: `Canary-Qwen 2.5B <https://huggingface.co/nvidia/canary-qwen-2.5b>`_ (SALM), Duplex Speech-to-Speech -- HuggingFace Transformers backbone integration

	Inference & Deployment
	----------------------

	- Streaming and real-time ASR with cache-aware Conformer
	- GPU-accelerated decoding with `NGPU-LM <https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/asr_customization/ngpulm_language_modeling_and_customization.html>`_ language model fusion
	- Export to ONNX

	Voice Agent
	-----------

	- Open-source conversational agent framework built on `Pipecat <https://github.com/pipecat-ai/pipecat>`_
	- Streaming STT + LLM + TTS pipeline with natural turn-taking
	- Live speaker diarization and tool calling support

	----

	NeMo is built for researchers and engineers. Each collection provides prebuilt, modular components
	that can be customized, extended, and composed -- from rapid prototyping to multi-node training
	to production inference.

	`NVIDIA NeMo Toolkit <https://github.com/NVIDIA/NeMo>`_ has separate collections for:

	* :doc:`Automatic Speech Recognition (ASR) <asr/intro>`

	* :doc:`Text-to-Speech (TTS) <tts/intro>`

	* :doc:`Audio Processing <audio/intro>`

	* :doc:`SpeechLM2 <speechlm2/intro>`

	For quick guides and tutorials, see the "Getting started" section below.


	.. toctree::
	:maxdepth: 2
	:caption: Getting Started
	:name: starthere
	:titlesonly:

	starthere/intro
	starthere/fundamentals
	starthere/best-practices
	starthere/tutorials

	For more information, browse the developer docs for your area of interest in the contents section below or on the left sidebar.


	.. toctree::
	:maxdepth: 1
	:caption: Training
	:name: Training

	features/parallelisms
	features/mixed_precision

	.. toctree::
	:maxdepth: 1
	:caption: Model Checkpoints
	:name: Checkpoints

	checkpoints/intro

	.. toctree::
	:maxdepth: 1
	:caption: APIs
	:name: APIs
	:titlesonly:

	apis

	.. toctree::
	:maxdepth: 1
	:caption: Collections
	:name: Collections
	:titlesonly:

	collections

	.. toctree::
	:maxdepth: 1
	:caption: Speech AI Tools
	:name: Speech AI Tools
	:titlesonly:

	tools/intro