Nemotron ASR ONNX Models

ONNX exports of the Nemotron-3.5 streaming ASR model (0.6B parameters), with one variant per ONNX Runtime execution provider.

Variants

Variant	Provider	Description
`cpu/`	CPUExecutionProvider	CPU-optimized (fp32 decoder/joint, int4/int8 encoder)
`gpu-cuda/`	CUDAExecutionProvider	NVIDIA GPU via CUDA (fp32)
`gpu-dml/`	DmlExecutionProvider	DirectML for Windows GPU (fp32)
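
As a rough illustration of how the table above might be used, the sketch below picks a variant directory from the execution providers that ONNX Runtime reports as available. The helper name `pick_variant` and the GPU-first preference order are assumptions made for this example, not part of the model or of NemotronSpeech; `onnxruntime.get_available_providers()` is the only ONNX Runtime call used, and it is optional here.

```python
# Hypothetical helper: maps ONNX Runtime execution providers to the variant
# directories from the table. The preference order (CUDA, then DirectML,
# then CPU) is an assumption for this sketch.
VARIANT_BY_PROVIDER = [
    ("CUDAExecutionProvider", "gpu-cuda"),
    ("DmlExecutionProvider", "gpu-dml"),
    ("CPUExecutionProvider", "cpu"),
]


def pick_variant(available_providers):
    """Return (variant_dir, provider) for the first preferred provider found."""
    for provider, variant in VARIANT_BY_PROVIDER:
        if provider in available_providers:
            return variant, provider
    raise ValueError(f"no supported provider in {list(available_providers)}")


if __name__ == "__main__":
    try:
        import onnxruntime as ort  # optional: only used to query the runtime

        available = ort.get_available_providers()
    except ImportError:
        # Assumption: without onnxruntime installed, fall back to the CPU variant.
        available = ["CPUExecutionProvider"]
    variant, provider = pick_variant(available)
    print(f"use models-onnx/{variant} with {provider}")
```

The resulting directory can then be passed to the C# runner in place of `models-onnx/gpu-cuda`.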

Model Architecture

Encoder: Conformer-based audio encoder
Decoder: Transformer-based text decoder
Joint: Transducer joint network
VAD: Silero VAD for voice activity detection

Usage with NemotronSpeech (C#)

git clone https://github.com/DimQ1/nemotron-speech-csharp
cd nemotron-speech-csharp
dotnet run -c Release -- models-onnx/gpu-cuda --mic --language auto

Conversion

Converted with Olive from the NVIDIA NeMo checkpoint nemotron-3.5-asr-streaming-0.6b.nemo.

License

See the license of the original NVIDIA Nemotron model.