Nemotron ASR ONNX Models
ONNX-exported Nemotron-3.5 ASR streaming model (0.6B) for speech recognition.
Variants
| Variant | Provider | Description |
|---|---|---|
| cpu/ | CPUExecutionProvider | CPU-optimized (fp32 decoder/joint, int4/int8 encoder) |
| gpu-cuda/ | CUDAExecutionProvider | NVIDIA GPU via CUDA (fp32) |
| gpu-dml/ | DmlExecutionProvider | DirectML for Windows GPU (fp32) |
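To choose a variant directory programmatically, one option is to match the provider names listed above against what the local ONNX Runtime build reports. The following is a minimal sketch, not part of this repository: the helper name `pick_variant` and the preference order (CUDA, then DirectML, then CPU) are assumptions made for illustration.

```python
# Variant directory -> ONNX Runtime execution provider, as listed in the Variants table.
VARIANT_PROVIDERS = {
    "gpu-cuda": "CUDAExecutionProvider",
    "gpu-dml": "DmlExecutionProvider",
    "cpu": "CPUExecutionProvider",
}


def pick_variant(available_providers, preference=("gpu-cuda", "gpu-dml", "cpu")):
    """Return (variant_dir, provider) for the first preferred variant that is usable.

    `preference` is an assumed ordering (GPU first, CPU last), not a
    recommendation taken from the model card.
    """
    for variant in preference:
        provider = VARIANT_PROVIDERS[variant]
        if provider in available_providers:
            return variant, provider
    raise RuntimeError("none of the expected execution providers is available")


if __name__ == "__main__":
    try:
        import onnxruntime as ort  # optional; only used to query the local build
        available = ort.get_available_providers()
    except ImportError:
        # Assumption for the sketch: fall back to CPU when onnxruntime is absent.
        available = ["CPUExecutionProvider"]
    print(pick_variant(available))
```

The returned directory name can then be passed as the model path, for example `models-onnx/<variant>` in the C# command shown under Usage.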
Model Architecture
- Encoder: Conformer-based audio encoder
- Decoder: Transformer-based text decoder
- Joint: Transducer joint network
- VAD: Silero VAD for voice activity detection
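To show how the encoder, decoder, and joint components typically cooperate in a transducer, here is a generic greedy decoding loop. It is a sketch of the standard technique only: the actual tensor names, shapes, and decoding strategy of these ONNX files are not described here, and `greedy_transducer_decode`, `toy_predict`, and `toy_joint` are hypothetical stand-ins rather than functions from this repository.

```python
def greedy_transducer_decode(enc_frames, predict, joint, blank_id, max_symbols_per_frame=10):
    """Generic greedy transducer decoding (illustrative, not this model's exact loop).

    enc_frames: per-frame encoder outputs.
    predict(token, state) -> (dec_out, state): the text decoder (prediction network).
    joint(frame, dec_out) -> list of scores over the vocabulary, including blank.
    """
    tokens = []
    dec_out, state = predict(None, None)  # prime the decoder with a start step
    for frame in enc_frames:
        for _ in range(max_symbols_per_frame):
            scores = joint(frame, dec_out)
            best = max(range(len(scores)), key=scores.__getitem__)  # argmax
            if best == blank_id:
                break  # blank: move on to the next audio frame
            tokens.append(best)
            dec_out, state = predict(best, state)  # decoder advances only on a real token
    return tokens


# Toy stand-ins so the loop can be run without any model files.
SCRIPT = [1, 2, 3]  # hypothetical target token ids
BLANK = 0


def toy_predict(token, state):
    count = 0 if state is None else state + 1  # number of tokens emitted so far
    return count, count


def toy_joint(frame, dec_out):
    # `frame` says how many tokens should have been emitted by the end of this frame.
    target = SCRIPT[dec_out] if dec_out < frame else BLANK
    return [1.0 if i == target else 0.0 for i in range(4)]


decoded = greedy_transducer_decode([1, 1, 3], toy_predict, toy_joint, BLANK)
print(decoded)
```

In a real pipeline the VAD would sit in front of this loop and decide which audio segments are sent to the encoder at all.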
Usage with NemotronSpeech (C#)
git clone https://github.com/DimQ1/nemotron-speech-csharp
cd nemotron-speech-csharp
dotnet run -c Release -- models-onnx/gpu-cuda --mic --language auto
Conversion
Converted from an NVIDIA NeMo .nemo checkpoint using Olive.
Original model: nemotron-3.5-asr-streaming-0.6b.nemo
License
See the original NVIDIA Nemotron license.