How to use nvidia/stt_en_fastconformer_ctc_large with NeMo:
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/stt_en_fastconformer_ctc_large")
transcriptions = asr_model.transcribe(["file.wav"])
GGUF + pure-C++ runtime in CrispASR — FastConformer-CTC (NeMo)
#5
by cstr - opened
We've added the FastConformer-CTC family to CrispASR as the fastconformer-ctc backend. C++ binary, GGUF — no NeMo.
The encoder shares core/fastconformer.h with our parakeet and canary runtimes (NeMo conv subsampling + MHA with relative position) — same code path, different decode head. CTC greedy here, TDT for parakeet, transformer cross-attn for canary.
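For readers unfamiliar with the "CTC greedy" decode head mentioned above, here is a minimal sketch of textbook greedy CTC decoding in Python. It is not CrispASR's C++ code. The function name, the toy vocabulary, and the choice of the last index as the blank label are all assumptions made for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(scores, vocab, blank_id):
    """Greedy CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    # 1. Pick the highest-scoring label for every encoder frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in scores]
    # 2. Merge runs of identical consecutive labels into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blank labels and map the rest to their symbols.
    return "".join(vocab[label] for label in collapsed if label != blank_id)


# Toy example: three symbols plus a blank at the last index (an assumption).
vocab = ["a", "b", "c"]
blank_id = len(vocab)
frames = [
    [0.70, 0.10, 0.10, 0.10],  # a
    [0.60, 0.20, 0.10, 0.10],  # a again (merged with the previous frame)
    [0.10, 0.10, 0.10, 0.70],  # blank separates the two "a" symbols
    [0.80, 0.10, 0.05, 0.05],  # a (a new symbol, because a blank came first)
    [0.10, 0.70, 0.10, 0.10],  # b
]
decoded = ctc_greedy_decode(frames, vocab, blank_id)
print(decoded)  # aab
```

Each frame is decoded independently of the text emitted so far, so there is no autoregressive loop in this decode step.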
A few practical bits:
- All sizes ship as GGUF (large, xlarge, xxlarge).
- No autoregressive decode, so this is the lowest-latency English backend in CrispASR — great for streaming.
- No native punctuation (CTC), so the standard recipe is --punc-model fireredpunc-q8_0.gguf (or fullstop-punc-q4_k.gguf for multilingual) to restore caps + punc as a post-step.
- Word timestamps come from -am canary-ctc-aligner.gguf (the same FastConformer-CTC encoder repurposed as a forced aligner, convenient since we already had it loaded).
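To illustrate what "forced aligner" means for a CTC model, the sketch below runs the textbook Viterbi alignment over per-frame log-probabilities and returns the first and last frame assigned to each target token. It shows the general technique only and makes no claim about how canary-ctc-aligner.gguf is implemented. The function name, the toy inputs, and the blank index are assumptions.

```python
import math


def ctc_forced_align(log_probs, tokens, blank_id):
    """Viterbi CTC alignment: return (first_frame, last_frame) per token."""
    num_frames = len(log_probs)
    # Interleave blanks: [blank, t1, blank, t2, ..., blank].
    ext = [blank_id]
    for tok in tokens:
        ext += [tok, blank_id]
    num_states = len(ext)

    neg_inf = -math.inf
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]

    # A path may start in the leading blank or in the first token.
    score[0][0] = log_probs[0][blank_id]
    if num_states > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            candidates = [s]                      # stay in the same state
            if s >= 1:
                candidates.append(s - 1)          # advance by one state
            if s >= 2 and ext[s] != blank_id and ext[s] != ext[s - 2]:
                candidates.append(s - 2)          # skip the blank in between
            prev = max(candidates, key=lambda p: score[t - 1][p])
            if score[t - 1][prev] == neg_inf:
                continue                          # state not reachable yet
            score[t][s] = score[t - 1][prev] + log_probs[t][ext[s]]
            back[t][s] = prev

    # A valid path ends in the trailing blank or in the last token.
    s = num_states - 1
    if num_states > 1 and score[-1][num_states - 2] > score[-1][s]:
        s = num_states - 2
    if score[-1][s] == neg_inf:
        raise ValueError("not enough frames to align all tokens")

    # Walk backwards, recording the frame span of every token state.
    spans = [None] * len(tokens)
    for t in range(num_frames - 1, -1, -1):
        if s % 2 == 1:                            # odd states are tokens
            i = s // 2
            last = spans[i][1] if spans[i] else t
            spans[i] = (t, last)
        s = back[t][s]
    return spans


# Toy example: tokens 0 ("a") and 1 ("b"), blank at index 2 (an assumption).
probs = [
    [0.8, 0.1, 0.1],
    [0.7, 0.1, 0.2],
    [0.1, 0.1, 0.8],
    [0.1, 0.8, 0.1],
]
toy_log_probs = [[math.log(p) for p in frame] for frame in probs]
spans = ctc_forced_align(toy_log_probs, [0, 1], blank_id=2)
print(spans)  # [(0, 1), (3, 3)]
```

Turning frame spans into seconds means multiplying by the encoder frame duration. That depends on the feature hop and the subsampling factor of the model in use, so treat any concrete value as something to verify against the model config.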
Pre-quantised GGUFs (CC-BY-4.0): cstr/stt-en-fastconformer-ctc-large-GGUF
./build/bin/crispasr --backend fastconformer-ctc \
-m stt-en-fastconformer-ctc-large-q4_k.gguf \
-f audio.wav \
--punc-model fireredpunc-q8_0.gguf -osrt
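
If you want to drive the same command from a script, a thin Python wrapper could look like the sketch below. It only reuses the flags shown in the command above. The helper name build_crispasr_command is hypothetical, and the binary path and file names are taken from the example rather than from any documented default.

```python
import shlex
import subprocess


def build_crispasr_command(audio_path, model_path, punc_model=None,
                           binary="./build/bin/crispasr"):
    """Assemble the argv list for the fastconformer-ctc backend."""
    cmd = [binary, "--backend", "fastconformer-ctc",
           "-m", model_path,
           "-f", audio_path]
    if punc_model is not None:
        # Optional punctuation/capitalisation post-step.
        cmd += ["--punc-model", punc_model]
    cmd.append("-osrt")  # output flag exactly as used in the command above
    return cmd


cmd = build_crispasr_command(
    "audio.wav",
    "stt-en-fastconformer-ctc-large-q4_k.gguf",
    punc_model="fireredpunc-q8_0.gguf",
)
print(shlex.join(cmd))

# To actually run it (needs the built binary and the GGUF files on disk):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.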