GGUF + pure-C++ runtime in CrispASR — FastConformer-CTC (NeMo)

by cstr - opened May 1

We've added the FastConformer-CTC family to CrispASR as the fastconformer-ctc backend. It runs as a C++ binary on GGUF weights, with no NeMo dependency.

The encoder shares core/fastconformer.h with our parakeet and canary runtimes (NeMo convolutional subsampling plus multi-head attention with relative positional encoding), so it is the same code path with a different decode head: greedy CTC here, TDT for parakeet, and a transformer decoder with cross-attention for canary.
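For readers unfamiliar with greedy CTC, here is a minimal sketch of the general technique: take the argmax token per frame, collapse consecutive repeats, and drop blanks. This is not CrispASR's actual code. The function name ctc_greedy_decode, the row-major logits layout, and the blank_id parameter are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative greedy CTC decode (hypothetical helper, not the CrispASR API).
// logits:     row-major [n_frames x vocab_size] scores from the encoder head
// vocab_size: number of output classes, including the blank
// blank_id:   index of the CTC blank token (assumed to be supplied by the caller)
std::vector<int> ctc_greedy_decode(const std::vector<float>& logits,
                                   std::size_t vocab_size,
                                   int blank_id) {
    assert(vocab_size > 0 && logits.size() % vocab_size == 0);
    const std::size_t n_frames = logits.size() / vocab_size;

    std::vector<int> tokens;
    int prev = -1;  // no previous frame yet
    for (std::size_t t = 0; t < n_frames; ++t) {
        const float* row = logits.data() + t * vocab_size;

        // Argmax over the vocabulary for this frame.
        int best = 0;
        for (std::size_t v = 1; v < vocab_size; ++v) {
            if (row[v] > row[best]) best = static_cast<int>(v);
        }

        // Emit only when the label changes and is not the blank.
        if (best != prev && best != blank_id) tokens.push_back(best);
        prev = best;
    }
    return tokens;
}
```

Each frame is handled independently of the tokens already emitted, so decoding is a single linear pass over the encoder output with no autoregressive loop.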

A few practical bits:

All sizes ship as GGUF (large, xlarge, xxlarge).
No autoregressive decode, so this is the lowest-latency English backend in CrispASR, which makes it a good fit for streaming.
CTC output has no native punctuation, so the standard recipe is --punc-model fireredpunc-q8_0.gguf (or fullstop-punc-q4_k.gguf for multilingual) to restore capitalization and punctuation as a post-processing step.
Word timestamps come from -am canary-ctc-aligner.gguf (the same FastConformer-CTC encoder repurposed as a forced aligner, which is convenient since we already had it loaded).

Pre-quantised GGUFs (CC-BY-4.0): cstr/stt-en-fastconformer-ctc-large-GGUF

./build/bin/crispasr --backend fastconformer-ctc \
    -m stt-en-fastconformer-ctc-large-q4_k.gguf \
    -f audio.wav \
    --punc-model fireredpunc-q8_0.gguf -osrt
