Supported in CrispASR (C++ runtime, GGUF format)
#3
by cstr - opened
Hi! We've added native support for this model in CrispASR, our unified C++ ASR toolkit.
What's included:
- ONNX → GGUF converter with graph topology tracing for anonymous tensor mapping
- Full C++ runtime: whisper-base encoder (6L) + 2-layer TransformerDecoder + frame classifier
- Works with both F32 (113 MB) and Q4_K-quantized (22 MB) weights
- Auto-download: --vad -vm whisper-vad fetches the Q4_K GGUF automatically
- Integrated into the VAD dispatch pipeline alongside Silero VAD and FireRedVAD
Pre-built GGUFs: cstr/whisper-vad-encdec-asmr-GGUF
Usage:
crispasr --backend whisper -m auto --auto-download --vad -vm whisper-vad -f audio.wav
Tested on English and German audio of various lengths. The model produces clean VAD segmentation (similar to FireRedVAD), but the whisper encoder pass adds ~1s of overhead. See the benchmark comparison in our README.
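For readers unfamiliar with frame-level VAD, here is a minimal sketch of how per-frame speech scores can be turned into time segments. It is not CrispASR's actual code: the function name `frames_to_segments`, the 0.5 threshold, and the 20 ms frame duration (the stride of the Whisper encoder output) are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# Assumption: one score per Whisper encoder frame (1500 frames per 30 s = 20 ms each).
FRAME_SEC = 0.02


def frames_to_segments(
    probs: Sequence[float],
    threshold: float = 0.5,       # hypothetical decision threshold
    frame_sec: float = FRAME_SEC,
) -> List[Tuple[float, float]]:
    """Group consecutive frames scored >= threshold into (start, end) times in seconds."""
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the currently open speech run
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # speech run begins
        elif p < threshold and start is not None:
            segments.append((start * frame_sec, i * frame_sec))  # speech run ends
            start = None
    if start is not None:
        # Audio ended while still inside a speech run.
        segments.append((start * frame_sec, len(probs) * frame_sec))
    return segments


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.8, 0.2, 0.7]
    print(frames_to_segments(scores))
```

A real pipeline would typically also merge segments separated by very short gaps and pad segment boundaries; those steps are left out here to keep the sketch short.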
Thanks for publishing the model!