Supported in CrispASR (C++ runtime, GGUF format)

#3
by cstr - opened

Hi! We've added native support for this model in CrispASR, our unified C++ ASR toolkit.

What's included:

  • ONNX → GGUF converter with graph topology tracing for anonymous tensor mapping
  • Full C++ runtime: whisper-base encoder (6L) + 2-layer TransformerDecoder + frame classifier
  • Works with both the F32 (113 MB) and the Q4_K-quantized (22 MB) GGUF files
  • Auto-download: --vad -vm whisper-vad fetches the Q4_K GGUF automatically
  • Integrated into the VAD dispatch pipeline alongside Silero VAD and FireRedVAD
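
For a rough sense of what the two file sizes above imply, here is a back-of-the-envelope sketch in Python. It is not a measurement: it assumes that each file consists almost entirely of weights and that the F32 file stores 4 bytes per weight. The function names are made up for illustration.

```python
# Rough size arithmetic from the two figures quoted in this post.
# Assumption: file size is dominated by weights; metadata is ignored.

F32_MB = 113   # F32 GGUF size from the post
Q4K_MB = 22    # Q4_K GGUF size from the post


def compression_ratio(full_mb: float, quant_mb: float) -> float:
    """How many times smaller the quantized file is."""
    return full_mb / quant_mb


def effective_bits_per_weight(full_mb: float, quant_mb: float) -> float:
    """Average bits per weight in the quantized file, given 32 bits in F32."""
    return 32.0 * quant_mb / full_mb


def approx_params_millions(full_mb: float) -> float:
    """Approximate parameter count in millions, assuming 4 bytes per weight."""
    return full_mb / 4.0


if __name__ == "__main__":
    print(f"ratio:        {compression_ratio(F32_MB, Q4K_MB):.2f}x")
    print(f"bits/weight:  {effective_bits_per_weight(F32_MB, Q4K_MB):.2f}")
    print(f"params (M):   {approx_params_millions(F32_MB):.2f}")
```

Under these assumptions the quantized file averages a little over 6 bits per weight. That would be consistent with some tensors being stored at higher precision than 4 bits, though this post does not say which ones.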

Pre-built GGUFs: cstr/whisper-vad-encdec-asmr-GGUF

Usage:

crispasr --backend whisper -m auto --auto-download --vad -vm whisper-vad -f audio.wav

Tested on English and German audio of various lengths. The model produces clean VAD segmentation (similar to FireRedVAD), but the whisper encoder pass adds ~1s of overhead. See the benchmark comparison in our README.
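
To illustrate how per-frame classifier scores are typically turned into speech segments, here is a minimal Python sketch. It is not the CrispASR implementation: the function name, the 0.5 threshold, the 20 ms frame step (the Whisper encoder's usual output rate) and the minimum-duration values are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def frames_to_segments(
    probs: Sequence[float],
    frame_sec: float = 0.02,       # assumed frame step (hypothetical)
    threshold: float = 0.5,        # assumed speech threshold (hypothetical)
    min_speech_sec: float = 0.1,   # drop speech runs shorter than this
    min_silence_sec: float = 0.2,  # bridge silence gaps shorter than this
) -> List[Tuple[float, float]]:
    """Convert per-frame speech probabilities to (start_sec, end_sec) pairs."""
    # Work in whole frames to avoid floating-point comparisons.
    min_speech = round(min_speech_sec / frame_sec)
    min_silence = round(min_silence_sec / frame_sec)

    # 1. Collect runs of consecutive frames at or above the threshold.
    runs: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(probs)))

    # 2. Merge runs separated by a silence gap shorter than min_silence.
    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_silence:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # 3. Drop runs that are too short, then convert frames to seconds.
    return [
        (round(s * frame_sec, 3), round(e * frame_sec, 3))
        for s, e in merged
        if e - s >= min_speech
    ]


if __name__ == "__main__":
    # Two speech bursts split by a 3-frame gap, then a 2-frame blip.
    demo = [0.1] * 5 + [0.9] * 10 + [0.1] * 3 + [0.9] * 10 + [0.1] * 20 + [0.9] * 2 + [0.1] * 5
    print(frames_to_segments(demo))
```

In the demo input, the two bursts are merged across the short gap and the 2-frame blip is discarded, so a single segment remains.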

Thanks for publishing the model!
