Supported in CrispASR (C++ runtime, GGUF format)

by cstr - opened Apr 27


cstr, Apr 27 (edited Apr 27)

Hi! We've added native support for this model in CrispASR, our unified C++ ASR toolkit.

What's included:

- ONNX → GGUF converter that traces the graph topology to map anonymous (unnamed) tensors
- Full C++ runtime: Whisper-base encoder (6 layers) + 2-layer TransformerDecoder + frame classifier
- Supports both F32 (113 MB) and Q4_K-quantized (22 MB) weights
- Auto-download: --vad -vm whisper-vad fetches the Q4_K GGUF automatically
- Integrated into the VAD dispatch pipeline alongside Silero VAD and FireRedVAD
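For readers wondering how a frame classifier turns into VAD segments, here is a minimal C++ sketch of one common post-processing step: thresholding per-frame speech probabilities and collapsing runs of speech frames into time ranges. It is not CrispASR's code. The function `frames_to_segments`, the 20 ms frame hop (Whisper's encoder emits 1500 frames per 30 s window), the 0.5 threshold, and the 100 ms minimum length are all assumptions for illustration; the model may score frames at a different rate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A detected speech region, in seconds.
struct Segment {
    double start_s;
    double end_s;
};

// Hypothetical helper (not part of CrispASR): converts per-frame speech
// probabilities into speech segments.
// Assumptions: one probability per encoder frame, a fixed frame hop
// (20 ms, matching Whisper's 1500 frames per 30 s), a single threshold,
// and a minimum segment length that drops very short blips.
std::vector<Segment> frames_to_segments(const std::vector<float>& probs,
                                        double frame_s = 0.02,
                                        float threshold = 0.5f,
                                        double min_len_s = 0.1) {
    assert(frame_s > 0.0);
    std::vector<Segment> out;
    bool in_speech = false;
    std::size_t start = 0;
    // Iterate one step past the last frame and treat it as silence,
    // so a segment still open at the end of the audio gets closed.
    for (std::size_t i = 0; i <= probs.size(); ++i) {
        const bool speech = i < probs.size() && probs[i] >= threshold;
        if (speech && !in_speech) {
            in_speech = true;  // speech onset
            start = i;
        } else if (!speech && in_speech) {
            in_speech = false;  // speech offset: emit the finished run
            const double s = start * frame_s;
            const double e = i * frame_s;
            if (e - s >= min_len_s) {
                out.push_back({s, e});
            }
        }
    }
    return out;
}
```

Production VADs typically add more on top of this, such as hysteresis (separate on/off thresholds), padding around segments, and merging of short gaps; those are left out to keep the sketch small.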

Pre-built GGUFs: cstr/whisper-vad-encdec-asmr-GGUF

Usage:

crispasr --backend whisper -m auto --auto-download --vad -vm whisper-vad -f audio.wav

Tested on English and German audio across a range of durations. The model produces clean VAD segmentation (comparable to FireRedVAD) but adds about 1 s of overhead for the Whisper encoder pass. See the benchmark comparison in our README.

Thanks for publishing the model!

cstr

Apr 27

This comment has been hidden
