GGUF + pure-C++ runtime in CrispASR (Data2Vec on the wav2vec2 backend)
#6
by cstr - opened
We've added Data2Vec-Audio to CrispASR's wav2vec2 backend. The same wav2vec2-ggml.cpp runtime auto-detects the architecture variant from GGUF metadata: Data2Vec's distinctive 5-layer pos_conv stack + post-norm transformer is one such variant; HuBERT is single pos_conv + pre-norm; XLSR is single pos_conv + post-norm.
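To make the variant table above concrete, here is a minimal sketch of that kind of dispatch. The `EncoderTraits` struct, its field names, and `classify_variant` are hypothetical and only restate the mapping described in this post; they are not CrispASR's actual GGUF keys or code.

```cpp
#include <string>

// Hypothetical summary of the two traits that distinguish the variants.
// In a real runtime these would be read from GGUF metadata; the key names
// are not shown here because they are not given in this post.
struct EncoderTraits {
    int  pos_conv_layers;  // number of positional-convolution layers
    bool pre_norm;         // true: LayerNorm before each sub-block, false: after
};

// Map the traits to a variant name, following the description above.
std::string classify_variant(const EncoderTraits& t) {
    if (t.pos_conv_layers == 5 && !t.pre_norm) return "data2vec";
    if (t.pos_conv_layers == 1 &&  t.pre_norm) return "hubert";
    if (t.pos_conv_layers == 1 && !t.pre_norm) return "xlsr";
    return "unknown";  // anything else is not covered by this sketch
}
```

Returning "unknown" instead of guessing keeps an unexpected combination from being silently run with the wrong layer layout.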
Tiny: 79 MB at Q4_K, 314 MB at F16. Sub-realtime on CPU even at F16 (the CNN frontend is on ggml now, 10.8× faster than our old scalar-loop baseline).
CTC means no native punctuation, so pair it with --punc-model fireredpunc-q8_0.gguf (BERT-base, EN+CN) or fullstop-punc-q4_k.gguf (XLM-R-large, multilingual) for capitalised/punctuated output.
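For anyone wondering why CTC output comes out unpunctuated: a greedy CTC decoder only picks the best vocabulary entry per frame, collapses repeats, and drops blanks, and the vocabulary of a model like this is typically just letters plus a word delimiter. Below is a minimal sketch of generic greedy CTC decoding. The function name, the vocabulary layout, and the blank index are assumptions for illustration, not CrispASR's implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Greedy CTC decoding: argmax per frame, collapse consecutive repeats,
// then drop the blank token. `vocab` and `blank_id` are illustrative.
std::string ctc_greedy_decode(const std::vector<std::vector<float>>& logits,
                              const std::vector<std::string>& vocab,
                              std::size_t blank_id) {
    std::string out;
    std::size_t prev = blank_id;  // previous frame's best token
    for (const auto& frame : logits) {
        if (frame.empty()) continue;
        // Index of the highest-scoring token in this frame.
        std::size_t best = 0;
        for (std::size_t i = 1; i < frame.size(); ++i) {
            if (frame[i] > frame[best]) best = i;
        }
        // Emit only when the token changes and is not the blank.
        if (best != prev && best != blank_id && best < vocab.size()) {
            out += vocab[best];
        }
        prev = best;
    }
    return out;
}
```

Since the decoder can only emit tokens that exist in the vocabulary, punctuation and casing have to come from a separate model run over the decoded text.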
Pre-quantised GGUFs (Apache-2.0): cstr/data2vec-audio-960h-GGUF
./build/bin/crispasr --backend wav2vec2 -m data2vec-audio-960h-q4_k.gguf \
-f audio.wav --punc-model fireredpunc-q8_0.gguf