GGUF + pure-C++ runtime in CrispASR (Data2Vec on the wav2vec2 backend)

by cstr - opened May 1


We've added Data2Vec-Audio to CrispASR's wav2vec2 backend. The same wav2vec2-ggml.cpp runtime auto-detects the architecture variant from GGUF metadata: Data2Vec has a distinctive 5-layer pos_conv stack with a post-norm transformer, HuBERT has a single pos_conv with pre-norm, and XLSR has a single pos_conv with post-norm.
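For anyone curious how that kind of dispatch can work, here is a minimal sketch in Python (not the actual C++ code in wav2vec2-ggml.cpp). It assumes the GGUF metadata has already been parsed into a dict. The two key names are hypothetical, made up for illustration; only the three variant combinations come from the description above.

```python
from typing import Mapping

# Hypothetical metadata keys, chosen for illustration only. The real GGUF
# key names used by CrispASR are not stated in this post.
KEY_POS_CONV_LAYERS = "wav2vec2.pos_conv_layers"
KEY_NORM_PLACEMENT = "wav2vec2.norm_placement"  # expected values: "pre" or "post"

# (pos_conv layer count, norm placement) -> variant, as listed in the post.
VARIANTS = {
    (5, "post"): "data2vec",
    (1, "pre"): "hubert",
    (1, "post"): "xlsr",
}


def detect_variant(metadata: Mapping[str, object]) -> str:
    """Pick an architecture variant from already-parsed GGUF metadata."""
    key = (int(metadata[KEY_POS_CONV_LAYERS]), str(metadata[KEY_NORM_PLACEMENT]))
    try:
        return VARIANTS[key]
    except KeyError:
        # Fail loudly instead of guessing a layout for an unknown combination.
        raise ValueError(f"unsupported wav2vec2 variant: {key}") from None
```

Keying on the structural properties rather than on a model name means one code path can serve every checkpoint that shares a layout.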

The model is small: 79 MB at Q4_K and 314 MB at F16. It runs faster than realtime on CPU even at F16 (the CNN frontend now runs on ggml and is 10.8× faster than our old scalar-loop baseline).

CTC decoding produces no native punctuation, so pair the model with --punc-model fireredpunc-q8_0.gguf (BERT-base, EN+CN) or --punc-model fullstop-punc-q4_k.gguf (XLM-R-large, multilingual) for capitalised, punctuated output.
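To illustrate why CTC output comes out unpunctuated, here is a rough sketch of greedy CTC decoding in Python. It is a generic textbook version, not CrispASR's implementation, and the tiny vocabulary is a made-up assumption: a blank token, a word delimiter, and a few uppercase letters. Since a character vocabulary like this has no punctuation or lowercase symbols, the decoder cannot emit them, whatever the audio contains.

```python
from itertools import groupby
from typing import Sequence

# Illustrative vocabulary (an assumption, not the model's real token table):
# index 0 is the CTC blank, index 1 is the word delimiter.
BLANK = "<pad>"
WORD_DELIMITER = "|"
VOCAB = [BLANK, WORD_DELIMITER, "A", "C", "T", "H", "E"]


def ctc_greedy_decode(frame_ids: Sequence[int], vocab: Sequence[str] = VOCAB) -> str:
    """Turn per-frame argmax token ids into text with the standard CTC rules."""
    # 1. Collapse runs of identical ids (one symbol usually spans many frames).
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop blanks; a blank between two equal ids is what keeps double letters.
    tokens = [vocab[i] for i in collapsed if vocab[i] != BLANK]
    # 3. Map the word delimiter to a space.
    return "".join(tokens).replace(WORD_DELIMITER, " ").strip()


if __name__ == "__main__":
    frames = [4, 4, 0, 5, 6, 6, 1, 1, 3, 0, 2, 4, 4]
    print(ctc_greedy_decode(frames))  # prints "THE CAT"
```

A separate punctuation model then only has to restore casing and punctuation on top of a plain word sequence like this.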

Pre-quantised GGUFs (Apache-2.0): cstr/data2vec-audio-960h-GGUF

./build/bin/crispasr --backend wav2vec2 -m data2vec-audio-960h-q4_k.gguf \
    -f audio.wav --punc-model fireredpunc-q8_0.gguf
