GGUF + pure-C++ runtime in CrispASR (HuBERT on the wav2vec2 backend)

by cstr - opened 14 days ago

We've added HuBERT-Large to CrispASR's wav2vec2 backend. The same wav2vec2-ggml.cpp runtime handles three architecture variants by reading GGUF metadata at load time: HuBERT (pre-norm transformer + single pos_conv), Data2Vec (post-norm + 5-layer pos_conv), and XLSR (post-norm + single pos_conv). One binary, no per-checkpoint patching.
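To illustrate the idea of picking the variant from metadata, here is a minimal sketch. It is not the actual CrispASR code: the `Metadata` map stands in for the GGUF key/value header, and the key names `wav2vec2.norm_style` and `wav2vec2.pos_conv_layers` are invented for this example.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the key/value metadata read from a GGUF header.
using Metadata = std::map<std::string, std::string>;

struct EncoderConfig {
    bool pre_norm;         // true: normalise before attention/FFN (HuBERT-Large style)
    int  pos_conv_layers;  // number of positional-convolution layers
};

// Derive the encoder layout from metadata alone, so one binary can serve
// several checkpoints. The key names are assumptions made for this sketch.
EncoderConfig config_from_metadata(const Metadata& md) {
    auto norm = md.find("wav2vec2.norm_style");
    auto conv = md.find("wav2vec2.pos_conv_layers");
    if (norm == md.end() || conv == md.end()) {
        throw std::runtime_error("missing architecture metadata");
    }
    EncoderConfig cfg;
    cfg.pre_norm        = (norm->second == "pre");
    cfg.pos_conv_layers = std::stoi(conv->second);
    return cfg;
}
```

With this shape, a HuBERT file would carry "pre" and 1, a Data2Vec file "post" and 5, and an XLSR file "post" and 1, and the graph builder branches on `EncoderConfig` rather than on the checkpoint name.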

The model is 212 MB at Q4_K and ~1.2 GB at F16. The CNN feature extractor was rewritten on ggml (im2col + mul_mat, kept in F32 to avoid F16 precision collapse) and runs 10.8× faster than the naïve C++ baseline.
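For readers unfamiliar with the im2col + mul_mat trick, here is a plain C++ sketch of a 1-D convolution expressed as an unfold followed by a matrix multiply. It uses no ggml calls and is not the CrispASR kernel; the function names and the row-major layouts are assumptions chosen for clarity. Everything stays in F32, as in the post.

```cpp
#include <cstddef>
#include <vector>

// Unfold x (layout [c_in][t_in]) into a row-major matrix of shape
// [t_out][c_in * k]; each row is one receptive field. Requires t_in >= k.
std::vector<float> im2col_1d(const std::vector<float>& x, std::size_t c_in,
                             std::size_t t_in, std::size_t k, std::size_t stride) {
    const std::size_t t_out = (t_in - k) / stride + 1;
    std::vector<float> cols(t_out * c_in * k);
    for (std::size_t t = 0; t < t_out; ++t)
        for (std::size_t c = 0; c < c_in; ++c)
            for (std::size_t i = 0; i < k; ++i)
                cols[(t * c_in + c) * k + i] = x[c * t_in + t * stride + i];
    return cols;
}

// Convolution as a matrix product: out[co][t] = sum_j w[co][j] * cols[t][j],
// with w laid out as [c_out][c_in * k]. No padding, no bias.
std::vector<float> conv1d_im2col(const std::vector<float>& x, const std::vector<float>& w,
                                 std::size_t c_in, std::size_t t_in, std::size_t c_out,
                                 std::size_t k, std::size_t stride) {
    const std::size_t t_out = (t_in - k) / stride + 1;
    const std::size_t row   = c_in * k;
    const std::vector<float> cols = im2col_1d(x, c_in, t_in, k, stride);
    std::vector<float> out(c_out * t_out, 0.0f);
    for (std::size_t co = 0; co < c_out; ++co)
        for (std::size_t t = 0; t < t_out; ++t) {
            float acc = 0.0f;  // F32 accumulator
            for (std::size_t j = 0; j < row; ++j)
                acc += w[co * row + j] * cols[t * row + j];
            out[co * t_out + t] = acc;
        }
    return out;
}
```

The point of the unfold is that the inner loops become one dense matrix multiply, which an optimised mul_mat kernel can run far faster than a hand-written nested convolution loop.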

Pre-quantised GGUFs (Apache-2.0): cstr/hubert-large-ls960-ft-GGUF

The model is CTC-only, so its output has no capitalisation or punctuation; pair it with --punc-model to restore them:

./build/bin/crispasr --backend wav2vec2 -m hubert-large-ls960-ft-q4_k.gguf \
    -f audio.wav --punc-model fullstop-punc-q4_k.gguf
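Why a CTC model benefits from a separate punctuation model is easier to see from how CTC output is decoded. The sketch below is a generic greedy CTC decoder, not CrispASR's implementation; the function name, the toy vocabulary, and the choice of blank id are assumptions for illustration. The decoder can only emit symbols that are in the acoustic model's vocabulary, so if that vocabulary holds no punctuation or case distinctions, neither will the transcript.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Greedy CTC decoding over per-frame argmax ids:
//   1. collapse consecutive repeats of the same id,
//   2. drop the blank symbol.
// vocab maps each id to its output text; blank_id is the CTC blank.
std::string ctc_greedy_decode(const std::vector<int>& frame_ids,
                              const std::vector<std::string>& vocab,
                              int blank_id) {
    std::string out;
    int prev = -1;  // sentinel: no previous frame yet
    for (int id : frame_ids) {
        if (id != prev && id != blank_id) {
            out += vocab[static_cast<std::size_t>(id)];
        }
        prev = id;
    }
    return out;
}
```

For example, with a toy vocabulary {"_", "H", "I"} and blank id 0, the frame sequence 0 1 1 0 2 2 0 decodes to "HI", while 1 0 1 decodes to "HH" because the blank separates the two identical symbols.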

Sister repos: Data2Vec, XLSR-53 EN, XLSR-53 DE.