GGUF + pure-C++ runtime in CrispASR (HuBERT on the wav2vec2 backend)

#6
by cstr - opened

We've added HuBERT-Large to CrispASR's wav2vec2 backend. The same wav2vec2-ggml.cpp runtime reads GGUF metadata at load time and handles three architecture variants transparently: HuBERT (pre-norm transformer + single pos_conv), Data2Vec (post-norm + 5-layer pos_conv), and XLSR (post-norm + single pos_conv). One binary, no per-checkpoint patching.
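To make the metadata-driven dispatch concrete, here is a minimal sketch of the idea. It is not the actual wav2vec2-ggml.cpp code: the Metadata map stands in for a GGUF header, and the key names wav2vec2.pre_norm and wav2vec2.pos_conv_layers are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the key/value metadata stored in a GGUF header.
using Metadata = std::map<std::string, std::string>;

struct EncoderConfig {
    bool pre_norm;         // true: LayerNorm before attention/FFN, false: after
    int  pos_conv_layers;  // number of positional-convolution layers
};

// Read an integer key, falling back to a default when the key is absent.
static int get_int(const Metadata & md, const std::string & key, int fallback) {
    auto it = md.find(key);
    return it == md.end() ? fallback : std::stoi(it->second);
}

// Key names below are illustrative assumptions, not CrispASR's real keys.
EncoderConfig config_from_metadata(const Metadata & md) {
    EncoderConfig cfg;
    cfg.pre_norm        = get_int(md, "wav2vec2.pre_norm", 0) != 0;
    cfg.pos_conv_layers = get_int(md, "wav2vec2.pos_conv_layers", 1);
    assert(cfg.pos_conv_layers >= 1);
    return cfg;
}
```

With a config like this, the graph builder can branch on cfg.pre_norm and loop over cfg.pos_conv_layers instead of hard-coding one layout per checkpoint.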

The model is 212 MB at Q4_K and ~1.2 GB at F16. The CNN feature extractor was rewritten on ggml (im2col + mul_mat, kept in F32 to avoid F16 precision collapse), which makes it 10.8× faster than the naïve C++ baseline.
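For readers unfamiliar with the im2col + mul_mat formulation, the sketch below shows the technique for a 1-D convolution in plain C++ with F32 throughout. It does not use the ggml API and is not the CrispASR implementation; the function names and memory layouts are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfold x[c_in][t_in] into col[t_out][c_in * k]: one row per output frame,
// holding every input value that the kernel touches for that frame.
std::vector<float> im2col_1d(const std::vector<float> & x,
                             int c_in, int t_in, int k, int stride) {
    assert(t_in >= k && stride >= 1);
    const int t_out = (t_in - k) / stride + 1;
    std::vector<float> col((size_t) t_out * c_in * k);
    for (int t = 0; t < t_out; ++t)
        for (int c = 0; c < c_in; ++c)
            for (int j = 0; j < k; ++j)
                col[((size_t) t * c_in + c) * k + j] =
                    x[(size_t) c * t_in + (size_t) t * stride + j];
    return col;
}

// Convolution as a matrix product: y[c_out][t_out] = w[c_out][c_in*k] * col^T.
// Accumulation stays in float (F32).
std::vector<float> conv1d_im2col(const std::vector<float> & x,
                                 const std::vector<float> & w,
                                 int c_in, int c_out, int t_in, int k, int stride) {
    const int t_out = (t_in - k) / stride + 1;
    const int row   = c_in * k;
    const std::vector<float> col = im2col_1d(x, c_in, t_in, k, stride);
    std::vector<float> y((size_t) c_out * t_out, 0.0f);
    for (int o = 0; o < c_out; ++o)
        for (int t = 0; t < t_out; ++t) {
            float acc = 0.0f;
            for (int i = 0; i < row; ++i)
                acc += w[(size_t) o * row + i] * col[(size_t) t * row + i];
            y[(size_t) o * t_out + t] = acc;
        }
    return y;
}
```

The point of the unfold step is that the inner loops become a single dense matrix multiply, which is the operation a tensor library optimises best.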

Pre-quantised GGUFs (Apache-2.0): cstr/hubert-large-ls960-ft-GGUF

The model decodes with CTC, so the transcript comes out without proper capitalisation or punctuation; pair it with --punc-model to restore them:

./build/bin/crispasr --backend wav2vec2 -m hubert-large-ls960-ft-q4_k.gguf \
    -f audio.wav --punc-model fullstop-punc-q4_k.gguf
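As background on why a CTC model benefits from a separate punctuation model, here is a minimal sketch of greedy CTC decoding. It assumes the per-frame argmax token ids are already computed and that the vocabulary is a small character set with no punctuation symbols; the function name and the toy vocabulary are hypothetical and not taken from CrispASR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Greedy CTC decode: collapse consecutive repeats, then drop blank tokens.
// `ids` holds the argmax token id for each frame; `vocab[i]` is the
// character for token i.
std::string ctc_greedy_decode(const std::vector<int> & ids,
                              const std::string & vocab, int blank_id) {
    std::string out;
    int prev = -1;  // no previous token yet
    for (int id : ids) {
        assert(id >= 0 && (size_t) id < vocab.size());
        // Emit only when the token changes and is not the blank symbol.
        if (id != prev && id != blank_id) {
            out.push_back(vocab[(size_t) id]);
        }
        prev = id;
    }
    return out;
}
```

Because the decoder can only emit characters that exist in the vocabulary, punctuation and sentence casing have to be added by a second pass.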

Sister repos: Data2Vec, XLSR-53 EN, XLSR-53 DE.
