Automatic Speech Recognition
Transformers
PyTorch
TensorFlow
English
hubert
speech
audio
hf-asr-leaderboard
Eval Results
Instructions to use facebook/hubert-large-ls960-ft with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use facebook/hubert-large-ls960-ft with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="facebook/hubert-large-ls960-ft")

# Load model directly
from transformers import AutoProcessor, AutoModelForCTC

processor = AutoProcessor.from_pretrained("facebook/hubert-large-ls960-ft")
model = AutoModelForCTC.from_pretrained("facebook/hubert-large-ls960-ft")
```
- Notebooks
- Google Colab
- Kaggle
GGUF + pure-C++ runtime in CrispASR (HuBERT on the wav2vec2 backend)
#6
by cstr - opened
We've added HuBERT-Large to CrispASR's wav2vec2 backend. The same wav2vec2-ggml.cpp runtime transparently handles HuBERT (pre-norm transformer, single pos_conv), Data2Vec (post-norm, 5-layer pos_conv), and XLSR (post-norm, single pos_conv) by reading GGUF metadata at load time: one binary, three architecture variants, no per-checkpoint patching.
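The load-time dispatch can be sketched roughly as follows; the metadata key, dataclass, and variant names here are illustrative assumptions, not CrispASR's actual schema:

```python
# Sketch of selecting an architecture variant from GGUF metadata at load time.
# The "wav2vec2.variant" key and VariantConfig fields are assumptions for
# illustration only, not CrispASR's real metadata layout.
from dataclasses import dataclass

@dataclass
class VariantConfig:
    pre_norm: bool        # HuBERT: pre-norm; Data2Vec / XLSR: post-norm
    pos_conv_layers: int  # HuBERT / XLSR: single pos_conv; Data2Vec: 5-layer stack

VARIANTS = {
    "hubert":   VariantConfig(pre_norm=True,  pos_conv_layers=1),
    "data2vec": VariantConfig(pre_norm=False, pos_conv_layers=5),
    "xlsr":     VariantConfig(pre_norm=False, pos_conv_layers=1),
}

def select_variant(metadata: dict) -> VariantConfig:
    # One runtime, three variants: dispatch purely on checkpoint metadata,
    # so no per-checkpoint code patching is needed.
    return VARIANTS[metadata["wav2vec2.variant"]]

print(select_variant({"wav2vec2.variant": "hubert"}))
```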
The model file is 212 MB at Q4_K and ~1.2 GB at F16. The CNN feature extractor was rewritten on ggml (im2col + mul_mat, kept in F32 to avoid F16 precision collapse) and runs 10.8× faster than the naïve C++ baseline.
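For readers unfamiliar with the im2col + mul_mat trick, here is a minimal NumPy sketch of how a 1-D convolution becomes a single matrix multiply (illustrative code, not CrispASR's ggml implementation):

```python
# Sketch: 1-D convolution via im2col + one matmul.
# Each receptive field is unrolled into a column, so the nested convolution
# loops collapse into a single dense matrix multiply (what mul_mat does in ggml).
import numpy as np

def conv1d_im2col(x, w, stride):
    # x: (in_ch, T) input, w: (out_ch, in_ch, k) kernels
    out_ch, in_ch, k = w.shape
    T_out = (x.shape[1] - k) // stride + 1
    # im2col: gather each window into a column -> (in_ch * k, T_out)
    cols = np.stack(
        [x[:, t * stride : t * stride + k].reshape(-1) for t in range(T_out)],
        axis=1,
    )
    # One big matmul replaces the convolution loops -> (out_ch, T_out)
    return w.reshape(out_ch, -1) @ cols

x = np.random.randn(2, 16).astype(np.float32)
w = np.random.randn(4, 2, 5).astype(np.float32)
y = conv1d_im2col(x, w, stride=2)
print(y.shape)  # (4, 6)
```

Keeping this stage in F32 matters because the feature extractor accumulates over long windows, where F16 accumulation can lose precision.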
The model is plain CTC, so its raw output has no capitalization or punctuation; pair it with --punc-model to restore both.

Pre-quantised GGUFs (Apache-2.0): cstr/hubert-large-ls960-ft-GGUF
```shell
./build/bin/crispasr --backend wav2vec2 -m hubert-large-ls960-ft-q4_k.gguf \
  -f audio.wav --punc-model fullstop-punc-q4_k.gguf
```
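To see why CTC output needs the punctuation pass, here is a minimal sketch of greedy CTC decoding (collapse repeats, drop blanks); the toy vocabulary and blank id are illustrative:

```python
# Minimal greedy CTC decoder: collapse consecutive repeats, then drop blanks.
# The vocab and blank id below are toy values for illustration.
def ctc_greedy_decode(ids, blank=0, vocab=None):
    out, prev = [], None
    for i in ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return "".join(vocab[i] for i in out) if vocab else out

vocab = {0: "", 1: "h", 2: "e", 3: "l", 4: "o"}
print(ctc_greedy_decode([1, 1, 0, 2, 3, 3, 0, 3, 4, 0], vocab=vocab))  # hello
```

The decoder emits bare lowercase tokens, which is exactly the gap the --punc-model stage fills.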
Sister repos: Data2Vec, XLSR-53 EN, XLSR-53 DE.