cstr's picture
Upload README.md with huggingface_hub
05c11bc verified
metadata
license: apache-2.0
language:
  - en
tags:
  - gguf
  - audio
  - speech-recognition
  - data2vec
  - wav2vec2
  - ctc
  - automatic-speech-recognition
base_model: facebook/data2vec-audio-base-960h
pipeline_tag: automatic-speech-recognition

Data2Vec Audio (GGUF)

GGUF conversion of facebook/data2vec-audio-base-960h for use with CrispASR.

Model Details

  • Architecture: Data2Vec Audio — wav2vec2-style CNN (7L, 512-dim) + 12-layer transformer (768-dim, 12 heads) + CTC head
  • Parameters: ~95M
  • Training: Self-supervised pre-training on LibriSpeech 960h, fine-tuned with CTC loss
  • Language: English only
  • License: Apache 2.0
  • WER: 1.89% (LibriSpeech test-clean), 4.07% (test-other)

Usage with CrispASR

# Uses the wav2vec2 backend (auto-detected from GGUF architecture)
crispasr --backend wav2vec2 -m data2vec-audio-base-960h-q4_k.gguf -f audio.wav

Architecture Notes

Data2Vec Audio differs from standard wav2vec2 in three ways handled by the converter:

  1. 5-layer positional convolution (vs 1 for wav2vec2), each with Conv1d + LayerNorm(no affine) + GELU
  2. Global encoder LayerNorm BEFORE transformer layers (vs after for wav2vec2)
  3. POST-norm encoder despite using LayerNorm in CNN (wav2vec2-large uses pre-norm)

All three are auto-detected from the HuggingFace model config and stored as GGUF metadata flags.

Files

File Size JFK Transcription
data2vec-audio-base-960h-f16.gguf 196 MB perfect
data2vec-audio-base-960h-q4_k.gguf 79 MB perfect
data2vec-audio-base-960h-q8_0.gguf 120 MB perfect

Accuracy

Tested on JFK inaugural address (11s):

AND SO A MY FELLOW AMERICANS ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU
ASK WHAT YOU CAN DO FOR YOUR COUNTRY

Identical to the Python HuggingFace reference output. All quantized variants produce the same transcription.

Citation

@inproceedings{baevski2022data2vec,
  title={data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language},
  author={Baevski, Alexei and Hsu, Wei-Ning and Xu, Qiantong and Babu, Arun and Gu, Jiatao and Auli, Michael},
  booktitle={ICML},
  year={2022}
}