# W2V-BERT 2.0 ASR Adapters

This repository contains per-language bottleneck adapters for automatic speech recognition (ASR), trained on top of [`facebook/w2v-bert-2.0`](https://huggingface.co/facebook/w2v-bert-2.0).
## Model Description
- Base Model: facebook/w2v-bert-2.0 (600M parameters, frozen)
- Adapter Architecture: MMS-style bottleneck adapters (dim=64)
- Decoder: Lightweight transformer decoder (1 layer)
- Training: CTC loss with extended vocabulary for double vowels
## Trained Adapters
Training in progress...
| Adapter | Language | WER | Train Samples |
|---|---|---|---|
## Architecture
The model uses:
- **Frozen w2v-bert-2.0 encoder** - extracts audio representations
- **Bottleneck adapter** - language-specific adaptation (trainable)
- **Lightweight decoder** - transformer decoder block (trainable)
- **LM head** - per-language vocabulary projection (trainable)
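The bottleneck adapter can be sketched as a small down-projection/up-projection pair with a residual connection, as in MMS-style adapters. This is a minimal sketch, not the repository's exact module: the layer names, the ReLU, and the pre-adapter layer norm are assumptions; only the 64-dimensional bottleneck comes from the configuration above, and the 1024 hidden size is that of w2v-bert-2.0.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """MMS-style bottleneck adapter: norm, down-project, nonlinearity,
    up-project, then add back the residual."""

    def __init__(self, hidden_dim: int = 1024, adapter_dim: int = 64):
        super().__init__()
        self.layer_norm = nn.LayerNorm(hidden_dim)
        self.down = nn.Linear(hidden_dim, adapter_dim)
        self.up = nn.Linear(adapter_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.layer_norm(x)
        x = self.up(self.act(self.down(x)))
        return x + residual

# The adapter preserves the encoder's (batch, frames, hidden) shape,
# so it can be inserted between frozen encoder layers.
features = torch.randn(2, 50, 1024)
adapter = BottleneckAdapter()
out = adapter(features)
print(out.shape)  # torch.Size([2, 50, 1024])
```

Because only the small `down`/`up` projections (plus decoder and LM head) are trained, each language costs a fraction of the 600M frozen base parameters.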
## Usage
Each adapter folder contains:
- `adapter_weights.pt` - Bottleneck adapter weights
- `decoder_weights.pt` - Decoder block weights
- `lm_head_weights.pt` - Language model head weights
- `final_norm_weights.pt` - Final layer norm weights
- `vocab.json` - Language-specific vocabulary
- `adapter_config.json` - Adapter configuration
- `metrics.json` - Training metrics
### Loading an Adapter
```python
import torch
from huggingface_hub import hf_hub_download
from transformers import Wav2Vec2BertProcessor

# Load the processor for a specific language
processor = Wav2Vec2BertProcessor.from_pretrained(
    "mutisya/w2v-bert-adapters-3lang-e10-25_52-v5",
    subfolder="<adapter_id>"
)

# Load the adapter weights
adapter_weights = torch.load(
    hf_hub_download(
        "mutisya/w2v-bert-adapters-3lang-e10-25_52-v5",
        "<adapter_id>/adapter_weights.pt"
    ),
    map_location="cpu"
)
```
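Since the adapters are trained with CTC loss, decoding the LM-head logits follows the standard CTC recipe: take the argmax per frame, collapse consecutive repeats, then drop blank tokens. A minimal sketch with a toy vocabulary; the blank id and the `id_to_char` mapping are assumptions for illustration, while the real mapping comes from each adapter's `vocab.json`:

```python
import torch

def ctc_greedy_decode(logits: torch.Tensor, id_to_char: dict, blank_id: int = 0) -> str:
    """Greedy CTC decoding: per-frame argmax, collapse repeats, drop blanks."""
    ids = logits.argmax(dim=-1).tolist()
    chars, prev = [], None
    for i in ids:
        if i != prev and i != blank_id:
            chars.append(id_to_char[i])
        prev = i
    return "".join(chars)

# Toy example with a 3-symbol vocabulary {0: blank, 1: 'a', 2: 'b'}.
# Frame predictions [1, 1, 0, 2, 2, 1] collapse to "aba".
frame_ids = torch.tensor([1, 1, 0, 2, 2, 1])
logits = torch.nn.functional.one_hot(frame_ids, num_classes=3).float()
print(ctc_greedy_decode(logits, {1: "a", 2: "b"}))  # aba
```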
## Training Configuration
- Epochs: 10
- Learning Rate: 0.0005
- Batch Size: 24 × 2 (effective: 48)
- Extended Vocabulary: True
- Adapter Dimension: 64
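The batch setup above can be reproduced with plain gradient accumulation. This is a minimal sketch under assumptions: the "× 2" is read here as an accumulation factor (it could equally be a device count), and the `nn.Linear` is a stand-in for the actual trainable modules (adapter, decoder, LM head); only the learning rate and batch numbers come from the configuration.

```python
import torch

# Stand-in for the trainable modules; the frozen encoder is excluded.
trainable = torch.nn.Linear(1024, 1024)
optimizer = torch.optim.AdamW(trainable.parameters(), lr=5e-4)  # LR from config

micro_batch_size = 24
accumulation_steps = 2
effective_batch_size = micro_batch_size * accumulation_steps  # 48

optimizer.zero_grad()
for _ in range(accumulation_steps):
    x = torch.randn(micro_batch_size, 1024)
    # Scale the loss so accumulated gradients match one large batch.
    loss = trainable(x).pow(2).mean() / accumulation_steps
    loss.backward()
optimizer.step()
print(effective_batch_size)  # 48
```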
## License
Apache 2.0
## Citation
```bibtex
@misc{w2vbert-asr-adapters,
  author = {Mutisya},
  title = {W2V-BERT 2.0 ASR Adapters for African Languages},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/mutisya/w2v-bert-adapters-3lang-e10-25_52-v5}
}
```