glossKit-ASR/wav2vec2-large-xlsr-53-zza
Model Description
This model is a fine-tuned version of ShuanOsmanKarim/mohammadirad-wav2vec2-haw for automatic speech recognition (ASR), trained primarily on speech labeled with the ISO 639-3 code zza.
Model Performance
- Word Error Rate (WER): 0.9373 (93.73%)
- Character Error Rate (CER): 0.4099 (40.99%)
- Test Samples: 67
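For readers unfamiliar with these metrics: WER and CER are conventionally the Levenshtein edit distance between a reference transcript and the model's hypothesis, divided by the reference length in words or characters. The sketch below is a minimal pure-Python illustration of that definition, not the evaluation script used for this model; the function names `edit_distance`, `wer`, and `cer` are hypothetical, and the sample strings are made up.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: char-level edit distance / reference length.

    Spaces are counted as characters in this sketch.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example strings, for illustration only
example_wer = wer("a b c d", "a x c")  # one substitution + one deletion
example_cer = cer("abcd", "abxd")      # one substituted character
```

Because insertions count as errors, WER computed this way can exceed 1.0, and a single wrong character makes the whole word count as an error. That is why WER is usually much higher than CER.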
Training Details
- Base Model: ShuanOsmanKarim/mohammadirad-wav2vec2-haw
- Language: ZZA (zza)
- Fine-tuning Framework: PyTorch / HuggingFace Transformers
Data provenance
Training segments were contributed through the following GlossKit project: Zazakî Documentation Project. The segments are aggregated under the training language code zza.
Languages in the training set
Language labels come from ACTIVE contributing GlossKit projects; minutes are summed from metadata rows that include projectId (no estimates or splits):
- Zazakî (23 min)
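The aggregation rule described above (sum durations only from metadata rows that carry a projectId, with no estimation) could be sketched roughly as follows. This is an illustration under stated assumptions rather than GlossKit's actual code: the function name `minutes_by_language` and the field names `language` and `durationSec` are hypothetical, and the sample rows are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def minutes_by_language(rows: Iterable[Mapping]) -> Dict[str, float]:
    """Sum recorded minutes per language label, skipping rows without a projectId."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        if not row.get("projectId"):
            continue  # rows with no project link are not counted
        # `language` and `durationSec` are assumed field names
        totals[row["language"]] += row["durationSec"] / 60.0
    return dict(totals)


# Invented sample rows, for illustration only
sample_rows = [
    {"projectId": "p1", "language": "Zazakî", "durationSec": 90},
    {"projectId": None, "language": "Zazakî", "durationSec": 600},  # skipped
    {"projectId": "p1", "language": "Zazakî", "durationSec": 30},
]
sample_totals = minutes_by_language(sample_rows)
```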
Contributing consultants
Consultant IDs (project aliases), age, and gender only — personal names are not listed:
- GM_f2 (48F)
- M004 (0F)
- S002 (60F)
- T003 (30M)
Contributing users
People who own or collaborate on the contributing GlossKit projects (alphabetical by display name):
- Mahîr Dogan
- Shuan Karim
Usage
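import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Load the processor (feature extractor + tokenizer) and the fine-tuned CTC model
processor = Wav2Vec2Processor.from_pretrained("glossKit-ASR/wav2vec2-large-xlsr-53-zza")
model = Wav2Vec2ForCTC.from_pretrained("glossKit-ASR/wav2vec2-large-xlsr-53-zza")

# `audio` is assumed to be a 1-D float array of mono speech sampled at 16 kHz
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy decoding: take the most likely token per frame, then decode to text
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.decode(predicted_ids[0])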
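The `argmax` and `processor.decode` steps in the usage snippet amount to greedy CTC decoding: take the most likely token per frame, collapse consecutive repeats, then drop the blank token. The sketch below shows that collapse step on plain integer IDs. The function name `ctc_greedy_collapse`, the toy vocabulary, and the blank ID of 0 are assumptions for illustration and do not reflect this model's actual tokenizer.

```python
from itertools import groupby
from typing import Dict, List


def ctc_greedy_collapse(frame_ids: List[int], blank_id: int = 0) -> List[int]:
    """Collapse consecutive duplicate IDs, then remove the CTC blank."""
    collapsed = [token for token, _ in groupby(frame_ids)]
    return [token for token in collapsed if token != blank_id]


# Hypothetical toy vocabulary: 0 is the blank, "|" marks a word boundary
toy_vocab: Dict[int, str] = {0: "<pad>", 1: "|", 2: "a", 3: "b"}

# Per-frame argmax IDs; the blank between the two runs of 2 keeps both "a"s
frame_ids = [2, 2, 0, 2, 3, 3, 1, 0, 2]
ids = ctc_greedy_collapse(frame_ids)

# Map IDs to characters and turn the word delimiter into a space
text = "".join(toy_vocab[i] for i in ids).replace("|", " ").strip()
```

The order matters: repeats are collapsed before blanks are removed, so a blank between two identical tokens preserves a genuine double letter.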
Citation
If you use this model in your research, please cite:
@misc{glosskit-asr-zza,
  title={GlossKit ASR Model (ZZA)},
  author={Mahîr Dogan and Shuan Osman Karim},
  year={2026},
  url={https://huggingface.co/glossKit-ASR/wav2vec2-large-xlsr-53-zza}
}
License
MIT License