maniwebdev/xlsr_kurmanji_kurdish_custom
Kurmanji Kurdish Speech Recognition Model
This model was created by fine-tuning facebook/wav2vec2-xls-r-300m on the Mozilla Common Voice 8.0 Kurmanji Kurdish dataset.
It transcribes spoken Kurmanji Kurdish audio into text.
🧠 Model Description
This model is part of the Ferhengy project — a Kurdish language learning and transcription tool.
It builds upon multilingual speech representation learning using Wav2Vec2 XLS-R 300M and adapts it specifically for Kurmanji Kurdish.
🎯 Intended Uses
- Speech-to-text for Kurmanji Kurdish content (education, linguistics, or accessibility)
- Transcription of Kurdish audio for apps, media, or research
- Integration in applications that promote Kurdish digital language tools
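For the uses listed above, a minimal inference sketch with the Transformers `automatic-speech-recognition` pipeline might look like the following. It assumes the repository ships a processor and tokenizer alongside the weights, that `ffmpeg` is available for decoding audio files, and that `sample.wav` is a hypothetical local recording; none of these are confirmed by this card.

```python
MODEL_ID = "maniwebdev/xlsr_kurmanji_kurdish_custom"


def transcribe(audio_path: str) -> str:
    """Return the transcription of one audio file.

    Assumption: the model repo contains everything the ASR pipeline needs
    (config, weights, processor). The import is kept local so this module
    can be loaded even where transformers is not installed.
    """
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)  # the pipeline returns a dict with a "text" key
    return result["text"]


# Example call (hypothetical file name):
# print(transcribe("sample.wav"))
```

Building the pipeline once and reusing it across files avoids reloading the model for every call; the function above rebuilds it each time only to keep the sketch short.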
🧾 Training Data
The model was trained on Common Voice Kurmanji Kurdish (v8.0), using:
- train.tsv
- dev.tsv
- invalidated.tsv
- reported.tsv
- other.tsv
Only samples with positive upvotes were used, and duplicates were removed to ensure high-quality data.
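The filtering described above could be sketched roughly as follows. This is not the author's preprocessing script: it assumes "positive upvotes" means `up_votes > 0` and that duplicates are identified by the `path` column, and it runs on a small made-up TSV sample rather than the real Common Voice files.

```python
import csv
import io

# Made-up rows using Common Voice column names; not real dataset content.
SAMPLE_TSV = "\n".join(
    "\t".join(fields)
    for fields in [
        ["path", "sentence", "up_votes", "down_votes"],
        ["a.mp3", "sentence one", "2", "0"],
        ["b.mp3", "sentence two", "0", "0"],   # no upvotes: dropped
        ["a.mp3", "sentence one", "2", "0"],   # duplicate clip: dropped
        ["c.mp3", "sentence three", "1", "1"],
    ]
)


def load_rows(tsv_text):
    """Parse TSV text into a list of dicts keyed by column name."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def filter_rows(rows):
    """Keep rows with up_votes > 0, dropping repeated clips (assumed key: path)."""
    seen = set()
    kept = []
    for row in rows:
        if int(row["up_votes"]) <= 0:
            continue
        if row["path"] in seen:
            continue
        seen.add(row["path"])
        kept.append(row)
    return kept


kept = filter_rows(load_rows(SAMPLE_TSV))
print([row["path"] for row in kept])
```

If duplicates were instead defined by identical sentences, the same loop would work with `row["sentence"]` as the key.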
⚙️ Training Details
Training configuration (for reproducibility):
| Hyperparameter | Value |
|---|---|
| learning_rate | 9.6e-5 |
| train_batch_size | 16 |
| eval_batch_size | 16 |
| gradient_accumulation_steps | 16 |
| lr_scheduler_type | cosine_with_restarts |
| num_epochs | 100 |
| seed | 13 |
| mixed_precision_training | Native AMP |
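One consequence of the table worth making explicit is the effective batch size per optimizer step, which is the per-device batch size multiplied by the gradient accumulation steps and the number of devices. The sketch below assumes a single device, which the card does not state.

```python
# Values copied from the hyperparameter table above.
CONFIG = {
    "learning_rate": 9.6e-5,
    "train_batch_size": 16,
    "eval_batch_size": 16,
    "gradient_accumulation_steps": 16,
    "lr_scheduler_type": "cosine_with_restarts",
    "num_epochs": 100,
    "seed": 13,
}


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Samples contributing to each optimizer update.

    num_devices=1 is an assumption; the card does not say how many
    GPUs were used.
    """
    return per_device * accumulation_steps * num_devices


total = effective_batch_size(
    CONFIG["train_batch_size"], CONFIG["gradient_accumulation_steps"]
)
print(total)  # 16 * 16 * 1
```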
Results
| Step | Training Loss | Validation Loss | WER |
|---|---|---|---|
| 1200 | 0.2263 | 0.2924 | 0.3886 |
🧩 Framework Versions
- Transformers: 4.16.0
- PyTorch: 1.10.0
- Datasets: 1.18.1
- Tokenizers: 0.10.3
🧪 Evaluation Example
To evaluate on Common Voice 8.0 (Kurdish):
python eval.py --model_id maniwebdev/xlsr_kurmanji_kurdish_custom --dataset mozilla-foundation/common_voice_8_0 --config kmr --split test
Evaluation results
- Test WER on Common Voice 8 (self-reported): 0.331
- Test CER on Common Voice 8 (self-reported): 0.080
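For readers unfamiliar with the metrics, WER and CER are edit distances between the reference and the predicted text, counted over words and characters respectively and divided by the reference length. The following is a plain-Python sketch of that definition, not the `eval.py` script referenced above; the example strings are placeholders.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute/insert/delete = 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate for one utterance."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate for one utterance (spaces count as characters)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


print(wer("a b c d", "a x c d"))  # one substituted word out of four
print(cer("abcd", "abxd"))        # one substituted character out of four
```

Corpus-level scores are usually computed by summing the edit distances over all utterances and dividing by the total reference length, rather than averaging per-utterance rates.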