ZipVoice Multilingual Finetuning
Fine-tuning and inference setup for multilingual ZipVoice models, focused on:
- Arabic (ar)
- Serbian (sr)
Base model used in this project: ZipVoice.
Attribution
Hugging Face Model Links (Quick Access)
Demo page:
Arabic model page: karim1993/zipvoice-ar-finetuned
Serbian model page: karim1993/zipvoice-sr-finetuned
Training Datasets
Serbian
- CLARIN Dataset 11356/1834: Dataset link
- CLARIN Dataset 11356/1679: Dataset link
Arabic
- Common Voice: Dataset link
- ArVoice (Human split only): Dataset link
- MGB2 Arabic: Dataset link
Evaluation Metrics
Serbian (sr)
| Metric | Count | Mean | Median | Std | Min | Max |
|---|---|---|---|---|---|---|
| wer | 92 | 0.17 | 0.08 | 0.24 | 0.00 | 1.00 |
| cer | 92 | 0.10 | 0.02 | 0.25 | 0.00 | 2.00 |
| wav_seconds | 92 | 4.18 | 3.63 | 2.92 | 0.24 | 22.98 |
| wavlm_sim | 92 | 0.63 | 0.69 | 0.15 | 0.01 | 0.83 |
Arabic (ar)
| Metric | Count | Mean | Median | Std | Min | Max |
|---|---|---|---|---|---|---|
| wer | 100 | 0.14 | 0.00 | 0.20 | 0.00 | 1.00 |
| cer | 100 | 0.05 | 0.00 | 0.14 | 0.00 | 1.30 |
| wav_seconds | 100 | 3.39 | 2.66 | 2.20 | 0.62 | 12.48 |
| wavlm_sim | 100 | 0.45 | 0.48 | 0.14 | 0.01 | 0.69 |
Arabic WER caveat:
Whisper does not output Arabic diacritics (علامات التشكيل / الحركات), so diacritic mismatches are not reflected in WER. The diacritics concerned are:
- fatha (فَتْحَة): ـَ
- damma (ضَمَّة): ـُ
- kasra (كَسْرَة): ـِ
- sukun (سُكُون): ـْ
- shadda (شَدَّة): ـّ
- tanwin al-fath (تَنْوِين الفَتْح): ـً
- tanwin al-damm (تَنْوِين الضَّمّ): ـٌ
- tanwin al-kasr (تَنْوِين الكَسْر): ـٍ
- madda (مَدَّة): ـٓ
- dagger alif (أَلِف خَنْجَرِيَّة): ـٰ
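The effect of this caveat can be illustrated with a small sketch. The project's actual scoring script is not shown in this README, so everything below is an assumption: the helper names (`strip_diacritics`, `edit_distance`, `wer`, `cer`) are hypothetical, and the metrics are the textbook edit-distance definitions rather than the exact evaluation code. It shows how a fully diacritized reference and an undiacritized Whisper-style transcript can score a WER of 0, and why CER can exceed 1.0 (as in the Max column above) when the transcript contains many inserted characters.

```python
import re

# Arabic diacritics: tanwin, fatha, damma, kasra, shadda, sukun and madda
# (U+064B to U+0653), plus dagger alif (U+0670).
ARABIC_DIACRITICS = re.compile("[\u064B-\u0653\u0670]")


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritic marks, leaving the base letters untouched."""
    return ARABIC_DIACRITICS.sub("", text)


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate after diacritic removal (reference must be non-empty)."""
    ref_words = strip_diacritics(reference).split()
    hyp_words = strip_diacritics(hypothesis).split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate after diacritic removal (reference must be non-empty)."""
    ref_chars = list(strip_diacritics(reference))
    hyp_chars = list(strip_diacritics(hypothesis))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# A diacritized reference against an undiacritized transcript: no word errors.
print(wer("كَتَبَ الوَلَدُ", "كتب الولد"))  # 0.0

# Four inserted characters against a two-character reference: CER above 1.
print(cer("ab", "abcdef"))  # 2.0
```

Because the marks are removed from both sides before comparison, a synthesized utterance with wrong vowelization would still count as correct under this kind of scoring, which is the limitation the caveat describes.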
Training Summary
Reported training time:
- 2 days for each language
- 1 extra day for each distilled model
Arabic Training Data
- total_rows: 153666
- total_duration: 395.46 hours
Serbian Training Data
- total_rows: 92177
- total_duration: 280.87 hours
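As a quick sanity check on these totals, the average duration per row can be derived from the row count and total hours. This is a small arithmetic sketch; it assumes each row corresponds to one audio clip, which the README does not state explicitly, and `mean_clip_seconds` is a hypothetical helper name.

```python
def mean_clip_seconds(total_rows: int, total_hours: float) -> float:
    """Average duration per row in seconds, assuming one clip per row."""
    return total_hours * 3600 / total_rows


arabic = mean_clip_seconds(153666, 395.46)
serbian = mean_clip_seconds(92177, 280.87)

print(f"Arabic:  {arabic:.2f} s per row")   # Arabic:  9.26 s per row
print(f"Serbian: {serbian:.2f} s per row")  # Serbian: 10.97 s per row
```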
Gradio Inference
Main app: gradio_infer_fixed.py
Local run:
python3 -m pip install -r requirements.txt
python3 -m gradio_infer_fixed
System packages required:
- ffmpeg
- espeak-ng
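Before launching the app, it can help to confirm that both system packages are on the `PATH`. The following is a minimal sketch using only the Python standard library; `missing_tools` is a hypothetical helper and is not part of gradio_infer_fixed.py.

```python
import shutil

# Executables the README lists as required system packages.
REQUIRED_TOOLS = ("ffmpeg", "espeak-ng")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of executables that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Missing system packages:", ", ".join(missing))
else:
    print("All required system packages were found.")
```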
Notes For Hugging Face Spaces
- This README includes Spaces metadata in the front matter (sdk: gradio, app_file: gradio_infer_fixed.py).
- The app is configured for Space deployment and can auto-download model artifacts/source when enabled by environment variables.
- Arabic model page (again): karim1993/zipvoice-ar-finetuned
- Serbian model page (again): karim1993/zipvoice-sr-finetuned
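The environment-variable gating mentioned in the notes above might look roughly like the sketch below. The variable name `ZIPVOICE_AUTO_DOWNLOAD` and the function `maybe_download_model` are assumptions made for illustration; the README does not name the variables the app actually reads. The only real library call is `snapshot_download` from `huggingface_hub`, which downloads a repository snapshot and returns its local path.

```python
import os


def maybe_download_model(repo_id, env=None, downloader=None):
    """Download a model repo only when the (hypothetical) opt-in flag is set.

    Returns the local snapshot path, or None when downloading is disabled.
    """
    env = os.environ if env is None else env
    # ZIPVOICE_AUTO_DOWNLOAD is an assumed name, not taken from the app.
    flag = env.get("ZIPVOICE_AUTO_DOWNLOAD", "").strip().lower()
    if flag not in {"1", "true", "yes"}:
        return None
    if downloader is None:
        # Imported lazily so the function works without the package
        # when downloading is disabled.
        from huggingface_hub import snapshot_download
        downloader = snapshot_download
    return downloader(repo_id=repo_id)


# With the flag unset this prints None and performs no network access.
print(maybe_download_model("karim1993/zipvoice-ar-finetuned"))
```

Keeping the download opt-in means a local run with already-present checkpoints never touches the network, while a Space can enable it through its settings.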