ZipVoice Multilingual Finetuning


Fine-tuning and inference setup for multilingual ZipVoice models, focused on:

  • Arabic (ar)
  • Serbian (sr)

Base model used in this project: ZipVoice.

Attribution

Hugging Face Model Links (Quick Access)

Demo page:

Arabic model page: karim1993/zipvoice-ar-finetuned

Serbian model page: karim1993/zipvoice-sr-finetuned

Training Datasets

Serbian

Arabic

Evaluation Metrics

Serbian (sr)

Metric        Count   Mean  Median   Std   Min    Max
wer              92   0.17    0.08  0.24  0.00   1.00
cer              92   0.10    0.02  0.25  0.00   2.00
wav_seconds      92   4.18    3.63  2.92  0.24  22.98
wavlm_sim        92   0.63    0.69  0.15  0.01   0.83
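Each row above aggregates a per-utterance metric over the test set. The sketch below shows one way to produce such a row with Python's `statistics` module. The input values are made up, `summarize` and `format_row` are hypothetical helpers, and the use of population (rather than sample) standard deviation is an assumption, not something this README states.

```python
import statistics


def summarize(name, values):
    """Aggregate per-utterance scores into one Count/Mean/Median/Std/Min/Max row."""
    return {
        "metric": name,
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # Assumption: population std; the project's script may use sample std instead.
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def format_row(row):
    """Render a summary row with two decimals, matching the tables above."""
    return (
        f"{row['metric']:<12} {row['count']:>5} {row['mean']:>6.2f} "
        f"{row['median']:>6.2f} {row['std']:>6.2f} {row['min']:>6.2f} {row['max']:>6.2f}"
    )


# Hypothetical per-utterance WER values (illustrative only, not the project's data).
wer_values = [0.0, 0.0, 0.1, 0.25, 1.0]
row = summarize("wer", wer_values)
print(format_row(row))
```

With right-skewed values like these the mean sits well above the median. The same pattern in the Serbian wer row (mean 0.17, median 0.08) suggests that most utterances are transcribed well and a minority of poor ones pull the mean up.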

Arabic (ar)

Metric        Count   Mean  Median   Std   Min    Max
wer             100   0.14    0.00  0.20  0.00   1.00
cer             100   0.05    0.00  0.14  0.00   1.30
wav_seconds     100   3.39    2.66  2.20  0.62  12.48
wavlm_sim       100   0.45    0.48  0.14  0.01   0.69
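WER and CER are both edit-distance ratios: (substitutions + deletions + insertions) divided by the number of reference words or characters. Because insertions are unbounded, either metric can exceed 1.0, which is how cer maxima of 2.00 and 1.30 can appear. The plain Levenshtein sketch below illustrates this. The project's actual scoring code and text normalization are not shown in this README, so the `wer` and `cer` helpers are hypothetical, and counting spaces as characters in CER is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            ))
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate (assumption: spaces count as characters)."""
    return edit_distance(ref, hyp) / len(ref)


print(wer("dobar dan", "dobar"))  # one deleted word out of two
print(cer("da", "da da da"))      # six inserted characters against a 2-character reference
```

The second call shows insertions alone pushing CER to 3.0, well above 1.0.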

Arabic WER caveat: Whisper, the ASR model used for scoring, does not output Arabic diacritics (علامات التشكيل / الحركات), so errors in the generated diacritics are not reflected in WER. The affected marks are:

  • ◌َ fatha (فَتْحَة)
  • ◌ُ damma (ضَمَّة)
  • ◌ِ kasra (كَسْرَة)
  • ◌ْ sukun (سُكُون)
  • ◌ّ shadda (شَدَّة)
  • ◌ً tanwin with fatha (تَنْوِينُ الفَتْح)
  • ◌ٌ tanwin with damma (تَنْوِينُ الضَّم)
  • ◌ٍ tanwin with kasra (تَنْوِينُ الكَسْر)
  • ◌ٓ madda (مَدَّة)
  • ◌ٰ dagger alif (أَلِف خَنْجَرِيَّة)
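A common way to compare undiacritized ASR output with a vocalized reference is to strip the diacritics from both texts before scoring. The hypothetical helper below removes the marks listed above (U+064B to U+0653 and U+0670). It is an illustration of the normalization idea, not necessarily what this project's evaluation applies.

```python
import re

# Fathatan..sukun (U+064B-U+0652), madda above (U+0653), dagger alif (U+0670).
ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0653\u0670]")


def strip_arabic_diacritics(text):
    """Remove Arabic short-vowel and related marks, keeping base letters."""
    return ARABIC_DIACRITICS.sub("", text)


vocalized = "\u0641\u064e\u062a\u0652\u062d\u064e\u0629"  # fatha written with its own diacritics
print(strip_arabic_diacritics(vocalized))  # the bare letters fa, ta, ha, ta marbuta
```

After this normalization a wrongly vocalized word and a correctly vocalized one score identically, which is exactly why diacritic errors are invisible to the reported WER.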

Training Summary

Reported training time:

  • 2 days for each language
  • 1 extra day for each distilled model

Arabic Training Data

  • total_rows: 153666
  • total_duration: 395.46 hours

Serbian Training Data

  • total_rows: 92177
  • total_duration: 280.87 hours
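Dividing total duration by row count gives the average clip length of each training set. This assumes one audio clip per row, which the README does not state explicitly.

```python
def mean_clip_seconds(total_hours, total_rows):
    """Average clip length in seconds, assuming one clip per row."""
    return total_hours * 3600 / total_rows


datasets = {
    "ar": (395.46, 153666),  # total_duration (hours), total_rows
    "sr": (280.87, 92177),
}

for lang, (hours, rows) in datasets.items():
    print(f"{lang}: {mean_clip_seconds(hours, rows):.2f} s per row")
# ar: 9.26 s per row
# sr: 10.97 s per row
```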

Gradio Inference

Main app:

  • gradio_infer_fixed.py

Local run:

python3 -m pip install -r requirements.txt
python3 -m gradio_infer_fixed

System packages required:

  • ffmpeg
  • espeak-ng
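Before launching the app it can help to confirm that both executables are on `PATH`. The `missing_binaries` helper below is a hypothetical preflight check built on `shutil.which`; it is not part of `gradio_infer_fixed.py`.

```python
import shutil

REQUIRED_BINARIES = ("ffmpeg", "espeak-ng")


def missing_binaries(names=REQUIRED_BINARIES):
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_binaries()
print("missing system packages:", ", ".join(missing) if missing else "none")
```

On Debian or Ubuntu, both tools are available as apt packages with the same names.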

Notes For Hugging Face Spaces

  • This README includes Spaces metadata in the front matter (sdk: gradio, app_file: gradio_infer_fixed.py).
  • The app is configured for Space deployment and can auto-download model artifacts/source when enabled by environment variables.
  • Arabic model repository: karim1993/zipvoice-ar-finetuned
  • Serbian model repository: karim1993/zipvoice-sr-finetuned
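The app's own auto-download logic and the environment variables that enable it are not documented here. As an illustration only, the sketch below maps a language code to one of the model repositories listed above and fetches a local snapshot with `huggingface_hub.snapshot_download`. `MODEL_REPOS`, `repo_for` and `download_model` are hypothetical names, and the files the app expects inside each repository are not described in this README.

```python
MODEL_REPOS = {
    "ar": "karim1993/zipvoice-ar-finetuned",
    "sr": "karim1993/zipvoice-sr-finetuned",
}


def repo_for(lang):
    """Return the Hugging Face repo id for a supported language code."""
    try:
        return MODEL_REPOS[lang]
    except KeyError:
        raise ValueError(
            f"unsupported language {lang!r}; expected one of {sorted(MODEL_REPOS)}"
        ) from None


def download_model(lang, cache_dir=None):
    """Download a snapshot of the model repo and return its local directory."""
    # Imported lazily so repo_for() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_for(lang), cache_dir=cache_dir)
```

Calling `download_model("sr")` would require network access and the `huggingface_hub` package; repeated calls reuse the local cache.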