# XTTS v2 – Gujarati & Hindi fine-tune
A fine-tuned XTTS v2 model for Gujarati and Hindi text-to-speech with voice cloning.
## Model details
| Attribute | Value |
|---|---|
| Base model | coqui/XTTS-v2 |
| Languages | Gujarati (gu), Hindi (hi) |
| Training | 5 epochs on NVIDIA L4 (24 GB) |
| Effective batch size | 32 (batch_size=4, grad_acumm=8) |
| Learning rate | 5e-6 |
| Vocab extension | +404 Gujarati tokens |
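The effective batch size above comes from gradient accumulation: 8 micro-batches of 4 clips each are accumulated before every optimizer step. A minimal sketch of that arithmetic (illustrative only; the function name `optimizer_steps_per_epoch` is not from the training code):

```python
def optimizer_steps_per_epoch(num_clips: int, batch_size: int = 4, grad_acumm: int = 8) -> int:
    """Number of optimizer updates in one epoch when gradients are
    accumulated over `grad_acumm` micro-batches of `batch_size` clips."""
    micro_batches = num_clips // batch_size   # forward/backward passes per epoch
    return micro_batches // grad_acumm        # one weight update per 8 micro-batches

effective_batch = 4 * 8                       # 32 clips contribute to each update
steps = optimizer_steps_per_epoch(51_000)     # ~40K Gujarati + ~11K Hindi clips
```

Accumulation lets a 24 GB L4 reach the gradient quality of a batch of 32 while only holding 4 clips in memory at a time.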
## Training data
Fine-tuned on Arjun4707/gu-hi-tts: ~40K Gujarati + ~11K Hindi clips.
Data source: audio clips scraped from publicly available YouTube videos, transcribed, cleaned (silence-trimmed, peak-normalized to -3 dBFS), and stored as 24 kHz mono 16-bit PCM WAV.
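Peak normalization to -3 dBFS means scaling each clip so its loudest sample sits at 10^(-3/20) ≈ 0.708 of full scale. A hypothetical sketch of that cleaning step (the actual preprocessing scripts are not published here):

```python
import math

def peak_normalize(samples: list[float], target_dbfs: float = -3.0) -> list[float]:
    """Scale a waveform so its peak amplitude equals target_dbfs
    (decibels relative to full scale, where full scale is 1.0)."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:                            # silent clip: nothing to scale
        return list(samples)
    target = 10.0 ** (target_dbfs / 20.0)      # -3 dBFS ~= 0.708
    return [s * target / peak for s in samples]

# A quiet 220 Hz tone sampled at 24 kHz, boosted so its peak hits -3 dBFS
tone = [0.1 * math.sin(2 * math.pi * 220 * n / 24000) for n in range(24000)]
cleaned = peak_normalize(tone)
```

Normalizing to -3 dBFS rather than 0 dBFS leaves a little headroom so resampling or filtering downstream does not clip.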
## Files
| File | Description |
|---|---|
| `model.pth` | Fine-tuned GPT encoder weights |
| `dvae.pth` | Discrete VAE (from base XTTS v2) |
| `vocab.json` | Extended vocabulary (base + 404 Gujarati tokens) |
| `config.json` | Model configuration |
| `mel_stats.pth` | Mel spectrogram statistics |
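The extended `vocab.json` adds 404 Gujarati tokens on top of the base XTTS v2 tokenizer; new tokens must receive fresh ids so the existing ids still line up with the base model's embedding rows. A hypothetical sketch of that merge (not the actual extension script):

```python
def extend_vocab(base_vocab: dict[str, int], new_tokens: list[str]) -> dict[str, int]:
    """Append unseen tokens after the highest existing id, leaving
    the base token-to-id mapping untouched."""
    vocab = dict(base_vocab)
    next_id = max(vocab.values()) + 1
    for token in new_tokens:
        if token not in vocab:       # skip tokens the base tokenizer already has
            vocab[token] = next_id
            next_id += 1
    return vocab

base = {"[PAD]": 0, "a": 1, "b": 2}
extended = extend_vocab(base, ["ક", "ખ", "a"])   # "a" is already present, so it is skipped
```

Because base ids are never reassigned, the pretrained embedding table only needs new rows appended for the 404 additions, not a full re-initialization.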
## Known limitations
- Short sentences (fewer than 5 words) may produce babbling artifacts, likely due to noise in the scraped training data
- Longer sentences (10+ words) produce noticeably better quality
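Given the short-sentence artifact above, a caller can cheaply flag risky prompts before synthesis. A hypothetical word-count guard (not part of the released model code):

```python
def is_risky_input(text: str, min_words: int = 5) -> bool:
    """True when the prompt is short enough to risk babbling artifacts
    with this fine-tune (fewer than `min_words` whitespace-separated words)."""
    return len(text.split()) < min_words

# Example: warn the caller instead of synthesizing a 2-word prompt directly
if is_risky_input("નમસ્તે મિત્રો"):
    print("warning: short prompt; consider padding the sentence")
```

A whitespace split is a crude proxy for both Gujarati and Hindi, but it matches the word counts this card uses to describe the limitation.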
## Training code
Full training pipeline, patches, and troubleshooting: BhammarArjun/TTS_1_training
## License

CC-BY-NC-4.0 – non-commercial use only.
The base XTTS v2 is licensed under Coqui Public Model License (non-commercial). The training data was sourced from YouTube audio. Both factors require a non-commercial license.
## Citation

```bibtex
@misc{arjun2026xttsguhi,
  title  = {XTTS v2 fine-tuned for Gujarati and Hindi},
  author = {Arjun Bhammar},
  year   = {2026},
  url    = {https://huggingface.co/Arjun4707/xtts-v2-gujarati-hindi}
}
```
## Acknowledgements
- Coqui AI / TTS for the original XTTS v2
- anhnh2002 for the fine-tuning framework