🗣️ Tamil Text-to-Speech (TTS) using VITS

This repository provides a complete step-by-step guide to training your own speech synthesizer using the VITS (Variational Inference with Adversarial Learning for End-to-End Text-to-Speech) model with a custom Tamil dataset, powered by Coqui TTS.


📂 Dataset Structure

dataset/
├── wavs/                  # Audio files (.wav)
├── metadata.csv           # Training metadata
└── metadata_val.csv       # Validation metadata

🛠️ Setup Instructions

1. Clone Coqui TTS

git clone https://github.com/coqui-ai/TTS.git

2. Configure config.json

Update your config.json so that path points to the dataset directory shown above:

"datasets": [
  {
    "name": "speaker_data",
    "path": "dataset",
    "meta_file_train": "metadata.csv",
    "meta_file_val": "metadata_val.csv",
    "formatter": "ljspeech"
  }
]

3. Metadata Format

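Each line in metadata.csv and metadata_val.csv must contain three fields separated by a pipe character (|):

<file_name>|<speaker_name>|<text>
Example: 001|linda|Hi, how are you?

Here <file_name> is the name of the audio file in wavs/ without the .wav extension, as in the example.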
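
A malformed line is easier to catch before training than during it. The following is a minimal sketch of a line check, assuming exactly three non-empty pipe-separated fields; MetaRow and parse_meta_line are hypothetical names and are not part of this repository or of Coqui TTS.

```python
from typing import NamedTuple


class MetaRow(NamedTuple):
    file_name: str      # file stem, without the ".wav" extension
    speaker_name: str
    text: str


def parse_meta_line(line: str) -> MetaRow:
    """Split one metadata line into its three pipe-separated fields."""
    parts = [p.strip() for p in line.rstrip("\n").split("|")]
    # Reject lines with a missing field, an extra "|", or an empty field.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <file_name>|<speaker_name>|<text>, got: {line!r}")
    return MetaRow(*parts)


row = parse_meta_line("001|linda|Hi, how are you?")
print(row.file_name, row.speaker_name, row.text)  # 001 linda Hi, how are you?
```

Running parse_meta_line over every line of metadata.csv and metadata_val.csv would surface the first bad line as a ValueError.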

4. Train/Validation Split

Split the entries in metadata.csv into 90% training and 10% validation:

python split_meta.py
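
The contents of split_meta.py are not shown here, so the following is only a sketch of what a 90/10 split could look like, not the script itself. The function name split_metadata, the fixed seed, and the rounding rule are all assumptions.

```python
import random


def split_metadata(lines, val_fraction=0.1, seed=0):
    """Shuffle metadata lines and return (train, val) lists."""
    rows = [ln for ln in lines if ln.strip()]   # drop blank lines
    random.Random(seed).shuffle(rows)           # seeded, so the split is repeatable
    # Hold out at least one line for validation (assumed rounding rule).
    n_val = max(1, round(len(rows) * val_fraction))
    return rows[n_val:], rows[:n_val]


lines = [f"{i:03d}|linda|sentence {i}" for i in range(1, 21)]
train, val = split_metadata(lines)
print(len(train), len(val))  # 18 2
```

In practice the two lists would be written to metadata.csv and metadata_val.csv respectively; keep a backup of the original file before overwriting it.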

5. Validate Audio Sample Rate

This training setup expects every audio file to be sampled at 22050 Hz. Check your files using:

python validate_audio.py
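
For reference, a sample-rate check of this kind can be sketched with Python's standard wave module. This is an illustration rather than the contents of validate_audio.py; wrong_rate_files is a hypothetical name, and the wave module only reads uncompressed PCM WAV files.

```python
import wave
from pathlib import Path


def wrong_rate_files(wav_dir, expected_hz=22050):
    """Return (file name, sample rate) for every .wav not sampled at expected_hz."""
    bad = []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            rate = wf.getframerate()
        if rate != expected_hz:
            bad.append((path.name, rate))
    return bad
```

Calling wrong_rate_files("dataset/wavs") returns an empty list when every file is already at 22050 Hz.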

6. Convert Audio to 22050 Hz

Use the provided script (Linux only):

bash convert_audio_sample_rate.sh

💡 On Windows or macOS, adapt the script to your shell or use an equivalent resampling tool.
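
One cross-platform option is to drive ffmpeg from Python. The sketch below is not the provided shell script: it assumes ffmpeg is installed and on PATH, and the mono downmix (-ac 1) is an assumption about what the training data should look like. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def ffmpeg_resample_cmd(src, dst, rate=22050):
    """Build an ffmpeg command that resamples src to `rate` Hz mono."""
    # -y overwrites dst, -ar sets the sample rate, -ac 1 downmixes to mono.
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(rate), "-ac", "1", str(dst)]


def resample_dir(src_dir, dst_dir, rate=22050):
    """Resample every .wav in src_dir into dst_dir (requires ffmpeg on PATH)."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).glob("*.wav")):
        cmd = ffmpeg_resample_cmd(src, Path(dst_dir) / src.name, rate)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


print(" ".join(ffmpeg_resample_cmd("in.wav", "out.wav")))
# ffmpeg -y -i in.wav -ar 22050 -ac 1 out.wav
```

Writing to a separate output directory keeps the original recordings intact until the converted files have been checked.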


7. Find Unique Characters in Dataset

To extract all unique Tamil characters used in your dataset:

python TTS/TTS/bin/find_unique_chars.py --config_path config.json
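
To make the idea concrete, the sketch below collects the distinct characters from the text field of metadata lines. It is an illustration only and does not reproduce what find_unique_chars.py does internally; unique_chars is a hypothetical helper.

```python
def unique_chars(lines):
    """Return the distinct characters of the text field, sorted by code point."""
    chars = set()
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) == 3:          # skip blank or malformed lines
            chars.update(parts[2])
    return "".join(sorted(chars))


print(unique_chars(["001|linda|வணக்கம்"]))
```

Vowel signs and the pulli (்) are separate code points, so they appear as their own entries in the result.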

8. Update Character Set in config.json

Add the characters reported in the previous step to the characters section of config.json:

"characters": {
  "characters_class": "TTS.tts.models.vits.VitsCharacters",
  "pad": "_",
  "eos": "~",
  "bos": "^",
  "characters": "ஂஃஅஆஇஈஉஊஎஏஐஒஓஔகஙசஜஞடணதநனபமயரறலளழவஷஸஹாிீுூெேைொோௌ்",
  "punctuations": ".,!?; "
}
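
Before training, it can help to confirm that every character in the transcripts is listed in this section. A minimal sketch, assuming the characters and punctuations strings are copied from config.json; uncovered_chars is a hypothetical helper, not a Coqui TTS function.

```python
def uncovered_chars(text, characters, punctuations):
    """Return the characters in text that the configured character set does not list."""
    allowed = set(characters) | set(punctuations)
    return sorted(set(text) - allowed)


characters = "ஂஃஅஆஇஈஉஊஎஏஐஒஓஔகஙசஜஞடணதநனபமயரறலளழவஷஸஹாிீுூெேைொோௌ்"
punctuations = ".,!?; "

# Digits are not in the set above, so they are reported as uncovered.
print(uncovered_chars("விலை 100", characters, punctuations))  # ['0', '1']
```

Anything this reports should either be added to the character set or normalized out of the transcripts, for example by spelling out numbers.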

🚀 Training

Start training your model with:

python TTS/TTS/bin/train_tts.py --config_path config.json

🔊 Inference

Install the TTS CLI:

pip install TTS

Generate speech from text, replacing the run directory below with the one your training run created under ./outputs:

tts \
  --text "காதல் என்ற ஒன்று போதும். காலமெல்லாம் துணையாய் வாழ்வதற்கு" \
  --model_path ./outputs/tamil_vits-June-08-2025_11+16AM-0000000/best_model.pth \
  --config_path ./outputs/tamil_vits-June-08-2025_11+16AM-0000000/config.json \
  --out_path output.wav

💡 Tips

  • Python version: 3.10 recommended
  • Learning Rate Scheduler: ExponentialLR, which multiplies the learning rate by a fixed factor (gamma) at every scheduler step, so it decays gradually.
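
The decay that ExponentialLR applies can be worked out by hand: after n scheduler steps the rate is the initial rate times gamma to the power n. The values below (an initial rate of 2e-4 and a gamma of 0.999875) are assumptions chosen for illustration; check the lr and scheduler parameters in your own config.json.

```python
def exponential_lr(initial_lr, gamma, steps):
    """Learning rate after `steps` scheduler steps of exponential decay."""
    return initial_lr * gamma ** steps


# Assumed example values, not read from this repository's config.
for step in (0, 1000, 10000):
    print(step, f"{exponential_lr(2e-4, 0.999875, step):.2e}")
```

A gamma close to 1 gives a slow decay, while smaller values shrink the learning rate much faster.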

🙏 Acknowledgements

  • Coqui TTS, whose VITS implementation this guide uses
  • Tamil open-source community for promoting regional language AI