---
language:
- saz
- ta
license: cc-by-4.0
tags:
- text-to-speech
- tts
- vits
- sourashtra
- low-resource
pipeline_tag: text-to-speech
---
# Sourashtra VITS TTS Models
VITS text-to-speech models for the [Sourashtra language](https://en.wikipedia.org/wiki/Sourashtra_language) (ISO 639-3: `saz`), a minority Indo-Aryan language spoken primarily in Tamil Nadu, India. Trained using [Coqui TTS](https://github.com/coqui-ai/TTS) on a custom annotated speech corpus.

Four variants are provided: **2 speakers** (male, female) × **2 input scripts** (Tamil script, Sourashtra script).

---
## Models
| Folder | Speaker | Input Script | Training Steps |
|--------|---------|--------------|----------------|
| `Sourashtra-Male_Script-tamil` | Male | Tamil (தமிழ்) | 300,000 |
| `Sourashtra-Male_Script-sourashtra` | Male | Sourashtra (ꢪꢾꢥꢶꢒ) | 300,000 |
| `Sourashtra-Female_Script-tamil` | Female | Tamil (தமிழ்) | 340,000 |
| `Sourashtra-Female_Script-sourashtra` | Female | Sourashtra (ꢪꢾꢥꢶꢒ) | 340,000 |

Each folder contains `best_model.pth`, `config.json`, `inference.py`, and `requirements.txt`.
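
Before running inference, it can help to confirm that a downloaded folder is complete. The following is a minimal sketch, not part of the repository: the helper name `missing_files` is hypothetical, and the only facts it relies on are the four file names listed above. Call it as `missing_files("Sourashtra-Male_Script-tamil")`; an empty list means all four files are present.

```python
from pathlib import Path

# File names taken from the model card; the helper itself is illustrative.
EXPECTED_FILES = ("best_model.pth", "config.json", "inference.py", "requirements.txt")


def missing_files(model_dir):
    """Return the expected files that are absent from model_dir."""
    folder = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```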
---
## Setup
```bash
pip install -r requirements.txt
```
For GPU inference, first install the CUDA-enabled PyTorch build that matches your driver; see [pytorch.org](https://pytorch.org/get-started/locally/).

---
## Usage
Run `inference.py` from inside the model folder. Each command below runs in a subshell, so all four can be launched from the repository root:
```bash
# Male, Tamil script
(cd Sourashtra-Male_Script-tamil &&
  python inference.py "சொராஷ்ட்ர மொழி" -o output.wav)
# Male, Sourashtra script
(cd Sourashtra-Male_Script-sourashtra &&
  python inference.py "ꢪꢾꢥꢶꢒ ꢪꢒꢡ" -o output.wav)
# Female, Tamil script
(cd Sourashtra-Female_Script-tamil &&
  python inference.py "சொராஷ்ட்ர மொழி" -o output.wav)
# Female, Sourashtra script
(cd Sourashtra-Female_Script-sourashtra &&
  python inference.py "ꢪꢾꢥꢶꢒ ꢪꢒꢡ" -o output.wav)
```
Use `--gpu <id>` to select a GPU, or `--cpu` to force CPU inference.
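
To synthesize from a Python script rather than the shell, the same command line can be assembled and launched with `subprocess`. This is a sketch under stated assumptions: `build_command` and `synthesize` are hypothetical helper names, and the only `inference.py` options used are the ones documented above (`-o`, `--gpu <id>`, `--cpu`). For example, `synthesize("Sourashtra-Male_Script-tamil", "சொராஷ்ட்ர மொழி", cpu=True)` would write `output.wav` inside that folder.

```python
import subprocess
import sys


def build_command(text, out_path="output.wav", gpu=None, cpu=False):
    """Assemble the inference.py command line from the documented options."""
    cmd = [sys.executable, "inference.py", text, "-o", out_path]
    if cpu:
        cmd.append("--cpu")            # force CPU inference
    elif gpu is not None:
        cmd += ["--gpu", str(gpu)]     # select a GPU by id
    return cmd


def synthesize(model_dir, text, **options):
    """Run inference.py with model_dir as the working directory."""
    subprocess.run(build_command(text, **options), cwd=model_dir, check=True)
```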
---
## Script Notes
For a given speaker, the Tamil-script and Sourashtra-script models produce the same voice; only the input orthography differs. Choose the model that matches the script of your source text.
- **Tamil script models** — strip `:`, `.`, `'` and apply NFC normalization automatically
- **Sourashtra script models** — strip Sourashtra Danda (꣎) and Double Danda (꣏) automatically
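
The cleaning rules above can be sketched in a few lines of Python, which is useful for previewing what the models will actually receive. This is an illustration only and is not taken from `inference.py`: the function names are hypothetical, and the order of operations (NFC first, then stripping) and the trimming of surrounding whitespace are assumptions.

```python
import unicodedata

# Characters the model card says are stripped for each script.
TAMIL_STRIP = str.maketrans("", "", ":.'")
# U+A8CE SAURASHTRA DANDA and U+A8CF SAURASHTRA DOUBLE DANDA.
SOURASHTRA_STRIP = str.maketrans("", "", "\ua8ce\ua8cf")


def clean_tamil(text):
    """NFC-normalize, then drop ':', '.' and "'" (order is an assumption)."""
    return unicodedata.normalize("NFC", text).translate(TAMIL_STRIP).strip()


def clean_sourashtra(text):
    """Drop the Danda and Double Danda punctuation marks."""
    return text.translate(SOURASHTRA_STRIP).strip()
```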
---
## Training
| Parameter | Value |
|-----------|-------|
| Architecture | VITS (end-to-end, flow-based) |
| Sample rate | 22050 Hz |
| Mel bins | 80 |
| Batch size | 16 |
| Mixed precision | Yes |
| Phonemes | No (character-level) |

Male training data: ~9,800–10,000 utterances. Female training data: ~11,400 utterances.
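
For a rough sense of scale, the step counts, batch size, and corpus sizes above can be converted into approximate passes over the data. The sketch below assumes each training step consumes exactly one batch of 16 utterances (no gradient accumulation, single device) and that every utterance is used in each pass; neither assumption is confirmed by the model card, so treat the result as an estimate only.

```python
def approx_epochs(steps, batch_size, num_utterances):
    """Estimate passes over the corpus, assuming one batch per step."""
    return steps * batch_size / num_utterances


# Figures from the tables above; utterance counts are the stated approximations.
male = approx_epochs(300_000, 16, 10_000)
female = approx_epochs(340_000, 16, 11_400)
print(f"male: ~{male:.0f} epochs, female: ~{female:.0f} epochs")
```

Under these assumptions, both speakers come out at a similar number of passes, roughly 480.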