---
language:
- saz
- ta
license: cc-by-4.0
tags:
- text-to-speech
- tts
- vits
- sourashtra
- low-resource
pipeline_tag: text-to-speech
---

# Sourashtra VITS TTS Models

VITS text-to-speech models for the [Sourashtra language](https://en.wikipedia.org/wiki/Sourashtra_language) (ISO 639-3: `saz`), a minority Indo-Aryan language spoken primarily in Tamil Nadu, India. Trained using [Coqui TTS](https://github.com/coqui-ai/TTS) on a custom annotated speech corpus.

Four variants: **2 speakers** (male, female) × **2 input scripts** (Tamil script, Sourashtra script).

---

## Models

| Folder | Speaker | Input Script | Training Steps |
|--------|---------|--------------|----------------|
| `Sourashtra-Male_Script-tamil` | Male | Tamil (தமிழ்) | 300,000 |
| `Sourashtra-Male_Script-sourashtra` | Male | Sourashtra (ꢪꢾꢥꢶꢒ) | 300,000 |
| `Sourashtra-Female_Script-tamil` | Female | Tamil (தமிழ்) | 340,000 |
| `Sourashtra-Female_Script-sourashtra` | Female | Sourashtra (ꢪꢾꢥꢶꢒ) | 340,000 |

Each folder contains `best_model.pth`, `config.json`, `inference.py`, and `requirements.txt`.

---

## Setup

```bash
pip install -r requirements.txt
```

For GPU inference, first install the CUDA-enabled PyTorch build that matches your driver; see [pytorch.org](https://pytorch.org/get-started/locally/).

---

## Usage

Run `inference.py` from inside the model folder:

```bash
# Male, Tamil script
cd Sourashtra-Male_Script-tamil
python inference.py "சொராஷ்ட்ர மொழி" -o output.wav

# Male, Sourashtra script
cd Sourashtra-Male_Script-sourashtra
python inference.py "ꢪꢾꢥꢶꢒ ꢪꢒꢡ" -o output.wav

# Female, Tamil script
cd Sourashtra-Female_Script-tamil
python inference.py "சொராஷ்ட்ர மொழி" -o output.wav

# Female, Sourashtra script
cd Sourashtra-Female_Script-sourashtra
python inference.py "ꢪꢾꢥꢶꢒ ꢪꢒꢡ" -o output.wav
```

Use `--gpu` to select a GPU, or `--cpu` to force CPU inference.
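
For scripted or batch synthesis, the command-line calls shown under Usage can be wrapped from Python. The sketch below is one possible wrapper, not part of the repository: the names `build_command` and `synthesize` are hypothetical, and it relies only on the arguments documented above (the positional text, `-o`, and `--cpu`). It also assumes that `inference.py` locates `best_model.pth` and `config.json` relative to the working directory, which is why the subprocess runs with `cwd` set to the model folder.

```python
import subprocess
import sys
from pathlib import Path


def build_command(text: str, out_wav: str, force_cpu: bool = False) -> list[str]:
    """Build the argv list for one inference.py call (hypothetical helper)."""
    cmd = [sys.executable, "inference.py", text, "-o", out_wav]
    if force_cpu:
        # --cpu is the documented flag for forcing CPU inference.
        cmd.append("--cpu")
    return cmd


def synthesize(model_dir: str, text: str, out_wav: str, force_cpu: bool = False) -> None:
    """Run inference.py from inside model_dir, as the README instructs."""
    # Resolve the output path first so it is not interpreted relative to model_dir.
    out_path = str(Path(out_wav).resolve())
    subprocess.run(
        build_command(text, out_path, force_cpu),
        cwd=model_dir,   # assumption: the script expects to run in its own folder
        check=True,      # raise CalledProcessError if synthesis fails
    )
```

A call such as `synthesize("Sourashtra-Female_Script-tamil", "சொராஷ்ட்ர மொழி", "out.wav", force_cpu=True)` would then mirror the third shell example above, with CPU inference forced.
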

---

## Script Notes

The Tamil-script and Sourashtra-script models produce speech from the same speaker; only the input orthography differs. Choose based on your text source.

- **Tamil script models**: strip `:`, `.`, `'` and apply NFC normalization automatically
- **Sourashtra script models**: strip Sourashtra Danda (꣎) and Double Danda (꣏) automatically

---

## Training

| Parameter | Value |
|-----------|-------|
| Architecture | VITS (end-to-end, flow-based) |
| Sample rate | 22050 Hz |
| Mel bins | 80 |
| Batch size | 16 |
| Mixed precision | Yes |
| Phonemes | No (character-level) |

Male training data: ~9,800–10,000 utterances. Female training data: ~11,400 utterances.
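
As a rough illustration of the automatic cleaning listed under Script Notes, the sketch below applies the same rules to a string before it is passed to a model. It is not the code in `inference.py`: the function names are hypothetical, and the order of operations (strip first, then NFC) is an assumption, since the notes do not specify it. The Danda and Double Danda are written as their Unicode code points, U+A8CE and U+A8CF.

```python
import unicodedata

# Characters the Script Notes say are removed for each input script.
TAMIL_STRIP = ":.'"
SOURASHTRA_STRIP = "\ua8ce\ua8cf"  # Saurashtra Danda, Saurashtra Double Danda


def clean_tamil(text: str) -> str:
    """Remove ':', '.', and "'" and then NFC-normalize (order is an assumption)."""
    stripped = text.translate(str.maketrans("", "", TAMIL_STRIP))
    return unicodedata.normalize("NFC", stripped)


def clean_sourashtra(text: str) -> str:
    """Remove the Saurashtra Danda and Double Danda; everything else is kept."""
    return text.translate(str.maketrans("", "", SOURASHTRA_STRIP))
```

Applying the same cleaning when preparing evaluation text keeps the character set consistent with what the character-level models saw during training.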