| --- |
| language: |
| - saz |
| - ta |
| license: cc-by-4.0 |
| tags: |
| - text-to-speech |
| - tts |
| - vits |
| - sourashtra |
| - low-resource |
| pipeline_tag: text-to-speech |
| --- |
| |
| # Sourashtra VITS TTS Models |
|
|
| VITS text-to-speech models for the [Sourashtra language](https://en.wikipedia.org/wiki/Sourashtra_language) (ISO 639-3: `saz`), a minority Indo-Aryan language spoken primarily in Tamil Nadu, India. The models were trained with [Coqui TTS](https://github.com/coqui-ai/TTS) on a custom annotated speech corpus. |
|
|
| Four variants: **2 speakers** (male, female) × **2 input scripts** (Tamil script, Sourashtra script). |
|
|
| --- |
|
|
| ## Models |
|
|
| | Folder | Speaker | Input Script | Training Steps | |
| |--------|---------|--------------|----------------| |
| | `Sourashtra-Male_Script-tamil` | Male | Tamil (தமிழ்) | 300,000 | |
| | `Sourashtra-Male_Script-sourashtra` | Male | Sourashtra (ꢪꢾꢥꢶꢒ) | 300,000 | |
| | `Sourashtra-Female_Script-tamil` | Female | Tamil (தமிழ்) | 340,000 | |
| | `Sourashtra-Female_Script-sourashtra` | Female | Sourashtra (ꢪꢾꢥꢶꢒ) | 340,000 | |
|
|
| Each folder contains `best_model.pth`, `config.json`, `inference.py`, and `requirements.txt`. |
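
The folder names follow a regular pattern, so a script can derive them from the speaker and input script. The following is a minimal Python sketch, assuming the four folders sit under a common root directory; `model_folder` and `missing_files` are hypothetical helper names used for illustration and are not part of this repository.

```python
from pathlib import Path

SPEAKERS = ("male", "female")
SCRIPTS = ("tamil", "sourashtra")
# Files listed above as present in every model folder.
EXPECTED_FILES = ("best_model.pth", "config.json", "inference.py", "requirements.txt")


def model_folder(speaker: str, script: str) -> str:
    """Build a folder name such as 'Sourashtra-Male_Script-tamil'."""
    speaker, script = speaker.lower(), script.lower()
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker!r}")
    if script not in SCRIPTS:
        raise ValueError(f"unknown script: {script!r}")
    return f"Sourashtra-{speaker.capitalize()}_Script-{script}"


def missing_files(folder: str, root: Path = Path(".")) -> list[str]:
    """Return the expected files that are absent from root/folder."""
    return [name for name in EXPECTED_FILES if not (root / folder / name).is_file()]
```

Calling `missing_files(model_folder("female", "tamil"))` before running inference gives an early, readable error if a download was incomplete.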
|
|
| --- |
|
|
| ## Setup |
|
|
| ```bash |
| # Run from inside a model folder; each one ships its own requirements.txt |
| pip install -r requirements.txt |
| ``` |
|
|
| For GPU inference, first install the CUDA-enabled PyTorch build that matches your driver; see [pytorch.org](https://pytorch.org/get-started/locally/). |
|
|
| --- |
|
|
| ## Usage |
|
|
| Run `inference.py` from inside the model folder: |
|
|
| ```bash |
| # Each command runs in a subshell, so the shell stays in the repository root. |
| # Male, Tamil script |
| (cd Sourashtra-Male_Script-tamil && python inference.py "சொராஷ்ட்ர மொழி" -o output.wav) |
| |
| # Male, Sourashtra script |
| (cd Sourashtra-Male_Script-sourashtra && python inference.py "ꢪꢾꢥꢶꢒ ꢪꢒꢡ" -o output.wav) |
| |
| # Female, Tamil script |
| (cd Sourashtra-Female_Script-tamil && python inference.py "சொராஷ்ட்ர மொழி" -o output.wav) |
| |
| # Female, Sourashtra script |
| (cd Sourashtra-Female_Script-sourashtra && python inference.py "ꢪꢾꢥꢶꢒ ꢪꢒꢡ" -o output.wav) |
| ``` |
|
|
| Use `--gpu <id>` to select a GPU, or `--cpu` to force CPU inference. |
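
To synthesize many sentences from Python instead of the shell, the same command line can be wrapped with `subprocess`. This is a sketch only: it uses just the arguments documented above (the text, `-o`, `--gpu`, `--cpu`), and `build_command` and `synthesize` are hypothetical helper names. Treating `--cpu` and `--gpu` as mutually exclusive is an assumption of this sketch, not documented behavior of `inference.py`.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def build_command(text: str, output: str = "output.wav",
                  gpu: Optional[int] = None, cpu: bool = False) -> list[str]:
    """Assemble the inference.py command line from the documented flags."""
    if cpu and gpu is not None:
        # Assumption of this sketch: the two device flags are not combined.
        raise ValueError("choose either cpu=True or a gpu id, not both")
    cmd = [sys.executable, "inference.py", text, "-o", output]
    if cpu:
        cmd.append("--cpu")
    elif gpu is not None:
        cmd += ["--gpu", str(gpu)]
    return cmd


def synthesize(model_dir: str, text: str, **kwargs) -> None:
    """Run inference.py with the model folder as the working directory."""
    subprocess.run(build_command(text, **kwargs), cwd=Path(model_dir), check=True)
```

For example, `synthesize("Sourashtra-Female_Script-tamil", "சொராஷ்ட்ர மொழி", output="female.wav", cpu=True)` mirrors the third shell command above. Because the working directory is the model folder, a relative output path is expected to land inside that folder.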
|
|
| --- |
|
|
| ## Script Notes |
|
|
| For a given speaker, the Tamil-script and Sourashtra-script models produce the same voice; only the input orthography differs. Choose the model that matches the script of your source text. |
|
|
| - **Tamil script models** — strip `:`, `.`, `'` and apply NFC normalization automatically |
| - **Sourashtra script models** — strip Sourashtra Danda (꣎) and Double Danda (꣏) automatically |
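
The cleanup rules above can be reproduced in a few lines if you want to preview what the models will actually see. This is an illustrative sketch, not the code from `inference.py`: the function names are hypothetical, and the order of operations for Tamil input (strip first, then NFC) is an assumption.

```python
import unicodedata

# Punctuation removed from Tamil-script input, per the list above.
TAMIL_STRIP = {":", ".", "'"}
# U+A8CE SAURASHTRA DANDA and U+A8CF SAURASHTRA DOUBLE DANDA.
SOURASHTRA_STRIP = {"\ua8ce", "\ua8cf"}


def clean_tamil(text: str) -> str:
    """Drop the listed punctuation, then apply NFC normalization."""
    stripped = "".join(ch for ch in text if ch not in TAMIL_STRIP)
    return unicodedata.normalize("NFC", stripped)


def clean_sourashtra(text: str) -> str:
    """Drop the Sourashtra Danda and Double Danda characters."""
    return "".join(ch for ch in text if ch not in SOURASHTRA_STRIP)
```

NFC matters for Tamil because some vowel signs can be encoded either as one code point or as a two-part sequence; normalization maps both spellings to the same character sequence before the character-level model sees them.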
|
|
| --- |
|
|
| ## Training |
|
|
| | Parameter | Value | |
| |-----------|-------| |
| | Architecture | VITS (end-to-end, flow-based) | |
| | Sample rate | 22,050 Hz | |
| | Mel bins | 80 | |
| | Batch size | 16 | |
| | Mixed precision | Yes | |
| | Phonemes | No (character-level) | |
|
|
| Male training data: ~9,800–10,000 utterances. Female training data: ~11,400 utterances. |
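
For a rough sense of scale, the step counts in the Models table and the batch size above can be converted into approximate passes over the data. This is a back-of-the-envelope sketch: it assumes each training step consumes one batch of 16 utterances, with no gradient accumulation or multi-device scaling, and that the utterance counts describe the full training split. None of these assumptions is confirmed by the tables.

```python
def approx_epochs(steps: int, batch_size: int, utterances: int) -> float:
    """Estimate passes over the data as steps * batch_size / utterances."""
    return steps * batch_size / utterances


# Male: 300,000 steps, using the upper utterance estimate of 10,000.
male = approx_epochs(300_000, 16, 10_000)
# Female: 340,000 steps over roughly 11,400 utterances.
female = approx_epochs(340_000, 16, 11_400)
print(f"male ~{male:.0f} epochs, female ~{female:.0f} epochs")
```

Under these assumptions both speakers come out at roughly 475 to 490 passes over their data, which suggests the extra 40,000 steps for the female models roughly offset the larger corpus.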
|
|