---
language:
- saz
- ta
license: cc-by-4.0
tags:
- text-to-speech
- tts
- vits
- sourashtra
- low-resource
pipeline_tag: text-to-speech
---

# Sourashtra VITS TTS Models

VITS text-to-speech models for the [Sourashtra language](https://en.wikipedia.org/wiki/Sourashtra_language) (ISO 639-3: `saz`), a minority Indo-Aryan language spoken primarily in Tamil Nadu, India. The models were trained with [Coqui TTS](https://github.com/coqui-ai/TTS) on a custom annotated speech corpus.

Four variants: **2 speakers** (male, female) × **2 input scripts** (Tamil script, Sourashtra script).

---

## Models

| Folder | Speaker | Input Script | Training Steps |
|--------|---------|--------------|----------------|
| `Sourashtra-Male_Script-tamil` | Male | Tamil (தமிழ்) | 300,000 |
| `Sourashtra-Male_Script-sourashtra` | Male | Sourashtra (ꢪꢾꢥꢶꢒ) | 300,000 |
| `Sourashtra-Female_Script-tamil` | Female | Tamil (தமிழ்) | 340,000 |
| `Sourashtra-Female_Script-sourashtra` | Female | Sourashtra (ꢪꢾꢥꢶꢒ) | 340,000 |

Each folder contains `best_model.pth`, `config.json`, `inference.py`, and `requirements.txt`.

---

## Setup

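Each model folder ships its own `requirements.txt`. Install the one for the model you plan to use, for example:

```bash
# From the repository root: dependencies for the male Tamil-script model
pip install -r Sourashtra-Male_Script-tamil/requirements.txt
```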

For GPU inference, first install the CUDA-enabled PyTorch build that matches your driver; see [pytorch.org](https://pytorch.org/get-started/locally/).

---

## Usage

Run `inference.py` from inside the model folder. The commands below start from the repository root; each one runs in a subshell, so the working directory is unchanged afterwards:

```bash
# Male, Tamil script
(cd Sourashtra-Male_Script-tamil && python inference.py "சொராஷ்ட்ர மொழி" -o output.wav)

# Male, Sourashtra script
(cd Sourashtra-Male_Script-sourashtra && python inference.py "ꢪꢾꢥꢶꢒ ꢪꢒꢡ" -o output.wav)

# Female, Tamil script
(cd Sourashtra-Female_Script-tamil && python inference.py "சொராஷ்ட்ர மொழி" -o output.wav)

# Female, Sourashtra script
(cd Sourashtra-Female_Script-sourashtra && python inference.py "ꢪꢾꢥꢶꢒ ꢪꢒꢡ" -o output.wav)
```

Use `--gpu <id>` to select a GPU, or `--cpu` to force CPU inference.
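
To synthesize several sentences in one go, the CLI can be driven from Python. The following is a minimal sketch, not part of the repository: the helper names (`model_folder`, `build_command`, `synthesize_all`) and the `out_000.wav` naming scheme are hypothetical, and it relies only on the arguments documented above (`-o`, `--gpu <id>`, `--cpu`).

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List, Optional


def model_folder(speaker: str, script: str) -> str:
    """Return the folder name from the Models table for a speaker/script pair."""
    speaker, script = speaker.lower(), script.lower()
    if speaker not in {"male", "female"}:
        raise ValueError(f"unknown speaker: {speaker}")
    if script not in {"tamil", "sourashtra"}:
        raise ValueError(f"unknown script: {script}")
    return f"Sourashtra-{speaker.capitalize()}_Script-{script}"


def build_command(text: str, out_path: str,
                  gpu: Optional[int] = None, cpu: bool = False) -> List[str]:
    """Build the inference.py command line using only the documented flags."""
    cmd = [sys.executable, "inference.py", text, "-o", out_path]
    if cpu:
        cmd.append("--cpu")
    elif gpu is not None:
        cmd += ["--gpu", str(gpu)]
    return cmd


def synthesize_all(repo_root: str, speaker: str, script: str,
                   sentences: Iterable[str], cpu: bool = True) -> None:
    """Run inference.py once per sentence from inside the model folder."""
    folder = Path(repo_root) / model_folder(speaker, script)
    for i, sentence in enumerate(sentences):
        # Output paths are relative, so files are written next to inference.py
        # (assumption: the script resolves -o against its working directory).
        cmd = build_command(sentence, f"out_{i:03d}.wav", cpu=cpu)
        subprocess.run(cmd, cwd=folder, check=True)
```

For example, `synthesize_all(".", "female", "tamil", ["சொராஷ்ட்ர மொழி"])` would run the female Tamil-script model on the CPU. Each call starts a new process and reloads the checkpoint, so this favors simplicity over speed.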

---

## Script Notes

For a given speaker, the Tamil-script and Sourashtra-script models produce the same voice; only the input orthography differs. Choose based on your text source.

- **Tamil script models**: `:`, `.`, and `'` are stripped from the input and NFC normalization is applied automatically.
- **Sourashtra script models**: the Sourashtra Danda (꣎) and Double Danda (꣏) are stripped from the input automatically.
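
The cleaning rules above can be approximated as follows, which is useful for previewing what the model will actually receive. This is an illustrative sketch rather than the code in `inference.py`: the function names are hypothetical, and the order of operations (NFC first, then stripping) is an assumption. The Danda and Double Danda are the Unicode code points U+A8CE and U+A8CF.

```python
import unicodedata

# Characters removed for Tamil-script input: colon, full stop, apostrophe.
TAMIL_STRIP = str.maketrans("", "", ":.'")

# Saurashtra Danda (U+A8CE) and Double Danda (U+A8CF).
SOURASHTRA_STRIP = str.maketrans("", "", "\uA8CE\uA8CF")


def clean_tamil_script(text: str) -> str:
    """Apply NFC normalization, then drop ':', '.' and apostrophes.

    Assumption: normalization happens before stripping; the real script
    may apply the two steps in the opposite order.
    """
    return unicodedata.normalize("NFC", text).translate(TAMIL_STRIP)


def clean_sourashtra_script(text: str) -> str:
    """Drop the Danda and Double Danda punctuation marks."""
    return text.translate(SOURASHTRA_STRIP)
```

Because the models do this themselves, pre-cleaning is optional; the sketch mainly helps when comparing input text against training transcripts.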

---

## Training

| Parameter | Value |
|-----------|-------|
| Architecture | VITS (end-to-end, flow-based) |
| Sample rate | 22050 Hz |
| Mel bins | 80 |
| Batch size | 16 |
| Mixed precision | Yes |
| Phonemes | No (character-level) |

Male training data: ~9,800–10,000 utterances. Female training data: ~11,400 utterances.
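
The values in the training table can be compared against a model's `config.json`. The sketch below is a hedged example: the key names (`audio.sample_rate`, `audio.num_mels`, `batch_size`, `mixed_precision`, `use_phonemes`) follow the usual Coqui TTS config layout and are assumptions about these particular files, and the helper names are hypothetical.

```python
import json
from typing import Any, Dict, Tuple

# Values taken from the training table above.
EXPECTED: Dict[str, Any] = {
    "sample_rate": 22050,
    "num_mels": 80,
    "batch_size": 16,
    "mixed_precision": True,
    "use_phonemes": False,
}


def extract_settings(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the table's settings out of a parsed config (assumed key names)."""
    audio = cfg.get("audio", {})
    return {
        "sample_rate": audio.get("sample_rate"),
        "num_mels": audio.get("num_mels"),
        "batch_size": cfg.get("batch_size"),
        "mixed_precision": cfg.get("mixed_precision"),
        "use_phonemes": cfg.get("use_phonemes"),
    }


def mismatches(cfg: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each differing setting to an (expected, found) pair."""
    found = extract_settings(cfg)
    return {k: (v, found[k]) for k, v in EXPECTED.items() if found[k] != v}


def check_config(path: str) -> Dict[str, Tuple[Any, Any]]:
    """Load a config.json file and report settings that differ from the table."""
    with open(path, encoding="utf-8") as f:
        return mismatches(json.load(f))
```

An empty result from `check_config("Sourashtra-Male_Script-tamil/config.json")` would indicate that the file agrees with the table. A key reported as `None` more likely means the assumed key name is wrong than that the setting differs.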