---
language:
- as
- bn
- brx
- doi
- kn
- mai
- ml
- mr
- ne
- pa
- sa
- ta
- te
- hi
library_name: transformers
pipeline_tag: text-to-speech
tags:
- text-to-speech
---

# VITS TTS for Indian Languages

This repository contains a VITS-based Text-to-Speech (TTS) model fine-tuned for Indian languages. The model supports multiple Indian languages and a range of speaking styles and emotions, which makes it suitable for use cases such as conversational AI and audiobooks.

---

## Model Overview

The model `shethjenil/vits_rasa_13` is based on the VITS architecture and supports the following features:

- **Languages**: Multiple Indian languages (see Supported Languages).
- **Styles**: Various speaking styles and emotions, selected by style ID.
- **Speaker IDs**: Predefined male and female speaker profiles, one or two per language.

---

## Installation

```bash
pip install transformers torch soundfile
```

---

## Usage

Here's a quick example to get started:

```python
import soundfile as sf
import torch
from torch.nn.utils.rnn import pad_sequence
from transformers import AutoModel, AutoTokenizer

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModel.from_pretrained("shethjenil/vits_rasa_13", trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained("shethjenil/vits_rasa_13", trust_remote_code=True)

# Batch only texts that tokenize to the same length, or pass one text at a time.
texts = ["एअर इंडिया ने घने कोहरे को लेकर यात्रियों के लिए अलर्ट जारी किया है। दिल्ली सहित उत्तर और पूर्वी भारत के कुछ हवाई अड्डों पर उड़ान संचालन प्रभावित हो सकता है। एयरलाइन ने यात्रा से पहले फ्लाइट स्टेटस जांचने की सलाह दी है।"]
speaker_id = 16  # PAN_M
style_id = 0  # ALEXA

# Tokenize each text; tokens with no ID (None) are mapped to 0.
inputs = pad_sequence(
    [torch.tensor([i if i else 0 for i in tokenizer.convert_tokens_to_ids(tokenizer.tokenize(t))]) for t in texts],
    batch_first=True,
).to(device)

with torch.no_grad():
    outputs = model(inputs, speaker_id=speaker_id, emotion_id=style_id)

# Move the waveform to the CPU and convert it to NumPy before writing.
sf.write("audio.wav", outputs.waveform[0].cpu().numpy(), model.config.sampling_rate)
print(outputs.waveform.shape)
```

---

## Supported Languages

- `Assamese`
- `Bengali`
- `Bodo`
- `Dogri`
- `Kannada`
- `Maithili`
- `Malayalam`
- `Marathi`
- `Nepali`
- `Punjabi`
- `Sanskrit`
- `Tamil`
- `Telugu`

---

## Speaker-Style Identifier Overview

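| Speaker Name | Speaker ID |
| --- | --- |
| ASM_F | 0 |
| ASM_M | 1 |
| BEN_F | 2 |
| BEN_M | 3 |
| BRX_F | 4 |
| BRX_M | 5 |
| DOI_F | 6 |
| DOI_M | 7 |
| KAN_F | 8 |
| KAN_M | 9 |
| MAI_M | 10 |
| MAL_F | 11 |
| MAR_F | 12 |
| MAR_M | 13 |
| NEP_F | 14 |
| PAN_F | 15 |
| PAN_M | 16 |
| SAN_M | 17 |
| TAM_F | 18 |
| TEL_F | 19 |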

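| Style Name | Style ID |
| --- | --- |
| ALEXA | 0 |
| ANGER | 1 |
| BB | 2 |
| BOOK | 3 |
| CONV | 4 |
| DIGI | 5 |
| DISGUST | 6 |
| FEAR | 7 |
| HAPPY | 8 |
| NEWS | 10 |
| SAD | 12 |
| SURPRISE | 14 |
| UMANG | 15 |
| WIKI | 16 |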
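
Looking IDs up by name can be less error-prone than hard-coding integers, particularly because the listed style IDs are not contiguous (9, 11, and 13 do not appear in the table). The following is a minimal sketch that transcribes the two tables into dictionaries. The names `SPEAKER_IDS`, `STYLE_IDS`, and `resolve_ids` are illustrative assumptions and are not part of the model's API.

```python
from __future__ import annotations

# Name-to-ID mappings transcribed from the speaker and style tables.
# These dictionaries are a convenience sketch, not something the model ships.
SPEAKER_IDS = {
    "ASM_F": 0, "ASM_M": 1, "BEN_F": 2, "BEN_M": 3, "BRX_F": 4,
    "BRX_M": 5, "DOI_F": 6, "DOI_M": 7, "KAN_F": 8, "KAN_M": 9,
    "MAI_M": 10, "MAL_F": 11, "MAR_F": 12, "MAR_M": 13, "NEP_F": 14,
    "PAN_F": 15, "PAN_M": 16, "SAN_M": 17, "TAM_F": 18, "TEL_F": 19,
}

STYLE_IDS = {
    "ALEXA": 0, "ANGER": 1, "BB": 2, "BOOK": 3, "CONV": 4, "DIGI": 5,
    "DISGUST": 6, "FEAR": 7, "HAPPY": 8, "NEWS": 10, "SAD": 12,
    "SURPRISE": 14, "UMANG": 15, "WIKI": 16,
}


def resolve_ids(speaker: str, style: str) -> tuple[int, int]:
    """Return (speaker_id, style_id) for the given names (hypothetical helper).

    Names are matched case-insensitively; unknown names raise a ValueError
    that lists the valid choices.
    """
    speaker_key = speaker.upper()
    style_key = style.upper()
    if speaker_key not in SPEAKER_IDS:
        raise ValueError(f"Unknown speaker {speaker!r}; choose from {sorted(SPEAKER_IDS)}")
    if style_key not in STYLE_IDS:
        raise ValueError(f"Unknown style {style!r}; choose from {sorted(STYLE_IDS)}")
    return SPEAKER_IDS[speaker_key], STYLE_IDS[style_key]
```

For example, `speaker_id, style_id = resolve_ids("PAN_M", "ALEXA")` yields the same values used in the Usage example, which can then be passed to the model as `speaker_id` and `emotion_id`.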

---