
metadata
title: MahaTTSv2
emoji: 
colorFrom: red
colorTo: indigo
sdk: gradio
sdk_version: 5.39.0
app_file: app.py
pinned: false
license: apache-2.0


MahaTTS v2: An Open-Source Large Speech Generation Model

a Dubverse Black initiative


Description

We introduce MahaTTS v2, a multi-speaker text-to-speech (TTS) system trained on 50k hours of Indic and global language data. We follow a text-to-semantic-to-acoustic approach that leverages wav2vec2 tokens, which gives out-of-the-box generalization to unseen low-resource languages.

We previously open-sourced the first version (MahaTTS), which was trained on English and Indic languages as two separate models, on 9k and 400 hours of open-source datasets respectively. For MahaTTS v2, we collected over 20k hours of training data for a single multilingual, cross-lingual model.

We use Gemma as the backbone for text-to-semantic modeling and a conditional flow model for semantic-to-mel-spectrogram generation, with a BigVGAN vocoder producing the final audio waveform. The model shows greater robustness and higher quality than the previous version. We are also open-sourcing the ability to fine-tune the model on your own voice.
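The text-to-semantic-to-acoustic split relies on representing speech as discrete tokens. "wav2vec2 tokens" commonly means discrete units obtained by clustering wav2vec2 frame features (one vector per 20 ms of 16 kHz audio) and keeping only each frame's cluster index. This README does not say exactly how MahaTTS v2 derives its tokens, so the following is only a minimal sketch under that assumption: nearest-centroid (k-means-style) quantization against a hypothetical codebook. The function names and toy data are invented for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize_frames(frames: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Replace each continuous frame vector with the index of its nearest centroid.

    Assumption (hypothetical): `frames` stand in for wav2vec2 frame features and
    `centroids` for a k-means codebook fitted offline; the resulting ids play
    the role of "semantic tokens".
    """
    if not centroids:
        raise ValueError("codebook must not be empty")
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(frame, centroids[k]))
        for frame in frames
    ]


# Toy 2-D example; real wav2vec2 features are much higher-dimensional.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (-0.8, 1.7), (0.2, 0.1)]
print(quantize_frames(frames, codebook))  # [0, 1, 2, 0]
```

Because each token keeps only a cluster index, the text-to-semantic model can be trained as an ordinary next-token predictor over a fixed vocabulary, leaving speaker and acoustic detail to the later stages.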

With this release, you can:

  • Generate speech in multiple seen and unseen speaker identities (voice cloning).
  • Generate speech in multiple languages (multilingual and cross-lingual voice cloning).
  • Copy the speaking style of one speaker to another (cross-lingual voice cloning with prosody and intonation transfer).
  • Train your own models, from large-scale pretraining to fine-tuning.

MahaTTS Architecture

[Figure: MahaTTS v2 architecture diagram]
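The three stages described above (text to semantic tokens with a Gemma-based causal LM, semantic tokens to mel spectrogram with a conditional flow model, mel spectrogram to audio with BigVGAN) chain together as sketched below. This is a hypothetical illustration of the data flow only: `TTSPipeline`, its field names, and the toy stage functions are invented here and are not the MahaTTS API. Passing the speaker reference audio to both M1 and M2 is also an assumption, as is the hop size of 256 samples per mel frame.

```python
from dataclasses import dataclass
from typing import Callable, List

SemanticTokens = List[int]     # discrete semantic token ids (M1 output)
MelFrames = List[List[float]]  # one list of mel-bin values per frame (M2 output)
Waveform = List[float]         # audio samples (vocoder output)


@dataclass
class TTSPipeline:
    """Hypothetical wrapper showing how the three stages are chained.

    The stage callables are placeholders; they are not the MahaTTS API.
    """
    text_to_semantic: Callable[[str, str, Waveform], SemanticTokens]  # M1: causal LM
    semantic_to_mel: Callable[[SemanticTokens, Waveform], MelFrames]  # M2: conditional flow
    vocoder: Callable[[MelFrames], Waveform]                          # BigVGAN

    def synthesize(self, text: str, language: str, speaker_ref: Waveform) -> Waveform:
        # Stage 1: text (plus assumed speaker conditioning) -> semantic tokens.
        tokens = self.text_to_semantic(text, language, speaker_ref)
        # Stage 2: semantic tokens -> mel spectrogram, conditioned on the reference voice.
        mel = self.semantic_to_mel(tokens, speaker_ref)
        # Stage 3: mel spectrogram -> waveform.
        return self.vocoder(mel)


# Toy stand-ins so the sketch runs end to end; the real stages are neural networks.
def fake_m1(text: str, language: str, ref: Waveform) -> SemanticTokens:
    return [ord(ch) % 10_001 for ch in text]  # one dummy token per character


def fake_m2(tokens: SemanticTokens, ref: Waveform) -> MelFrames:
    return [[float(t)] * 4 for t in tokens]  # one 4-bin dummy frame per token


def fake_vocoder(mel: MelFrames, hop: int = 256) -> Waveform:
    return [0.0] * (hop * len(mel))  # `hop` silent samples per frame (assumed value)


pipe = TTSPipeline(fake_m1, fake_m2, fake_vocoder)
audio = pipe.synthesize("namaste", "Hindi", speaker_ref=[0.0] * 16000)
print(len(audio))  # 7 frames * 256 samples = 1792
```

Keeping each stage behind a plain callable makes the contract between stages explicit: M1 only has to emit token ids, M2 only has to turn ids into mel frames, and any vocoder that accepts those frames can be swapped in.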

Model Params

| Model | Parameters | Model Type | Output |
|---|---|---|---|
| Text to Semantic (M1) | 510 M | Causal LM | 10,001 tokens |
| Semantic to MelSpec (M2) | 71 M | Flow | 100x mel spectrogram |
| BigVGAN Vocoder | 112 M | GAN | Audio waveform |
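Summing the parameter counts above gives about 693 M parameters in total, with the text-to-semantic model accounting for roughly 74% of them. The snippet below is a back-of-the-envelope estimate of the weights-only footprint, assuming the standard 4 bytes per parameter for fp32 and 2 for fp16/bf16; it ignores activations, caches, and framework overhead, and the released checkpoints may be stored differently.

```python
from typing import Mapping

# Parameter counts in millions, copied from the Model Params table.
PARAMS_M: Mapping[str, int] = {
    "Text to Semantic (M1)": 510,
    "Semantic to MelSpec (M2)": 71,
    "BigVGAN Vocoder": 112,
}


def total_params_m(params: Mapping[str, int]) -> int:
    """Total parameter count, in millions."""
    return sum(params.values())


def weight_memory_gb(params_m: float, bytes_per_param: int) -> float:
    """Weights-only footprint in decimal GB (1 GB = 1e9 bytes).

    Ignores activations, caches, and framework overhead.
    """
    return params_m * 1e6 * bytes_per_param / 1e9


total = total_params_m(PARAMS_M)
print(total)  # 693
for name, p in PARAMS_M.items():
    print(f"{name}: {100 * p / total:.1f}% of parameters")
print(f"fp32: {weight_memory_gb(total, 4):.2f} GB")       # fp32: 2.77 GB
print(f"fp16/bf16: {weight_memory_gb(total, 2):.2f} GB")  # fp16/bf16: 1.39 GB
```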

🌐 Supported Languages

The following languages are currently supported:

  • Assamese (in)
  • Bengali (in)
  • Bhojpuri (in)
  • Bodo (in)
  • Dogri (in)
  • English (en)
  • French (fr)
  • German (de)
  • Gujarati (in)
  • Hindi (in)
  • Italian (it)
  • Kannada (in)
  • Malayalam (in)
  • Marathi (in)
  • Odia (in)
  • Punjabi (in)
  • Rajasthani (in)
  • Sanskrit (in)
  • Spanish (es)
  • Tamil (in)
  • Telugu (in)

TODO:

  1. Add training instructions.
  2. Add a Colab notebook for training.

License

MahaTTS is licensed under the Apache 2.0 License.

🙏 Appreciation