| --- |
| license: cc-by-4.0 |
| language: |
| - as |
| - bn |
| - brx |
| - doi |
| - kn |
| - mai |
| - ml |
| - mr |
| - ne |
| - pa |
| - sa |
| - ta |
| - te |
| library_name: transformers |
| pipeline_tag: text-to-speech |
| tags: |
| - text-to-speech |
| --- |
| # VITS TTS for Indian Languages |
|
|
| This repository contains a VITS-based text-to-speech (TTS) model fine-tuned for 13 Indian languages. It supports a range of speaking styles and emotions, which makes it suitable for use cases such as conversational AI and audiobooks. |
|
|
| --- |
|
|
| ## Model Overview |
|
|
| The model `ai4bharat/vits_rasa_13` is based on the VITS architecture and provides: |
| - **Languages**: 13 Indian languages (see Supported Languages). |
| - **Styles**: 14 speaking styles and emotions, selected by style ID. |
| - **Speaker IDs**: 20 predefined speaker profiles. Not every language has both a male and a female voice. |
|
|
| --- |
|
|
| ## Installation |
|
|
| ```bash |
| pip install transformers torch soundfile |
| ``` |
|
|
| --- |
|
|
| ## Usage |
|
|
| Here's a quick example to get started: |
|
|
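| ```python |
| import soundfile as sf |
| import torch |
| from transformers import AutoModel, AutoTokenizer |
| |
| # Use the GPU when one is available; otherwise fall back to the CPU |
| device = "cuda" if torch.cuda.is_available() else "cpu" |
| |
| # trust_remote_code=True lets transformers run the custom model code from the repository |
| model = AutoModel.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True).to(device) |
| tokenizer = AutoTokenizer.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True) |
| |
| text = "ਕੀ ਮੈਂ ਇਸ ਹਫਤੇ ਦੇ ਅੰਤ ਵਿੱਚ ਰੁੱਝਿਆ ਹੋਇਆ ਹਾਂ?" # Example text in Punjabi |
| speaker_id = 16 # PAN_M |
| style_id = 0 # ALEXA |
| |
| inputs = tokenizer(text=text, return_tensors="pt").to(device) |
| with torch.no_grad():  # inference only, no gradients needed |
|     outputs = model(inputs['input_ids'], speaker_id=speaker_id, emotion_id=style_id) |
| |
| # soundfile expects a NumPy array, so move the waveform to the CPU first |
| waveform = outputs.waveform.squeeze().detach().cpu().numpy() |
| sf.write("audio.wav", waveform, model.config.sampling_rate) |
| print(outputs.waveform.shape) |
| ``` |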
|
|
| --- |
|
|
| ## Supported Languages |
|
|
| - `Assamese` |
| - `Bengali` |
| - `Bodo` |
| - `Dogri` |
| - `Kannada` |
| - `Maithili` |
| - `Malayalam` |
| - `Marathi` |
| - `Nepali` |
| - `Punjabi` |
| - `Sanskrit` |
| - `Tamil` |
| - `Telugu` |
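
The language codes in this model card's metadata can be paired with the names above. The following is a minimal sketch of such a lookup. The `LANGUAGES` dictionary and the `language_name` helper are illustrative names, not part of the model's API, and the pairing is inferred from the metadata rather than read from the model configuration.

```python
# Language codes from the model card metadata, paired with the names listed above.
# Assumption: this pairing is inferred from the card, not taken from the model config.
LANGUAGES = {
    "as": "Assamese",
    "bn": "Bengali",
    "brx": "Bodo",
    "doi": "Dogri",
    "kn": "Kannada",
    "mai": "Maithili",
    "ml": "Malayalam",
    "mr": "Marathi",
    "ne": "Nepali",
    "pa": "Punjabi",
    "sa": "Sanskrit",
    "ta": "Tamil",
    "te": "Telugu",
}


def language_name(code: str) -> str:
    """Return the language name for a metadata code (hypothetical helper)."""
    try:
        return LANGUAGES[code.lower()]
    except KeyError:
        raise ValueError(f"Unsupported language code: {code!r}") from None
```

A check such as `language_name("pa")` before synthesis can catch text in an unsupported language early.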
|
|
| --- |
|
|
| ## Speaker-Style Identifier Overview |
|
|
| <div style="display: flex; align-items: flex-start; gap: 20px; margin: 0; padding: 0;"> |
|
|
| <table style="margin: 0; padding: 0; border-spacing: 0;"> |
| <tr> |
| <th>Speaker Name</th> |
| <th>Speaker ID</th> |
| </tr> |
| <tr> |
| <td>ASM_F</td> |
| <td>0</td> |
| </tr> |
| <tr> |
| <td>ASM_M</td> |
| <td>1</td> |
| </tr> |
| <tr> |
| <td>BEN_F</td> |
| <td>2</td> |
| </tr> |
| <tr> |
| <td>BEN_M</td> |
| <td>3</td> |
| </tr> |
| <tr> |
| <td>BRX_F</td> |
| <td>4</td> |
| </tr> |
| <tr> |
| <td>BRX_M</td> |
| <td>5</td> |
| </tr> |
| <tr> |
| <td>DOI_F</td> |
| <td>6</td> |
| </tr> |
| <tr> |
| <td>DOI_M</td> |
| <td>7</td> |
| </tr> |
| <tr> |
| <td>KAN_F</td> |
| <td>8</td> |
| </tr> |
| <tr> |
| <td>KAN_M</td> |
| <td>9</td> |
| </tr> |
| <tr> |
| <td>MAI_M</td> |
| <td>10</td> |
| </tr> |
| <tr> |
| <td>MAL_F</td> |
| <td>11</td> |
| </tr> |
| <tr> |
| <td>MAR_F</td> |
| <td>12</td> |
| </tr> |
| <tr> |
| <td>MAR_M</td> |
| <td>13</td> |
| </tr> |
| <tr> |
| <td>NEP_F</td> |
| <td>14</td> |
| </tr> |
| <tr> |
| <td>PAN_F</td> |
| <td>15</td> |
| </tr> |
| <tr> |
| <td>PAN_M</td> |
| <td>16</td> |
| </tr> |
| <tr> |
| <td>SAN_M</td> |
| <td>17</td> |
| </tr> |
| <tr> |
| <td>TAM_F</td> |
| <td>18</td> |
| </tr> |
| <tr> |
| <td>TEL_F</td> |
| <td>19</td> |
| </tr> |
| </table> |
| |
| <table> |
| <tr> |
| <th>Style Name</th> |
| <th>Style ID</th> |
| </tr> |
| <tr> |
| <td>ALEXA</td> |
| <td>0</td> |
| </tr> |
| <tr> |
| <td>ANGER</td> |
| <td>1</td> |
| </tr> |
| <tr> |
| <td>BB</td> |
| <td>2</td> |
| </tr> |
| <tr> |
| <td>BOOK</td> |
| <td>3</td> |
| </tr> |
| <tr> |
| <td>CONV</td> |
| <td>4</td> |
| </tr> |
| <tr> |
| <td>DIGI</td> |
| <td>5</td> |
| </tr> |
| <tr> |
| <td>DISGUST</td> |
| <td>6</td> |
| </tr> |
| <tr> |
| <td>FEAR</td> |
| <td>7</td> |
| </tr> |
| <tr> |
| <td>HAPPY</td> |
| <td>8</td> |
| </tr> |
| <tr> |
| <td>NEWS</td> |
| <td>10</td> |
| </tr> |
| <tr> |
| <td>SAD</td> |
| <td>12</td> |
| </tr> |
| <tr> |
| <td>SURPRISE</td> |
| <td>14</td> |
| </tr> |
| <tr> |
| <td>UMANG</td> |
| <td>15</td> |
| </tr> |
| <tr> |
| <td>WIKI</td> |
| <td>16</td> |
| </tr> |
| </table> |
| |
| </div> |
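
The two tables above can be mirrored as plain dictionaries so that speakers and styles are selected by name instead of by raw number. This is a minimal sketch: the values are copied from the tables, while `SPEAKER_IDS`, `STYLE_IDS`, and `resolve_ids` are hypothetical names introduced here and are not provided by the model repository.

```python
from typing import Tuple

# Speaker name -> speaker ID, copied from the speaker table above
SPEAKER_IDS = {
    "ASM_F": 0, "ASM_M": 1, "BEN_F": 2, "BEN_M": 3, "BRX_F": 4,
    "BRX_M": 5, "DOI_F": 6, "DOI_M": 7, "KAN_F": 8, "KAN_M": 9,
    "MAI_M": 10, "MAL_F": 11, "MAR_F": 12, "MAR_M": 13, "NEP_F": 14,
    "PAN_F": 15, "PAN_M": 16, "SAN_M": 17, "TAM_F": 18, "TEL_F": 19,
}

# Style name -> style ID, copied from the style table above
# (the table lists no styles for IDs 9, 11, and 13)
STYLE_IDS = {
    "ALEXA": 0, "ANGER": 1, "BB": 2, "BOOK": 3, "CONV": 4,
    "DIGI": 5, "DISGUST": 6, "FEAR": 7, "HAPPY": 8, "NEWS": 10,
    "SAD": 12, "SURPRISE": 14, "UMANG": 15, "WIKI": 16,
}


def resolve_ids(speaker: str, style: str) -> Tuple[int, int]:
    """Map a speaker name and a style name to their numeric IDs (hypothetical helper)."""
    speaker_key, style_key = speaker.upper(), style.upper()
    if speaker_key not in SPEAKER_IDS:
        raise ValueError(f"Unknown speaker: {speaker!r}")
    if style_key not in STYLE_IDS:
        raise ValueError(f"Unknown style: {style!r}")
    return SPEAKER_IDS[speaker_key], STYLE_IDS[style_key]
```

For example, `speaker_id, style_id = resolve_ids("PAN_M", "ALEXA")` yields the values used in the Usage section, which can then be passed to the model as `speaker_id` and `emotion_id`.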
|
|
| --- |
|
|
| ## Citation |
|
|
| If you use this model in your research, please cite: |
|
|
| ```bibtex |
| @misc{ai4bharat_vits_rasa_13, |
| title={VITS TTS for Indian Languages}, |
| author={Ashwin Sankar}, |
| year={2024}, |
| publisher={Hugging Face} |
| } |
| ``` |