| title: Vakya 2.0 - Text-to-Speech | |
| emoji: 🎙️ | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 6.2.0 | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| # 🎙️ Vakya 2.0 - Text-to-Speech Playground | |
| **Vakya** is a high-quality Text-to-Speech model based on the IndicF5 architecture, supporting **11 Indian languages**. | |
| ## Features | |
| - **Multi-language Support**: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu | |
| - **Voice Cloning**: Uses reference audio to clone voice characteristics | |
| - **High Quality**: 24 kHz output sample rate, 0.4B-parameter model | |
| - **Easy to Use**: Simple interface for testing and experimentation | |
| ## How to Use | |
| 1. **Load Model**: Click the "Load Model" button (the first load may take a few minutes while the model downloads) | |
| 2. **Upload Reference Audio**: Upload a short audio clip (<15 seconds recommended) that represents the voice you want to clone | |
| 3. **Enter Reference Text** (Optional): Type what is spoken in the reference audio. If left blank, the model will auto-transcribe it | |
| 4. **Enter Text to Generate**: Type the text you want to synthesize in any supported language | |
| 5. **Adjust Settings** (Optional): | |
|    - Speed: Control the speech rate (0.5x to 2.0x) | |
|    - Remove Silences: Experimental feature to remove pauses | |
| 6. **Generate**: Click "Generate Speech" and wait for the audio output | |
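The limits mentioned in the steps above (reference clip under 15 seconds, speed between 0.5x and 2.0x) can be checked before uploading. The following is a minimal sketch using only the Python standard library; the function names `wav_duration_seconds` and `check_inputs` are hypothetical helpers for illustration and are not part of the Space or of IndicF5. It assumes the reference clip is an uncompressed PCM WAV file.

```python
import wave

# Values taken from the steps above; the names are illustrative only.
MAX_REF_SECONDS = 15.0    # recommended upper bound for the reference clip
SPEED_RANGE = (0.5, 2.0)  # range of the speed setting


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_inputs(ref_seconds: float, speed: float, text: str) -> list[str]:
    """Collect readable problems; an empty list means the inputs look fine."""
    problems = []
    if ref_seconds >= MAX_REF_SECONDS:
        problems.append(
            f"reference audio is {ref_seconds:.1f}s; under {MAX_REF_SECONDS:.0f}s is recommended"
        )
    low, high = SPEED_RANGE
    if not low <= speed <= high:
        problems.append(f"speed {speed} is outside the {low}x to {high}x range")
    if not text.strip():
        problems.append("text to generate is empty")
    return problems
```

A typical use would be `check_inputs(wav_duration_seconds("ref.wav"), 1.0, "नमस्ते")`, where `ref.wav` is a placeholder path for your own recording.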
| ## Model Information | |
| - **Model**: Vakya 2.0 | |
| - **Repository**: [ashishkblink/vakya2.0](https://huggingface.co/ashishkblink/vakya2.0) | |
| - **Based on**: [IndicF5](https://github.com/AI4Bharat/IndicF5) by AI4Bharat (IIT Madras) | |
| - **Model Size**: 0.4B parameters | |
| - **Sample Rate**: 24000 Hz | |
| - **Training Data**: 1417 hours of high-quality speech | |
| - **License**: MIT License | |
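As a rough guide to what these figures imply, the sketch below works through two back-of-the-envelope calculations: the size of the model weights at a given numeric precision, and the number of audio samples in a clip at 24000 Hz. The precision the Space actually runs at is not stated here, so the float32 and float16 cases are assumptions, and the estimate covers weights only (no activations or other components).

```python
PARAMS = 0.4e9           # "0.4B parameters" from the list above
SAMPLE_RATE_HZ = 24_000  # "24000 Hz" from the list above


def weights_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes), weights only."""
    return num_params * bytes_per_param / 1e9


def pcm_samples(seconds: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples in a mono clip of the given length."""
    return round(seconds * sample_rate)


print(weights_size_gb(PARAMS, 4))  # float32 (assumed precision): 1.6
print(weights_size_gb(PARAMS, 2))  # float16 (assumed precision): 0.8
print(pcm_samples(10))             # a 10-second clip: 240000
```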
| ## 💡 Tips for Best Results | |
| - Keep reference audio clips short (<15 seconds) | |
| - Use clear, high-quality reference audio | |
| - Provide reference text when possible for better voice matching | |
| - Use reference audio from a native speaker of the target language; the model works best with native speakers | |
| ## ⚠️ Terms of Use | |
| - You must have explicit permission to clone voices | |
| - Unauthorized voice cloning is strictly prohibited | |
| - Any misuse of this model is the responsibility of the user | |
| - This model is for research and educational purposes | |
| ## Links | |
| - **Model Repository**: [ashishkblink/vakya2.0](https://huggingface.co/ashishkblink/vakya2.0) | |
| - **GitHub**: [ashishkblink/vakya](https://github.com/ashishkblink/vakya) | |
| - **IndicF5**: [AI4Bharat/IndicF5](https://github.com/AI4Bharat/IndicF5) | |
| ## Acknowledgments | |
| This model is based on **IndicF5** developed by AI4Bharat (IIT Madras). | |
| --- | |
| **Vakya** - Bringing voices to Indian languages 🎙️ | |