---
title: Vakya 2.0 - Text-to-Speech
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
license: mit
---

# 🎙️ Vakya 2.0 - Text-to-Speech Playground

**Vakya** is a high-quality Text-to-Speech model based on the IndicF5 architecture, supporting **11 Indian languages**.

## 🌟 Features

- **Multi-language Support**: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu
- **Voice Cloning**: Uses reference audio to clone voice characteristics
- **High Quality**: 24kHz sample rate, 0.4B parameter model
- **Easy to Use**: Simple interface for testing and experimentation

## 🚀 How to Use

1. **Load Model**: Click the "Load Model" button (the first load may take a few minutes while the model downloads)
2. **Upload Reference Audio**: Upload a short audio clip (<15 seconds recommended) that represents the voice you want to clone
3. **Enter Reference Text** (Optional): Type what is spoken in the reference audio. If left blank, the model will auto-transcribe it
4. **Enter Text to Generate**: Type the text you want to synthesize in any supported language
5. **Adjust Settings** (Optional): 
   - Speed: Control the speech rate (0.5x to 2.0x)
   - Remove Silences: Experimental feature to remove pauses
6. **Generate**: Click "Generate Speech" and wait for the audio output
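
The inputs collected in the steps above can be checked before synthesis. The following is a minimal sketch, not the app's actual code: `SynthesisRequest` and `validate_request` are hypothetical names introduced for illustration, and only the speed range (0.5x to 2.0x) and the optional reference text come from this README.

```python
from dataclasses import dataclass

# Speed range stated in the "Adjust Settings" step of this README.
MIN_SPEED, MAX_SPEED = 0.5, 2.0


@dataclass
class SynthesisRequest:
    """Hypothetical container for the form fields (not part of the app)."""
    ref_audio_path: str
    gen_text: str
    ref_text: str = ""            # blank: the model auto-transcribes the clip
    speed: float = 1.0
    remove_silences: bool = False  # experimental setting


def validate_request(req: SynthesisRequest) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not req.ref_audio_path.strip():
        problems.append("reference audio is required")
    if not req.gen_text.strip():
        problems.append("text to generate is required")
    if not MIN_SPEED <= req.speed <= MAX_SPEED:
        problems.append(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return problems
```

Returning a list of problems rather than raising on the first one lets an interface report every issue at once.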

## 📋 Model Information

- **Model**: Vakya 2.0
- **Repository**: [ashishkblink/vakya2.0](https://huggingface.co/ashishkblink/vakya2.0)
- **Based on**: [IndicF5](https://github.com/AI4Bharat/IndicF5) by AI4Bharat (IIT Madras)
- **Model Size**: 0.4B parameters
- **Sample Rate**: 24000 Hz
- **Training Data**: 1417 hours of high-quality speech
- **License**: MIT License

## 💡 Tips for Best Results

- Keep reference audio clips short (<15 seconds)
- Use clear, high-quality reference audio
- Provide reference text when possible for better voice matching
- The model works best when the reference audio comes from a native speaker of the target language
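
The clip-length tip can be checked locally before uploading. This is a minimal sketch using only Python's standard `wave` module, so it assumes the reference clip is an uncompressed PCM WAV file; the function names are hypothetical and the 15-second limit is the recommendation from this README, not a hard limit enforced by the model.

```python
import wave

# Recommended upper bound for reference clips, taken from this README.
MAX_REF_SECONDS = 15.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_reference_short_enough(path: str, limit: float = MAX_REF_SECONDS) -> bool:
    """True if the clip is shorter than the recommended limit."""
    return wav_duration_seconds(path) < limit
```

Other formats such as MP3 or FLAC would need a third-party audio library, which is outside the scope of this sketch.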

## ⚠️ Terms of Use

- You must have explicit permission to clone voices
- Unauthorized voice cloning is strictly prohibited
- Any misuse of this model is the responsibility of the user
- This model is for research and educational purposes

## 🔗 Links

- **Model Repository**: [ashishkblink/vakya2.0](https://huggingface.co/ashishkblink/vakya2.0)
- **GitHub**: [ashishkblink/vakya](https://github.com/ashishkblink/vakya)
- **IndicF5**: [AI4Bharat/IndicF5](https://github.com/AI4Bharat/IndicF5)

## 🙏 Acknowledgments

This model is based on **IndicF5** developed by AI4Bharat (IIT Madras).

---

**Vakya** - Bringing voices to Indian languages 🎙️