Nekochu committed · verified · Commit da4aff0 · Parent: 727a148

Update README.md

Files changed (1): README.md (+212 -212)
---
title: xtts2 + Bark TTS
emoji: 🎙️
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
- text-to-speech
- voice-cloning
- xtts
- bark
- mcp-server
short_description: XTTS2 voice cloning + Bark TTS in one space
---

# TTS Hub: XTTS2 + Bark

Two powerful TTS models in one Space, optimized for CPU.

## Models

| Model | Voice Source | Languages | Special Features |
|-------|--------------|-----------|------------------|
| **XTTS2** (default) | Your audio sample | 16 languages | Voice cloning |
| **Bark** | Preset voices | EN, DE, FR, ES, ZH, JA, KO | Non-speech sounds, temperature control |

## Usage

### XTTS2 (Voice Cloning)
1. Upload 3-30 seconds of reference voice audio
2. Enter the text to synthesize
3. Select a language and speed
4. Click "Generate Speech"

### Bark (Preset Voices)
1. Select "Bark (Preset Voices)"
2. Choose a voice preset (e.g., `v2/en_speaker_6`)
3. Adjust the temperature controls (optional):
   - **Text Temperature** (0.1-1.0): controls semantic variation
   - **Waveform Temperature** (0.1-1.0): controls audio variation
4. Set a seed for reproducibility (optional, -1 for random)
5. Enter text with optional special tokens
6. Click "Generate Speech"

**Bark special tokens:**
- `[laughter]` `[laughs]` `[sighs]` `[music]` `[gasps]` `[clears throat]`
- `♪ la la la ♪` for singing
- `MAN:` `WOMAN:` for speaker labels

**Long text handling:** Text is automatically split into chunks and processed sequentially, with natural pauses between segments.
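
The chunking step is not exposed in the UI. As a rough mental model, a minimal sketch of sentence-based splitting (a hypothetical helper, not the Space's actual implementation):

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text on sentence boundaries, packing sentences into
    chunks of at most max_chars (one long sentence stays whole)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("First sentence. Second sentence! Third?", max_chars=20))
# → ['First sentence.', 'Second sentence!', 'Third?']
```

Each chunk would then be synthesized independently and the audio segments concatenated with short silences.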

---

## API

### Python Client

```python
from gradio_client import Client, handle_file

client = Client("Luminia/xtts2-Bark")

# XTTS2 (voice cloning)
result = client.predict(
    text="Hello, this is a voice cloning test.",
    model_choice="XTTS2 (Voice Cloning)",
    reference_audio=handle_file("voice_sample.wav"),
    language="English",
    speed=1.0,
    voice_preset="v2/en_speaker_6",
    text_temp=0.7,       # Bark only (ignored for XTTS2)
    waveform_temp=0.7,   # Bark only (ignored for XTTS2)
    seed=-1,             # Bark only (ignored for XTTS2)
    api_name="/synthesize"
)
print(result)  # (audio_path, status)

# Bark (preset voice) with temperature control
result = client.predict(
    text="Hello! [laughter] This is Bark speaking.",
    model_choice="Bark (Preset Voices)",
    reference_audio=None,
    language="English",
    speed=1.0,
    voice_preset="v2/en_speaker_6",
    text_temp=0.7,       # semantic temperature (0.1-1.0)
    waveform_temp=0.7,   # audio waveform temperature (0.1-1.0)
    seed=42,             # fixed seed for reproducibility (-1 for random)
    api_name="/synthesize"
)
print(result)
```

### REST API (curl)

```bash
# XTTS2 with voice cloning
curl -X POST "https://luminia-xtts2-bark.hf.space/gradio_api/call/synthesize" \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "Hello world",
      "XTTS2 (Voice Cloning)",
      {"path": "https://example.com/voice.wav"},
      "English",
      1.0,
      "v2/en_speaker_6",
      0.7,
      0.7,
      -1
    ]
  }'

# Bark with preset voice and temperature control
curl -X POST "https://luminia-xtts2-bark.hf.space/gradio_api/call/synthesize" \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "Hello [laughter] world",
      "Bark (Preset Voices)",
      null,
      "English",
      1.0,
      "v2/en_speaker_3",
      0.7,
      0.7,
      42
    ]
  }'
```

### MCP (Model Context Protocol)

This Space exposes an MCP server, so AI assistants can call it as a tool.

**Tool schema:**
```json
{
  "name": "synthesize",
  "parameters": {
    "text": {"type": "string", "description": "Text to synthesize"},
    "model_choice": {"type": "string", "enum": ["XTTS2 (Voice Cloning)", "Bark (Preset Voices)"]},
    "reference_audio": {"type": "file", "description": "Reference audio for XTTS2 (optional for Bark)"},
    "language": {"type": "string", "default": "English"},
    "speed": {"type": "number", "default": 1.0},
    "voice_preset": {"type": "string", "default": "v2/en_speaker_6"},
    "text_temp": {"type": "number", "default": 0.7, "description": "Bark text/semantic temperature (0.1-1.0)"},
    "waveform_temp": {"type": "number", "default": 0.7, "description": "Bark waveform temperature (0.1-1.0)"},
    "seed": {"type": "integer", "default": -1, "description": "Bark seed for reproducibility (-1 for random)"}
  },
  "returns": ["audio", "string"]
}
```

**MCP Config:**
```json
{
  "mcpServers": {
    "tts-hub": {"url": "https://luminia-xtts2-bark.hf.space/gradio_api/mcp/"}
  }
}
```

---

## CLI Usage

```bash
# XTTS2 voice cloning
python app.py tts -t "Hello world" -o output.wav -m xtts2 -r voice_sample.wav -l English -s 1.0

# Bark preset voice (basic)
python app.py tts -t "Hello [laughter] world" -o output.wav -m bark -v "v2/en_speaker_6"

# Bark with temperature control and seed
python app.py tts -t "Hello world" -o output.wav -m bark -v "v2/en_speaker_6" \
  --text-temp 0.7 --waveform-temp 0.7 --seed 42
```

## Bark Voice Presets

| Preset | Language |
|--------|----------|
| `v2/en_speaker_0` - `v2/en_speaker_9` | English |
| `v2/de_speaker_0` - `v2/de_speaker_2` | German |
| `v2/fr_speaker_0` - `v2/fr_speaker_1` | French |
| `v2/es_speaker_0` - `v2/es_speaker_1` | Spanish |
| `v2/zh_speaker_0` - `v2/zh_speaker_1` | Chinese |
| `v2/ja_speaker_0` | Japanese |
| `v2/ko_speaker_0` | Korean |
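
The ranges in the table can be expanded into a flat list of preset IDs, e.g. for scripting batch jobs. A quick sketch using the per-language counts shown above:

```python
# Speaker counts per language, as listed in the table above
PRESET_COUNTS = {"en": 10, "de": 3, "fr": 2, "es": 2, "zh": 2, "ja": 1, "ko": 1}

def all_presets() -> list[str]:
    """Expand ranges like v2/en_speaker_0 - v2/en_speaker_9
    into explicit preset IDs."""
    return [
        f"v2/{lang}_speaker_{i}"
        for lang, count in PRESET_COUNTS.items()
        for i in range(count)
    ]

print(len(all_presets()))  # → 21
```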

## Bark Temperature Guide

| Setting | Low (0.1-0.3) | Medium (0.5-0.7) | High (0.8-1.0) |
|---------|---------------|------------------|----------------|
| **Text Temp** | More predictable, robotic | Natural, balanced | Creative, variable |
| **Waveform Temp** | Cleaner audio | Natural variation | More expressive |

**Recommended:** Start with 0.7 for both temperatures for natural-sounding speech.
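
Because both temperatures introduce sampling noise, comparing settings is only meaningful with a fixed seed. A sketch of what the `seed` parameter implies (a hypothetical helper; the Space's actual seeding code may differ):

```python
import random

def set_seed(seed: int) -> int:
    """Seed the RNG; with seed == -1, draw a fresh random seed so
    each run differs. Returns the seed actually used."""
    if seed == -1:
        seed = random.randint(0, 2**32 - 1)
    random.seed(seed)
    # A real Bark app would also seed numpy and torch here, e.g.:
    # np.random.seed(seed); torch.manual_seed(seed)
    return seed

used = set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(used)
second = [random.random() for _ in range(3)]
assert first == second  # same seed, identical samples
```

Run two generations with the same seed and different temperatures to hear what each knob changes in isolation.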
204
+
205
+ ---
206
+
207
+ ## Credits
208
+
209
+ - **XTTS2:** [Coqui TTS](https://github.com/idiap/coqui-ai-TTS) (Apache 2.0)
210
+ - **Bark:** [Suno AI](https://github.com/suno-ai/bark) (MIT)
211
+
212
+ Licensed under Apache 2.0.