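Commit `da4aff0` by Nekochu, 30 days ago (parent `727a148`): **Update README.md**. The diff rewrites all 212 lines of `README.md`; the visible change is the front-matter `title`, from `TTS Hub` to `xtts2 + Bark TTS`.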


---
title: xtts2 + Bark TTS
emoji: 🎙️
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - text-to-speech
  - voice-cloning
  - xtts
  - bark
  - mcp-server
short_description: XTTS2 voice cloning + Bark TTS in one space
---

# TTS Hub: XTTS2 + Bark

Two text-to-speech models in one Space, optimized for CPU.

## Models

| Model | Voice Source | Languages | Special Features |
|-------|--------------|-----------|------------------|
| **XTTS2** (default) | Your audio sample | 16 languages | Voice cloning |
| **Bark** | Preset voices | EN, DE, FR, ES, ZH, JA, KO | Non-speech sounds, temperature control |

## Usage

### XTTS2 (Voice Cloning)

1. Upload 3-30 seconds of reference voice audio
2. Enter the text to synthesize
3. Select the language and speed
4. Click "Generate Speech"
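
You can check the 3-30 second guideline before uploading. This is a minimal sketch using Python's standard `wave` module. It assumes the clip is an uncompressed PCM `.wav` file, because `wave` cannot read MP3 or other compressed formats. `check_reference_clip` is a hypothetical helper, not part of this Space.

```python
import wave

# Bounds from the usage steps above: 3-30 seconds of reference audio.
MIN_SECONDS = 3.0
MAX_SECONDS = 30.0


def clip_duration(path: str) -> float:
    """Duration of an uncompressed PCM WAV file, in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(path: str) -> float:
    """Hypothetical pre-upload check: return the duration or raise ValueError."""
    seconds = clip_duration(path)
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"{path} is {seconds:.1f} s long; use {MIN_SECONDS:g}-{MAX_SECONDS:g} s of speech"
        )
    return seconds
```

For example, `check_reference_clip("voice_sample.wav")` returns the clip length in seconds, or raises `ValueError` with the measured duration.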

### Bark (Preset Voices)

1. Select "Bark (Preset Voices)"
2. Choose a voice preset (e.g., `v2/en_speaker_6`)
3. Adjust the temperature controls (optional):
   - **Text Temperature** (0.1-1.0): controls semantic variation
   - **Waveform Temperature** (0.1-1.0): controls audio variation
4. Set a seed for reproducible output (optional; -1 picks a random seed)
5. Enter the text, optionally with special tokens
6. Click "Generate Speech"
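
If you pick the seed on the client side, a "random" Bark take becomes reproducible because you know which seed produced it. Here is a sketch of that approach. `resolve_bark_settings` and `BarkSettings` are hypothetical. The 0.1-1.0 bounds come from the steps above, and drawing seeds from 0 to 2**31 - 1 is an assumption about what the Space accepts.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Range documented above for both Bark temperature controls.
TEMP_MIN, TEMP_MAX = 0.1, 1.0


@dataclass(frozen=True)
class BarkSettings:
    text_temp: float
    waveform_temp: float
    seed: int


def resolve_bark_settings(text_temp: float = 0.7, waveform_temp: float = 0.7,
                          seed: int = -1, rng: Optional[random.Random] = None) -> BarkSettings:
    """Hypothetical client-side helper: validate the sliders and pin down the seed."""
    for name, value in (("text_temp", text_temp), ("waveform_temp", waveform_temp)):
        if not TEMP_MIN <= value <= TEMP_MAX:
            raise ValueError(f"{name}={value} is outside {TEMP_MIN}-{TEMP_MAX}")
    if seed == -1:
        # Choose the "random" seed here so it can be logged and reused later.
        # Assumption: the Space accepts seeds in 0 .. 2**31 - 1.
        seed = (rng or random.Random()).randrange(2**31)
    elif seed < 0:
        raise ValueError("seed must be -1 (random) or a non-negative integer")
    return BarkSettings(text_temp, waveform_temp, seed)
```

Pass `settings.seed` as the seed (`--seed` on the CLI, `seed=` in the Python client) and log it alongside the output file.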

**Bark special tokens:**

- `[laughter]` `[laughs]` `[sighs]` `[music]` `[gasps]` `[clears throat]` for non-speech sounds
- `♪ la la la ♪` for singing
- `MAN:` `WOMAN:` for speaker labels
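
A misspelled token such as `[laugh]` is easy to miss in a long prompt. This small sketch lists the bracketed tokens in a prompt and flags any that are missing from the list above. It only knows this README's list. Treating `MAN:`/`WOMAN:` as line-initial is an assumption, and `describe_prompt` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Non-speech tokens listed in this README (compared case-insensitively below).
KNOWN_TOKENS = {"[laughter]", "[laughs]", "[sighs]", "[music]", "[gasps]", "[clears throat]"}
BRACKETED = re.compile(r"\[[^\[\]]+\]")
# Assumption: speaker labels count only at the start of a line.
SPEAKER_LABEL = re.compile(r"^(MAN|WOMAN):", re.MULTILINE)


def find_unknown_tokens(text: str) -> List[str]:
    """Bracketed tokens in `text` that are not in KNOWN_TOKENS, in order of appearance."""
    return [tok for tok in BRACKETED.findall(text) if tok.lower() not in KNOWN_TOKENS]


def describe_prompt(text: str) -> Dict[str, object]:
    """Summarize which of the documented Bark cues a prompt uses."""
    return {
        "tokens": BRACKETED.findall(text),
        "unknown": find_unknown_tokens(text),
        "singing": "♪" in text,
        "speakers": SPEAKER_LABEL.findall(text),
    }
```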

**Long text handling:** Text is automatically split into chunks and processed sequentially with natural pauses between segments.
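
The README does not say how text is split or how long the pauses are. One common approach can be sketched under stated assumptions:

- split after sentence-ending punctuation;
- pack sentences into chunks of at most a hypothetical 200 characters;
- join the per-chunk audio with 0.3 s of silence at an assumed 24 kHz.

`synthesize_chunk` is a hypothetical stand-in for any single-chunk TTS call.

```python
import re
from typing import Callable, List

# Assumptions: split after ., ! or ? followed by whitespace; 200-character budget.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for sentence in SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_long_text(text: str, synthesize_chunk: Callable[[str], List[float]],
                         sample_rate: int = 24000, pause_s: float = 0.3,
                         max_chars: int = 200) -> List[float]:
    """Synthesize chunk by chunk and join the audio with short silences.

    `synthesize_chunk` is a hypothetical per-chunk TTS call that returns mono
    float samples at `sample_rate` (24 kHz assumed here).
    """
    silence = [0.0] * int(sample_rate * pause_s)
    samples: List[float] = []
    for index, chunk in enumerate(split_into_chunks(text, max_chars)):
        if index:
            samples.extend(silence)
        samples.extend(synthesize_chunk(chunk))
    return samples
```

Packing whole sentences avoids cutting a sentence in the middle. The trade-off is that a single very long sentence is still sent as one oversized chunk.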

---

## API

### Python Client

```python
from gradio_client import Client, handle_file

client = Client("Luminia/xtts2-Bark")

# XTTS2 (voice cloning)
result = client.predict(
    text="Hello, this is a voice cloning test.",
    model_choice="XTTS2 (Voice Cloning)",
    reference_audio=handle_file("voice_sample.wav"),
    language="English",
    speed=1.0,
    voice_preset="v2/en_speaker_6",
    text_temp=0.7,       # Bark only (ignored for XTTS2)
    waveform_temp=0.7,   # Bark only (ignored for XTTS2)
    seed=-1,             # Bark only (ignored for XTTS2)
    api_name="/synthesize"
)
print(result)  # (audio_path, status)

# Bark (preset voice) with temperature control
result = client.predict(
    text="Hello! [laughter] This is Bark speaking.",
    model_choice="Bark (Preset Voices)",
    reference_audio=None,
    language="English",
    speed=1.0,
    voice_preset="v2/en_speaker_6",
    text_temp=0.7,       # Semantic temperature (0.1-1.0)
    waveform_temp=0.7,   # Audio waveform temperature (0.1-1.0)
    seed=42,             # Set seed for reproducibility (-1 for random)
    api_name="/synthesize"
)
print(result)
```
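
Both calls above spell out all nine inputs, even though each model ignores some of them. Here is a sketch of a thin wrapper that fills the unused inputs from this README's documented defaults (the MCP tool schema). `synthesize_kwargs` is hypothetical; only `Client`, `handle_file` and `predict` come from `gradio_client`.

```python
from typing import Any, Dict, Optional

XTTS2 = "XTTS2 (Voice Cloning)"
BARK = "Bark (Preset Voices)"

# Defaults copied from the MCP tool schema in this README.
DEFAULTS: Dict[str, Any] = {
    "language": "English",
    "speed": 1.0,
    "voice_preset": "v2/en_speaker_6",
    "text_temp": 0.7,
    "waveform_temp": 0.7,
    "seed": -1,
}


def synthesize_kwargs(text: str, model_choice: str = XTTS2,
                      reference_audio: Optional[Any] = None, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: build the full keyword set for client.predict()."""
    if model_choice not in (XTTS2, BARK):
        raise ValueError(f"unknown model_choice: {model_choice!r}")
    if model_choice == XTTS2 and reference_audio is None:
        raise ValueError("XTTS2 needs reference_audio, e.g. handle_file('voice_sample.wav')")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected parameters: {sorted(unknown)}")
    return {
        "text": text,
        "model_choice": model_choice,
        "reference_audio": reference_audio,
        **DEFAULTS,
        **overrides,
        "api_name": "/synthesize",
    }
```

Usage with the `client` created above:

- Bark: `client.predict(**synthesize_kwargs("Hello! [laughter] This is Bark speaking.", BARK, seed=42))`
- XTTS2: `client.predict(**synthesize_kwargs("Hello", reference_audio=handle_file("voice_sample.wav")))`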
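
### REST API (curl)

The `data` array is positional, in the same order as the Python client's parameters: `text`, `model_choice`, `reference_audio`, `language`, `speed`, `voice_preset`, `text_temp`, `waveform_temp`, `seed`.

```bash
# XTTS2 with voice cloning
curl -X POST "https://luminia-xtts2-bark.hf.space/gradio_api/call/synthesize" \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "Hello world",
      "XTTS2 (Voice Cloning)",
      {"path": "https://example.com/voice.wav"},
      "English",
      1.0,
      "v2/en_speaker_6",
      0.7,
      0.7,
      -1
    ]
  }'

# Bark with preset voice and temperature control
curl -X POST "https://luminia-xtts2-bark.hf.space/gradio_api/call/synthesize" \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "Hello [laughter] world",
      "Bark (Preset Voices)",
      null,
      "English",
      1.0,
      "v2/en_speaker_3",
      0.7,
      0.7,
      42
    ]
  }'

# Each POST returns {"event_id": "..."}; set EVENT_ID to that value and
# stream the result with a GET on the same path:
curl -N "https://luminia-xtts2-bark.hf.space/gradio_api/call/synthesize/$EVENT_ID"
```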
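
The POST only queues the job. This standard-library sketch shows the full round trip, assuming Gradio's two-step `/call/` flow:

1. The POST answers with `{"event_id": ...}`.
2. A GET on `.../call/synthesize/<event_id>` streams server-sent events, ending in `event: complete` (or `event: error`).

The helper names are hypothetical. The exact content of the `complete` event is an assumption; it is expected to mirror the client's `(audio_path, status)` pair.

```python
import json
import urllib.request
from typing import Any, Iterable, Optional, Union

ENDPOINT = "https://luminia-xtts2-bark.hf.space/gradio_api/call/synthesize"


def build_payload(text: str, model_choice: str, reference_audio_url: Optional[str] = None,
                  language: str = "English", speed: float = 1.0,
                  voice_preset: str = "v2/en_speaker_6", text_temp: float = 0.7,
                  waveform_temp: float = 0.7, seed: int = -1) -> bytes:
    """JSON body for the POST; "data" is positional, in the order of the curl examples."""
    reference = {"path": reference_audio_url} if reference_audio_url else None
    data = [text, model_choice, reference, language, speed, voice_preset,
            text_temp, waveform_temp, seed]
    return json.dumps({"data": data}).encode("utf-8")


def parse_sse(lines: Iterable[Union[bytes, str]]) -> Any:
    """Return the decoded data of the 'complete' event; raise on an 'error' event."""
    event = None
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event in ("complete", "error"):
            body = line[len("data:"):].strip()
            if event == "error":
                raise RuntimeError(f"Space reported an error: {body}")
            return json.loads(body)
    raise RuntimeError("stream ended without a 'complete' event")


def synthesize(**kwargs: Any) -> Any:
    """POST the job, then stream its result (requires network access)."""
    request = urllib.request.Request(
        ENDPOINT, data=build_payload(**kwargs),
        headers={"Content-Type": "application/json"}, method="POST",
    )
    with urllib.request.urlopen(request) as response:
        event_id = json.load(response)["event_id"]
    with urllib.request.urlopen(f"{ENDPOINT}/{event_id}") as stream:
        return parse_sse(stream)
```

`synthesize(text="Hello [laughter] world", model_choice="Bark (Preset Voices)", voice_preset="v2/en_speaker_3", seed=42)` sends the same request as the second curl call above.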

### MCP (Model Context Protocol)

This Space also runs as an MCP server, so MCP-capable AI assistants can call its `synthesize` tool directly.

**Tool schema:**

```json
{
  "name": "synthesize",
  "parameters": {
    "text": {"type": "string", "description": "Text to synthesize"},
    "model_choice": {"type": "string", "enum": ["XTTS2 (Voice Cloning)", "Bark (Preset Voices)"]},
    "reference_audio": {"type": "file", "description": "Reference audio for XTTS2 (optional for Bark)"},
    "language": {"type": "string", "default": "English"},
    "speed": {"type": "number", "default": 1.0},
    "voice_preset": {"type": "string", "default": "v2/en_speaker_6"},
    "text_temp": {"type": "number", "default": 0.7, "description": "Bark text/semantic temperature (0.1-1.0)"},
    "waveform_temp": {"type": "number", "default": 0.7, "description": "Bark waveform temperature (0.1-1.0)"},
    "seed": {"type": "integer", "default": -1, "description": "Bark seed for reproducibility (-1 for random)"}
  },
  "returns": ["audio", "string"]
}
```
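
MCP clients invoke a tool with a JSON-RPC 2.0 `tools/call` request. This sketch builds that message, which can help when debugging a client. The tool name comes from the schema above, but the Space's MCP server may register it under a different name; its `tools/list` response is authoritative. `tools_call_request` is hypothetical.

```python
import json
from itertools import count
from typing import Any, Dict

# Enum values from the tool schema above.
MODEL_CHOICES = ("XTTS2 (Voice Cloning)", "Bark (Preset Voices)")
_request_ids = count(1)


def tools_call_request(arguments: Dict[str, Any], tool_name: str = "synthesize") -> str:
    """Build an MCP `tools/call` request (a JSON-RPC 2.0 message) for this tool."""
    if not isinstance(arguments.get("text"), str):
        raise ValueError("'text' (a string) is required")
    choice = arguments.get("model_choice")
    if choice is not None and choice not in MODEL_CHOICES:
        raise ValueError(f"model_choice must be one of {MODEL_CHOICES}")
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message, ensure_ascii=False)
```

In normal use, the MCP client builds and sends this message itself. The sketch only shows the message's shape and checks `model_choice` against the schema's enum.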

**MCP Config:**

```json
{
  "mcpServers": {
    "tts-hub": {"url": "https://luminia-xtts2-bark.hf.space/gradio_api/mcp/"}
  }
}
```
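
Where this configuration lives depends on the MCP client, and some clients expect a different entry shape than a bare `url`. This sketch merges the entry above into an existing JSON config file without touching other servers. `add_tts_hub` and its `config_path` argument are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Server entry copied from the MCP Config above.
TTS_HUB = {"url": "https://luminia-xtts2-bark.hf.space/gradio_api/mcp/"}


def add_tts_hub(config_path: str, name: str = "tts-hub") -> Dict[str, Any]:
    """Add or update the server entry, keeping any other configured servers."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("mcpServers", {})[name] = dict(TTS_HUB)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```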

---

## CLI Usage

```bash
# XTTS2 voice cloning
python app.py tts -t "Hello world" -o output.wav -m xtts2 -r voice_sample.wav -l English -s 1.0

# Bark preset voice (basic)
python app.py tts -t "Hello [laughter] world" -o output.wav -m bark -v "v2/en_speaker_6"

# Bark with temperature control and seed
python app.py tts -t "Hello world" -o output.wav -m bark -v "v2/en_speaker_6" \
  --text-temp 0.7 --waveform-temp 0.7 --seed 42
```
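
To render several lines in one go, you can drive the CLI from Python. This sketch uses only the flags shown above. It assumes it runs from the directory containing `app.py` with the Space's dependencies installed. `bark_command` and `render_lines` are hypothetical helpers.

```python
import subprocess
import sys
from typing import List


def bark_command(text: str, output: str, voice: str = "v2/en_speaker_6",
                 text_temp: float = 0.7, waveform_temp: float = 0.7,
                 seed: int = 42) -> List[str]:
    """argv for one Bark render, using only the flags shown in the examples above."""
    return [sys.executable, "app.py", "tts", "-t", text, "-o", output, "-m", "bark",
            "-v", voice, "--text-temp", str(text_temp),
            "--waveform-temp", str(waveform_temp), "--seed", str(seed)]


def render_lines(lines: List[str], prefix: str = "line") -> List[str]:
    """Render each line to its own numbered WAV file; stop on the first failure."""
    outputs = []
    for index, text in enumerate(lines):
        output = f"{prefix}_{index:03d}.wav"
        subprocess.run(bark_command(text, output), check=True)
        outputs.append(output)
    return outputs
```

Passing the arguments as a list (no shell) means prompts containing `[laughter]`, quotes or `♪` need no escaping. The fixed seed makes each file reproducible.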

## Bark Voice Presets

| Preset | Language |
|--------|----------|
| `v2/en_speaker_0` to `v2/en_speaker_9` | English |
| `v2/de_speaker_0` to `v2/de_speaker_2` | German |
| `v2/fr_speaker_0` to `v2/fr_speaker_1` | French |
| `v2/es_speaker_0` to `v2/es_speaker_1` | Spanish |
| `v2/zh_speaker_0` to `v2/zh_speaker_1` | Chinese |
| `v2/ja_speaker_0` | Japanese |
| `v2/ko_speaker_0` | Korean |
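
This sketch expands the ranges above into the full list of 21 presets and checks a preset name before you send a request. The table was transcribed by hand into `PRESET_RANGES`, so it covers only the presets this README lists. The helper names are hypothetical.

```python
import re
from typing import Dict, List

# (language code -> highest speaker index), transcribed from the table above.
PRESET_RANGES: Dict[str, int] = {"en": 9, "de": 2, "fr": 1, "es": 1, "zh": 1, "ja": 0, "ko": 0}
PRESET_PATTERN = re.compile(r"^v2/([a-z]{2})_speaker_(\d+)$")


def all_presets() -> List[str]:
    """Every preset listed in this README, in table order."""
    return [f"v2/{lang}_speaker_{i}"
            for lang, last in PRESET_RANGES.items()
            for i in range(last + 1)]


def preset_language(preset: str) -> str:
    """Language code of a listed preset, e.g. 'de'; raises ValueError otherwise."""
    match = PRESET_PATTERN.match(preset)
    if match is None or int(match.group(2)) > PRESET_RANGES.get(match.group(1), -1):
        raise ValueError(f"{preset!r} is not one of the presets listed in this README")
    return match.group(1)
```

For instance, `preset_language("v2/ja_speaker_0")` returns `"ja"`, while `"v2/de_speaker_3"` raises `ValueError`.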

## Bark Temperature Guide

| Setting | Low (0.1-0.3) | Medium (0.5-0.7) | High (0.8-1.0) |
|---------|---------------|------------------|----------------|
| **Text Temp** | More predictable, robotic | Natural, balanced | Creative, variable |
| **Waveform Temp** | Cleaner audio | Natural variation | More expressive |

**Recommended:** Start with 0.7 for both temperatures for natural-sounding speech.
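
Both sliders are sampling temperatures. The model's next-token scores (logits) are divided by the temperature before the softmax, so low values concentrate probability on the top choice and high values spread it out. This is a generic illustration of that scaling, not Bark's own code. As Bark's public API is structured, `text_temp` applies to the semantic-token stage and `waveform_temp` to the audio-token stage.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Probabilities after dividing the logits by `temperature` (must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(probs: List[float]) -> float:
    """Shannon entropy in bits; higher means the next token is less predictable."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

With illustrative logits `[2.0, 1.0, 0.5]`, the entropy rises from T = 0.2 to 0.7 to 1.0. This matches the table's progression from "predictable" to "variable".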

---

## Credits

- **XTTS2:** [Coqui TTS](https://github.com/idiap/coqui-ai-TTS) (library: MPL 2.0; XTTS-v2 model weights: Coqui Public Model License)
- **Bark:** [Suno AI](https://github.com/suno-ai/bark) (MIT)

This Space is licensed under Apache 2.0.