---
title: xtts2 + Bark TTS
emoji: 🎙️
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - text-to-speech
  - voice-cloning
  - xtts
  - bark
  - mcp-server
short_description: XTTS2 voice cloning + Bark TTS in one space
---

# TTS Hub: XTTS2 + Bark

Two powerful TTS models in one space, optimized for CPU.

## Models

| Model | Voice Source | Languages | Special Features |
|-------|--------------|-----------|------------------|
| **XTTS2** (default) | Your audio sample | 16 languages | Voice cloning |
| **Bark** | Preset voices | EN, DE, FR, ES, ZH, JA, KO | Non-speech sounds, temperature control |

## Usage

### XTTS2 (Voice Cloning)

1. Upload 3-30 seconds of reference voice audio
2. Enter the text to synthesize
3. Select a language and speed
4. Click "Generate Speech"

### Bark (Preset Voices)

1. Select "Bark (Preset Voices)"
2. Choose a voice preset (e.g., `v2/en_speaker_6`)
3. Adjust the temperature controls (optional):
   - **Text Temperature** (0.1-1.0): controls semantic variation
   - **Waveform Temperature** (0.1-1.0): controls audio variation
4. Set a seed for reproducibility (optional, -1 for random)
5. Enter text with optional special tokens
6. Click "Generate Speech"

**Bark special tokens:**

- `[laughter]` `[laughs]` `[sighs]` `[music]` `[gasps]` `[clears throat]`
- `♪ la la la ♪` for singing
- `MAN:` `WOMAN:` for speaker labels

**Long text handling:** Text is automatically split into chunks and processed sequentially, with natural pauses inserted between segments.
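The chunking behavior described above can be sketched as sentence-aligned splitting under a character budget. This is a minimal illustration only: the helper name `chunk_text` and the 200-character default are assumptions, not the Space's actual implementation.

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into sentence-aligned chunks of at most max_chars.

    Illustrative sketch only; the Space's real splitter and limit may differ.
    """
    # Break on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("First sentence. Second sentence.", max_chars=20))
```

Each chunk would then be synthesized in order, with a short pause concatenated between the resulting audio segments.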
---

## API

### Python Client

```python
from gradio_client import Client, handle_file

client = Client("Luminia/xtts2-Bark")

# XTTS2 (voice cloning)
result = client.predict(
    text="Hello, this is a voice cloning test.",
    model_choice="XTTS2 (Voice Cloning)",
    reference_audio=handle_file("voice_sample.wav"),
    language="English",
    speed=1.0,
    voice_preset="v2/en_speaker_6",
    text_temp=0.7,       # Bark only (ignored for XTTS2)
    waveform_temp=0.7,   # Bark only (ignored for XTTS2)
    seed=-1,             # Bark only (ignored for XTTS2)
    api_name="/synthesize"
)
print(result)  # (audio_path, status)

# Bark (preset voice) with temperature control
result = client.predict(
    text="Hello! [laughter] This is Bark speaking.",
    model_choice="Bark (Preset Voices)",
    reference_audio=None,
    language="English",
    speed=1.0,
    voice_preset="v2/en_speaker_6",
    text_temp=0.7,       # Semantic temperature (0.1-1.0)
    waveform_temp=0.7,   # Audio waveform temperature (0.1-1.0)
    seed=42,             # Set seed for reproducibility (-1 for random)
    api_name="/synthesize"
)
print(result)
```

### REST API (curl)

```bash
# XTTS2 with voice cloning
curl -X POST "https://luminia-xtts2-bark.hf.space/gradio_api/call/synthesize" \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "Hello world",
      "XTTS2 (Voice Cloning)",
      {"path": "https://example.com/voice.wav"},
      "English",
      1.0,
      "v2/en_speaker_6",
      0.7,
      0.7,
      -1
    ]
  }'

# Bark with preset voice and temperature control
curl -X POST "https://luminia-xtts2-bark.hf.space/gradio_api/call/synthesize" \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "Hello [laughter] world",
      "Bark (Preset Voices)",
      null,
      "English",
      1.0,
      "v2/en_speaker_3",
      0.7,
      0.7,
      42
    ]
  }'
```

### MCP (Model Context Protocol)

This Space supports MCP for AI assistants.
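When calling the REST endpoint directly, the nine-element `data` array must be in the exact positional order shown in the curl examples. A small helper can make that order explicit; `build_synthesize_payload` is a hypothetical name introduced here for illustration, with the slot order taken from the examples above.

```python
import json

def build_synthesize_payload(
    text,
    model_choice="XTTS2 (Voice Cloning)",
    reference_audio=None,   # {"path": "..."} for XTTS2 cloning, None for Bark
    language="English",
    speed=1.0,
    voice_preset="v2/en_speaker_6",
    text_temp=0.7,
    waveform_temp=0.7,
    seed=-1,
):
    # Hypothetical helper: mirrors the positional "data" array expected by
    # the /gradio_api/call/synthesize endpoint in the curl examples.
    return {"data": [text, model_choice, reference_audio, language,
                     speed, voice_preset, text_temp, waveform_temp, seed]}

body = json.dumps(build_synthesize_payload(
    "Hello [laughter] world",
    model_choice="Bark (Preset Voices)",
    voice_preset="v2/en_speaker_3",
    seed=42,
))
print(body)
```

The resulting JSON string can be sent as the POST body in place of the hand-written `-d` payloads above.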
**Tool schema:**

```json
{
  "name": "synthesize",
  "parameters": {
    "text": {"type": "string", "description": "Text to synthesize"},
    "model_choice": {"type": "string", "enum": ["XTTS2 (Voice Cloning)", "Bark (Preset Voices)"]},
    "reference_audio": {"type": "file", "description": "Reference audio for XTTS2 (optional for Bark)"},
    "language": {"type": "string", "default": "English"},
    "speed": {"type": "number", "default": 1.0},
    "voice_preset": {"type": "string", "default": "v2/en_speaker_6"},
    "text_temp": {"type": "number", "default": 0.7, "description": "Bark text/semantic temperature (0.1-1.0)"},
    "waveform_temp": {"type": "number", "default": 0.7, "description": "Bark waveform temperature (0.1-1.0)"},
    "seed": {"type": "integer", "default": -1, "description": "Bark seed for reproducibility (-1 for random)"}
  },
  "returns": ["audio", "string"]
}
```

**MCP Config:**

```json
{
  "mcpServers": {
    "tts-hub": {"url": "https://luminia-xtts2-bark.hf.space/gradio_api/mcp/"}
  }
}
```

---

## CLI Usage

```bash
# XTTS2 voice cloning
python app.py tts -t "Hello world" -o output.wav -m xtts2 -r voice_sample.wav -l English -s 1.0

# Bark preset voice (basic)
python app.py tts -t "Hello [laughter] world" -o output.wav -m bark -v "v2/en_speaker_6"

# Bark with temperature control and seed
python app.py tts -t "Hello world" -o output.wav -m bark -v "v2/en_speaker_6" \
  --text-temp 0.7 --waveform-temp 0.7 --seed 42
```

## Bark Voice Presets

| Preset | Language |
|--------|----------|
| `v2/en_speaker_0` - `v2/en_speaker_9` | English |
| `v2/de_speaker_0` - `v2/de_speaker_2` | German |
| `v2/fr_speaker_0` - `v2/fr_speaker_1` | French |
| `v2/es_speaker_0` - `v2/es_speaker_1` | Spanish |
| `v2/zh_speaker_0` - `v2/zh_speaker_1` | Chinese |
| `v2/ja_speaker_0` | Japanese |
| `v2/ko_speaker_0` | Korean |

## Bark Temperature Guide

| Setting | Low (0.1-0.3) | Medium (0.5-0.7) | High (0.8-1.0) |
|---------|---------------|------------------|----------------|
| **Text Temp** | More predictable, robotic | Natural, balanced | Creative, variable |
| **Waveform Temp** | Cleaner audio | Natural variation | More expressive |

**Recommended:** Start with 0.7 for both temperatures for natural-sounding speech.

---

## Credits

- **XTTS2:** [Coqui TTS](https://github.com/idiap/coqui-ai-TTS) (Apache 2.0)
- **Bark:** [Suno AI](https://github.com/suno-ai/bark) (MIT)

Licensed under Apache 2.0.