---
title: xtts2 + Bark TTS
emoji: 🎙️
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - text-to-speech
  - voice-cloning
  - xtts
  - bark
  - mcp-server
short_description: XTTS2 voice cloning + Bark TTS in one space
---

# TTS Hub: XTTS2 + Bark

Two powerful TTS models in one space, optimized for CPU.

## Models

| Model | Voice Source | Languages | Special Features |
|-------|--------------|-----------|------------------|
| **XTTS2** (default) | Your audio sample | 16 languages | Voice cloning |
| **Bark** | Preset voices | EN, DE, FR, ES, ZH, JA, KO | Non-speech sounds, temperature control |

## Usage

### XTTS2 (Voice Cloning)

1. Upload 3-30 seconds of reference voice audio
2. Enter the text to synthesize
3. Select a language and speed
4. Click "Generate Speech"

### Bark (Preset Voices)

1. Select "Bark (Preset Voices)"
2. Choose a voice preset (e.g., `v2/en_speaker_6`)
3. Adjust the temperature controls (optional):
   - **Text Temperature** (0.1-1.0): controls semantic variation
   - **Waveform Temperature** (0.1-1.0): controls audio variation
4. Set a seed for reproducibility (optional, -1 for random)
5. Enter text with optional special tokens
6. Click "Generate Speech"

**Bark special tokens:**

- `[laughter]` `[laughs]` `[sighs]` `[music]` `[gasps]` `[clears throat]`
- `♪ la la la ♪` for singing
- `MAN:` `WOMAN:` for speaker labels

**Long text handling:** Text is automatically split into chunks and processed sequentially, with natural pauses inserted between segments.
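The chunking behavior described above can be sketched as sentence-aligned splitting under a character budget. This is a minimal illustration only: the helper name `chunk_text` and the 200-character default are assumptions, not the Space's actual implementation.

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into sentence-aligned chunks of at most max_chars.

    Illustrative sketch only; the Space's real splitter and limit may differ.
    """
    # Break on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("First sentence. Second sentence.", max_chars=20))
```

Each chunk would then be synthesized in order, with a short pause concatenated between the resulting audio segments.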
---

## API

### Python Client

```python
from gradio_client import Client, handle_file

client = Client("Luminia/xtts2-Bark")

# XTTS2 (voice cloning)
result = client.predict(
    text="Hello, this is a voice cloning test.",
    model_choice="XTTS2 (Voice Cloning)",
    reference_audio=handle_file("voice_sample.wav"),
    language="English",
    speed=1.0,
    voice_preset="v2/en_speaker_6",
    text_temp=0.7,       # Bark only (ignored for XTTS2)
    waveform_temp=0.7,   # Bark only (ignored for XTTS2)
    seed=-1,             # Bark only (ignored for XTTS2)
    api_name="/synthesize"
)
print(result)  # (audio_path, status)

# Bark (preset voice) with temperature control
result = client.predict(
    text="Hello! [laughter] This is Bark speaking.",
    model_choice="Bark (Preset Voices)",
    reference_audio=None,
    language="English",
    speed=1.0,
    voice_preset="v2/en_speaker_6",
    text_temp=0.7,       # Semantic temperature (0.1-1.0)
    waveform_temp=0.7,   # Audio waveform temperature (0.1-1.0)
    seed=42,             # Set seed for reproducibility (-1 for random)
    api_name="/synthesize"
)
print(result)
```

### REST API (curl)

```bash
# XTTS2 with voice cloning
curl -X POST "https://luminia-xtts2-bark.hf.space/gradio_api/call/synthesize" \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "Hello world",
      "XTTS2 (Voice Cloning)",
      {"path": "https://example.com/voice.wav"},
      "English",
      1.0,
      "v2/en_speaker_6",
      0.7,
      0.7,
      -1
    ]
  }'

# Bark with preset voice and temperature control
curl -X POST "https://luminia-xtts2-bark.hf.space/gradio_api/call/synthesize" \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "Hello [laughter] world",
      "Bark (Preset Voices)",
      null,
      "English",
      1.0,
      "v2/en_speaker_3",
      0.7,
      0.7,
      42
    ]
  }'
```

### MCP (Model Context Protocol)

This Space supports MCP for AI assistants.
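When calling the REST endpoint directly, the nine-element `data` array must be in the exact positional order shown in the curl examples. A small helper can make that order explicit; `build_synthesize_payload` is a hypothetical name introduced here for illustration, with the slot order taken from the examples above.

```python
import json

def build_synthesize_payload(
    text,
    model_choice="XTTS2 (Voice Cloning)",
    reference_audio=None,   # {"path": "..."} for XTTS2 cloning, None for Bark
    language="English",
    speed=1.0,
    voice_preset="v2/en_speaker_6",
    text_temp=0.7,
    waveform_temp=0.7,
    seed=-1,
):
    # Hypothetical helper: mirrors the positional "data" array expected by
    # the /gradio_api/call/synthesize endpoint in the curl examples.
    return {"data": [text, model_choice, reference_audio, language,
                     speed, voice_preset, text_temp, waveform_temp, seed]}

body = json.dumps(build_synthesize_payload(
    "Hello [laughter] world",
    model_choice="Bark (Preset Voices)",
    voice_preset="v2/en_speaker_3",
    seed=42,
))
print(body)
```

The resulting JSON string can be sent as the POST body in place of the hand-written `-d` payloads above.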
**Tool schema:**

```json
{
  "name": "synthesize",
  "parameters": {
    "text": {"type": "string", "description": "Text to synthesize"},
    "model_choice": {"type": "string", "enum": ["XTTS2 (Voice Cloning)", "Bark (Preset Voices)"]},
    "reference_audio": {"type": "file", "description": "Reference audio for XTTS2 (optional for Bark)"},
    "language": {"type": "string", "default": "English"},
    "speed": {"type": "number", "default": 1.0},
    "voice_preset": {"type": "string", "default": "v2/en_speaker_6"},
    "text_temp": {"type": "number", "default": 0.7, "description": "Bark text/semantic temperature (0.1-1.0)"},
    "waveform_temp": {"type": "number", "default": 0.7, "description": "Bark waveform temperature (0.1-1.0)"},
    "seed": {"type": "integer", "default": -1, "description": "Bark seed for reproducibility (-1 for random)"}
  },
  "returns": ["audio", "string"]
}
```

**MCP Config:**

```json
{
  "mcpServers": {
    "tts-hub": {"url": "https://luminia-xtts2-bark.hf.space/gradio_api/mcp/"}
  }
}
```

---

## CLI Usage

```bash
# XTTS2 voice cloning
python app.py tts -t "Hello world" -o output.wav -m xtts2 -r voice_sample.wav -l English -s 1.0

# Bark preset voice (basic)
python app.py tts -t "Hello [laughter] world" -o output.wav -m bark -v "v2/en_speaker_6"

# Bark with temperature control and seed
python app.py tts -t "Hello world" -o output.wav -m bark -v "v2/en_speaker_6" \
  --text-temp 0.7 --waveform-temp 0.7 --seed 42
```

## Bark Voice Presets

| Preset | Language |
|--------|----------|
| `v2/en_speaker_0` - `v2/en_speaker_9` | English |
| `v2/de_speaker_0` - `v2/de_speaker_2` | German |
| `v2/fr_speaker_0` - `v2/fr_speaker_1` | French |
| `v2/es_speaker_0` - `v2/es_speaker_1` | Spanish |
| `v2/zh_speaker_0` - `v2/zh_speaker_1` | Chinese |
| `v2/ja_speaker_0` | Japanese |
| `v2/ko_speaker_0` | Korean |

## Bark Temperature Guide

| Setting | Low (0.1-0.3) | Medium (0.5-0.7) | High (0.8-1.0) |
|---------|---------------|------------------|----------------|
| **Text Temp** | More predictable, robotic | Natural, balanced | Creative, variable |
| **Waveform Temp** | Cleaner audio | Natural variation | More expressive |

**Recommended:** Start with 0.7 for both temperatures for natural-sounding speech.

---

## Credits

- **XTTS2:** [Coqui TTS](https://github.com/idiap/coqui-ai-TTS) (Apache 2.0)
- **Bark:** [Suno AI](https://github.com/suno-ai/bark) (MIT)

Licensed under Apache 2.0.