POST /tts

Synthesize text to audio. Returns raw audio bytes.

// Request body
{
  "text": "Hello world!",
  "voice": "af_heart",
  "speed": 1.0,
  "output_format": "wav"
}

// Response: audio/wav or audio/mpeg stream
// Headers:
// X-Duration-Seconds: 3.45
// Content-Disposition: attachment; filename="kokoro_af_heart.wav"
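
The two response headers listed above can be read on the client to recover the clip length and a suggested file name. The following is a minimal sketch, not part of the server: `parse_tts_headers` and its `default_name` parameter are hypothetical names, and it assumes the headers arrive as a dict-like mapping (as `requests` exposes them).

```python
import os
from email.message import Message


def parse_tts_headers(headers, default_name="output.wav"):
    """Return (duration_seconds, filename) from a /tts response's headers.

    Hypothetical helper: `headers` is any mapping with a .get() method.
    """
    # X-Duration-Seconds is sent as text, e.g. "3.45"; tolerate absence or junk.
    raw = headers.get("X-Duration-Seconds")
    try:
        duration = float(raw) if raw is not None else None
    except ValueError:
        duration = None

    # Let the stdlib parse the Content-Disposition parameters.
    filename = None
    disposition = headers.get("Content-Disposition")
    if disposition:
        msg = Message()
        msg["Content-Disposition"] = disposition
        filename = msg.get_filename()

    # Keep only the last path component so a server-supplied name
    # cannot point outside the current directory.
    filename = os.path.basename(filename) if filename else default_name
    return duration, filename or default_name
```

With a `requests` response this could be called as `duration, name = parse_tts_headers(resp.headers)` before writing `resp.content` to `name`.
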
GET /voices

List all available voices.

// Response
{
  "voices": {
    "af_heart": {
      "label": "Heart",
      "lang": "en-US",
      "gender": "female",
      "flag": "🇺🇸"
    },
    ...
  },
  "total": 42
}
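
Because the response maps each voice ID to its metadata, picking a voice by language or gender is a small filter over that mapping. The sketch below assumes the response shape shown above; `filter_voices` is a hypothetical helper, and the second entry in `sample` is made-up data for illustration only, not a real voice.

```python
def filter_voices(payload, lang=None, gender=None):
    """Return sorted voice IDs from a /voices payload matching the filters.

    Hypothetical helper; a filter set to None matches every voice.
    """
    voices = payload.get("voices", {})
    return sorted(
        voice_id
        for voice_id, info in voices.items()
        if (lang is None or info.get("lang") == lang)
        and (gender is None or info.get("gender") == gender)
    )


# Sample payload: "af_heart" is copied from the response above;
# "example_voice" is invented purely to exercise the filter.
sample = {
    "voices": {
        "af_heart": {"label": "Heart", "lang": "en-US", "gender": "female"},
        "example_voice": {"label": "Example", "lang": "en-GB", "gender": "male"},
    },
    "total": 2,
}

print(filter_voices(sample, lang="en-US"))
```

In practice `payload` would be the parsed JSON from `GET /voices`, for example `requests.get("http://localhost:7860/voices").json()`.
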
GET /health

Model and device status.

// Response
{
  "status": "ok",
  "model_loaded": true,
  "device": "cuda",
  "cuda": true,
  "pipelines": ["a","b","e","f","h","i","j","p","z"]
}
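
A client can poll this endpoint before sending its first `/tts` request, since the model may still be loading when the server starts. The sketch below is one way to do that using only the standard library; `is_ready`, `wait_until_ready`, and `fetch_health_http` are hypothetical names, and treating `status == "ok"` together with `model_loaded == true` as "ready" is an assumption based on the fields shown above.

```python
import json
import time
import urllib.request


def is_ready(health):
    """True when a /health payload reports an OK status and a loaded model."""
    return health.get("status") == "ok" and bool(health.get("model_loaded"))


def fetch_health_http(base_url="http://localhost:7860"):
    """Fetch and parse GET /health (base URL taken from the Quick Start)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def wait_until_ready(fetch_health, attempts=10, delay=1.0):
    """Call fetch_health up to `attempts` times, sleeping `delay` s between tries.

    Returns True as soon as a payload passes is_ready, else False.
    """
    for attempt in range(attempts):
        try:
            if is_ready(fetch_health()):
                return True
        except OSError:
            # Connection refused / timeout: the server may not be up yet.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Usage would look like `wait_until_ready(fetch_health_http)`; passing the fetch function as an argument keeps the polling loop testable without a running server.
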
Quick Start (Python)
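# pip install requests
import requests

# Request synthesis from the local server; the body matches POST /tts.
resp = requests.post("http://localhost:7860/tts", json={
    "text": "Hello from Kokoro TTS!",
    "voice": "af_heart",
    "speed": 1.0,
    "output_format": "wav"
}, timeout=60)  # 60 s is an arbitrary client-side limit; adjust as needed
resp.raise_for_status()  # stop here rather than saving an error body as audio

# The response body is the raw audio bytes.
with open("output.wav", "wb") as f:
    f.write(resp.content)

# Clip length as reported by the server (None if the header is missing).
duration = resp.headers.get("X-Duration-Seconds")
print(f"Duration: {duration}s")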