Loomis Green

XTTS-v2 API Guide (Hugging Face Spaces)

This API provides Text-to-Speech (TTS) capabilities using Coqui XTTS-v2, deployed on Hugging Face Spaces. It supports both streaming (low latency) and full audio generation.

⚠️ Critical Setup Note (PyTorch 2.6+)

If you are deploying this locally or in a new environment, either use a compatible PyTorch version or apply the monkeypatch for torch.load.

  • PyTorch 2.6+ changes the default of torch.load's weights_only argument from False to True, which breaks loading of older Coqui TTS checkpoints.
  • Fix: Pin torch==2.4.0 OR use the monkeypatch included in app.py.
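
The monkeypatch in app.py is not reproduced in this guide. The following is only a minimal sketch of what such a patch could look like, assuming the goal is to restore the pre-2.6 default of weights_only=False. The helper name with_weights_only_false is hypothetical and not taken from app.py.

```python
import functools


def with_weights_only_false(load_fn):
    """Wrap a torch.load-like callable so weights_only defaults to False.

    An explicit weights_only=... passed by the caller is left untouched.
    """
    @functools.wraps(load_fn)
    def patched(*args, **kwargs):
        kwargs.setdefault("weights_only", False)
        return load_fn(*args, **kwargs)

    return patched


try:
    import torch

    # Apply the patch before any TTS model is loaded.
    torch.load = with_weights_only_false(torch.load)
except ImportError:
    # torch is not installed in this environment; nothing to patch.
    pass
```

Loading with weights_only=False unpickles arbitrary Python objects, so this approach is only appropriate for checkpoints from a source you trust.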

Base URL

https://loomisgitarrist-xtts-multilingual.hf.space


1. Streaming Endpoint (/stream)

Best for: Real-time applications, chatbots, assistants.
Method: POST
URL: /stream

Request Body (JSON)

| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text to convert to speech. |
| language | string | Yes | Language code (e.g., en, es, de, fr, it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja, hu, ko). |
| speaker_id | string | Yes | Filename of the speaker WAV in speakers/ (e.g., dave.wav, robert.wav). |
| stream_chunk_size | int | No | Chunk size for processing (default: 20). Lower = faster start, higher = better context. |
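
Validating the request fields on the client can catch mistakes before a round trip to the Space. The sketch below builds the request body from the fields in the table above. The helper build_payload is hypothetical, the language set simply mirrors the example codes listed in the table (the server may accept others), and the positive-integer check on stream_chunk_size is an assumption rather than a documented constraint.

```python
# Language codes copied from the examples in the table above (may not be exhaustive).
SUPPORTED_LANGUAGES = {
    "en", "es", "de", "fr", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko",
}


def build_payload(text, language, speaker_id, stream_chunk_size=None):
    """Hypothetical helper: validate the fields and return the request dict."""
    if not text or not text.strip():
        raise ValueError("text must be a non-empty string")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if not speaker_id:
        raise ValueError("speaker_id is required")

    payload = {"text": text, "language": language, "speaker_id": speaker_id}

    # Optional field: omit it so the server applies its own default (20).
    if stream_chunk_size is not None:
        # Assumption: only positive integers make sense here.
        if not isinstance(stream_chunk_size, int) or stream_chunk_size <= 0:
            raise ValueError("stream_chunk_size must be a positive integer")
        payload["stream_chunk_size"] = stream_chunk_size
    return payload
```

For example, build_payload("Hello", "en", "dave.wav", stream_chunk_size=10) returns a dict that can be sent as the request body.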

Python Example (Streaming)

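import requests

API_URL = "https://loomisgitarrist-xtts-multilingual.hf.space/stream"
payload = {
    "text": "Hello, I am streaming this audio directly from the API!",
    "language": "en",
    "speaker_id": "dave.wav",  # must match a file in speakers/
}

print("Streaming audio...")
# Send the body as JSON, matching the request body described above.
# stream=True defers the download so chunks can be written as they arrive.
with requests.post(API_URL, json=payload, stream=True) as response:
    response.raise_for_status()  # fail early, e.g. on 503 during a cold start
    with open("stream_output.wav", "wb") as f:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
print("Stream saved to stream_output.wav")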

2. Generate Endpoint (/generate)

Best for: High-quality, non-real-time generation.
Method: POST
URL: /generate

Request Body (JSON)

Same as /stream.

Python Example (Generate)

import requests

API_URL = "https://loomisgitarrist-xtts-multilingual.hf.space/generate"
payload = {
    "text": "This is a full quality generation test.",
    "language": "en",
    "speaker_id": "dave.wav",  # must match a file in speakers/
}

# Send the body as JSON, matching the request body described above.
response = requests.post(API_URL, json=payload)

if response.status_code == 200:
    # The whole audio file is returned in a single response body.
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Audio saved to output.wav")
else:
    print("Error:", response.status_code, response.text)

Troubleshooting

  • 503 Model not loaded: The Space is still starting up (cold start). Wait 1-2 minutes, then retry.
  • Empty Audio (44 bytes): The response contains only a WAV header and no audio data, which usually indicates a streaming error on the server. Check the Space logs.
  • Connection Error: Check your internet connection and whether the Space is paused.
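
The first two cases above can be handled in client code. The sketch below is one possible approach, not part of the API: post_with_retry and is_header_only are hypothetical helper names, the retry count and delay are illustrative defaults, and send is assumed to be any zero-argument callable returning an object with a status_code attribute (such as a requests response).

```python
import time

WAV_HEADER_BYTES = 44  # size of a canonical PCM WAV header


def is_header_only(audio_bytes):
    """Return True if the body holds at most a WAV header and no samples."""
    return len(audio_bytes) <= WAV_HEADER_BYTES


def post_with_retry(send, retries=5, delay=30.0, sleep=time.sleep):
    """Call send() until it stops returning HTTP 503 or retries run out.

    send:  hypothetical zero-argument callable that performs the request.
    sleep: injectable for testing; defaults to time.sleep.
    Returns the last response received.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        response = send()
        if response.status_code != 503:
            return response
        # Model still loading: wait before the next attempt, except after the last.
        if attempt < retries - 1:
            sleep(delay)
    return response
```

A call might look like post_with_retry(lambda: requests.post(API_URL, json=payload)), followed by an is_header_only(response.content) check before the audio is written to disk.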