Loomis Green
# XTTS-v2 API Guide (Hugging Face Spaces)
This API provides Text-to-Speech (TTS) capabilities using Coqui XTTS-v2, deployed on Hugging Face Spaces. It supports both streaming (low latency) and full audio generation.
## ⚠️ Critical Setup Note (PyTorch 2.6+)
If you deploy this locally or in a new environment, either use a compatible PyTorch version or apply the monkeypatch for `torch.load`.
- **PyTorch 2.6+** changes the default of `torch.load` to `weights_only=True`, which breaks loading of older Coqui TTS checkpoints.
- **Fix:** Pin `torch==2.4.0` OR use the monkeypatch included in `app.py`.
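The patch in `app.py` is not reproduced in this guide. As a rough illustration of the idea, the sketch below wraps `torch.load` so that `weights_only` falls back to `False` when the caller does not set it. The helper name `allow_full_unpickle` is hypothetical and the real patch may differ. It would need to run before the XTTS model is loaded.

```python
import functools


def allow_full_unpickle(load_fn):
    """Wrap a torch.load-like callable so weights_only defaults to False.

    Hypothetical helper for illustration; not taken from app.py.
    """
    @functools.wraps(load_fn)
    def wrapper(*args, **kwargs):
        # Only fill in the default; an explicit weights_only from the caller wins.
        kwargs.setdefault("weights_only", False)
        return load_fn(*args, **kwargs)
    return wrapper


try:
    import torch
    # Apply the patch before any Coqui TTS checkpoint is loaded.
    torch.load = allow_full_unpickle(torch.load)
except ImportError:
    # torch is not installed in this environment; nothing to patch.
    pass
```

Loading with `weights_only=False` unpickles arbitrary Python objects, so this approach is only appropriate for checkpoints from a source you trust.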
## Base URL
`https://loomisgitarrist-xtts-multilingual.hf.space`
---
## 1. Streaming Endpoint (`/stream`)
**Best for:** Real-time applications, chatbots, assistants.
**Method:** `POST`
**URL:** `/stream`
### Request Body (JSON)
| Field | Type | Required | Description |
| :--- | :--- | :--- | :--- |
| `text` | string | Yes | The text to convert to speech. |
| `language` | string | Yes | Language code (e.g., `en`, `es`, `de`, `fr`, `it`, `pt`, `pl`, `tr`, `ru`, `nl`, `cs`, `ar`, `zh-cn`, `ja`, `hu`, `ko`). |
| `speaker_id` | string | Yes | Filename of the speaker WAV in `speakers/` (e.g., `dave.wav`, `robert.wav`). |
| `stream_chunk_size` | int | No | Chunk size for processing (default: `20`). A lower value starts audio sooner; a higher value gives the model more context per chunk. |
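To catch malformed requests before they reach the Space, the fields above can be checked client-side. The following is a minimal sketch: `build_payload` and `LISTED_LANGUAGES` are hypothetical names, and the language set contains only the codes listed in the table, which the API itself may or may not treat as exhaustive.

```python
# Language codes copied from the request table; assumed, not confirmed, to be complete.
LISTED_LANGUAGES = {
    "en", "es", "de", "fr", "it", "pt", "pl", "tr",
    "ru", "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko",
}


def build_payload(text, language, speaker_id, stream_chunk_size=None):
    """Build a request body for /stream or /generate (hypothetical helper)."""
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    if language not in LISTED_LANGUAGES:
        raise ValueError(f"language {language!r} is not in the documented list")
    if not speaker_id:
        raise ValueError("speaker_id must not be empty")

    payload = {"text": text, "language": language, "speaker_id": speaker_id}
    # Optional field: leave it out so the server applies its own default.
    if stream_chunk_size is not None:
        payload["stream_chunk_size"] = int(stream_chunk_size)
    return payload
```

For example, `build_payload("Hello", "en", "dave.wav")` returns a dict with the three required keys, while an unlisted language code raises `ValueError` before any network call is made.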
### Python Example (Streaming)
```python
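import requests

API_URL = "https://loomisgitarrist-xtts-multilingual.hf.space/stream"

payload = {
    "text": "Hello, I am streaming this audio directly from the API!",
    "language": "en",
    "speaker_id": "dave.wav",
}

print("Streaming audio...")
# The request body is documented as JSON, so pass it with json= rather than data=.
response = requests.post(API_URL, json=payload, stream=True)
response.raise_for_status()  # fail early on 4xx/5xx (e.g. 503 during a cold start)

# Write each chunk to disk as it arrives instead of buffering the whole response.
with open("stream_output.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)

print("Stream saved to stream_output.wav")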
```
---
## 2. Generate Endpoint (`/generate`)
**Best for:** High-quality, non-real-time generation.
**Method:** `POST`
**URL:** `/generate`
### Request Body (JSON)
Same as `/stream`.
### Python Example (Generate)
```python
import requests

API_URL = "https://loomisgitarrist-xtts-multilingual.hf.space/generate"

payload = {
    "text": "This is a full quality generation test.",
    "language": "en",
    "speaker_id": "dave.wav",
}

# The request body is documented as JSON, so pass it with json= rather than data=.
response = requests.post(API_URL, json=payload)

if response.status_code == 200:
    # The full audio file is returned in a single response body.
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Audio saved to output.wav")
else:
    print("Error:", response.text)
```
---
## Troubleshooting
- **503 Model not loaded:** The Space is starting up (cold start). Wait 1-2 minutes, then retry.
- **Empty Audio (44 bytes):** A 44-byte file is a WAV header with no audio data. This usually indicates a streaming error on the server. Check the Space logs.
- **Connection Error:** Check your internet connection and whether the Space is paused.
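The first two cases can also be handled in client code. The sketch below is one possible approach, not part of the API: `post_with_cold_start_retry` and `is_empty_wav` are hypothetical helpers, and the retry count and delay are arbitrary choices. The `send` argument is any zero-argument callable that performs the request and returns `(status_code, body_bytes)`.

```python
import time

WAV_HEADER_BYTES = 44  # size of a canonical PCM WAV header with no samples


def is_empty_wav(data: bytes) -> bool:
    """Return True if the body holds at most a bare WAV header."""
    return len(data) <= WAV_HEADER_BYTES


def post_with_cold_start_retry(send, retries=5, delay_s=30.0, sleep=time.sleep):
    """Call send() until it stops returning HTTP 503 or retries run out.

    send: zero-argument callable returning (status_code, body_bytes).
    """
    status, body = 503, b""
    for attempt in range(retries):
        status, body = send()
        if status != 503:
            break
        # Space is still loading the model; wait before the next attempt.
        if attempt < retries - 1:
            sleep(delay_s)
    return status, body
```

With `requests`, `send` could be a small function that posts the payload to `/generate` and returns `(response.status_code, response.content)`. After a `200` response, `is_empty_wav(body)` flags the 44-byte case before the file is written to disk.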