xtts-multilingual / XTTS_API_GUIDE.md

Loomis Green

Docs: Update API guide with PyTorch 2.6+ compatibility notes and streaming examples

f34249a about 1 month ago

	# XTTS-v2 API Guide (Hugging Face Spaces)

	This API provides Text-to-Speech (TTS) capabilities using Coqui XTTS-v2, deployed on Hugging Face Spaces. It supports both streaming (low latency) and full audio generation.

	## ⚠️ Critical Setup Note (PyTorch 2.6+)

	If you are deploying this locally or in a new environment, either use a compatible PyTorch version or apply the monkeypatch for `torch.load`.
	- Starting with PyTorch 2.6, `torch.load` defaults to `weights_only=True`, which breaks loading of older Coqui TTS checkpoints.
	- Fix: Pin `torch==2.4.0` OR use the monkeypatch included in `app.py`.
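Where pinning is not an option, the monkeypatch idea can be sketched roughly as follows. This is a minimal illustration, not the code shipped in `app.py` (which is not reproduced in this guide); the helper name `allow_full_checkpoint_loading` is hypothetical. It wraps a module's `load` function so that `weights_only` defaults to `False` unless the caller sets it explicitly.

```python
import functools


def allow_full_checkpoint_loading(torch_module):
    """Wrap torch_module.load so weights_only defaults to False.

    Hypothetical helper for illustration; pass the imported `torch` module.
    """
    original_load = torch_module.load
    if getattr(original_load, "_xtts_patched", False):
        return  # already patched, do not wrap twice

    @functools.wraps(original_load)
    def patched_load(*args, **kwargs):
        # Only fill in the default; an explicit weights_only=... still wins.
        kwargs.setdefault("weights_only", False)
        return original_load(*args, **kwargs)

    patched_load._xtts_patched = True
    torch_module.load = patched_load


# Intended usage, before the XTTS model is loaded:
#   import torch
#   allow_full_checkpoint_loading(torch)
```

Loading with `weights_only=False` unpickles arbitrary Python objects, so apply a patch like this only to checkpoints you trust.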

	## Base URL
	`https://loomisgitarrist-xtts-multilingual.hf.space`

	---

	## 1. Streaming Endpoint (`/stream`)
	- Best for: Real-time applications, chatbots, assistants.
	- Method: `POST`
	- URL: `/stream`

	### Request Body (JSON)
	| Field | Type | Required | Description |
	| :--- | :--- | :--- | :--- |
	| `text` | string | Yes | The text to convert to speech. |
	| `language` | string | Yes | Language code (e.g., `en`, `es`, `de`, `fr`, `it`, `pt`, `pl`, `tr`, `ru`, `nl`, `cs`, `ar`, `zh-cn`, `ja`, `hu`, `ko`). |
	| `speaker_id` | string | Yes | Filename of the speaker WAV in `speakers/` (e.g., `dave.wav`, `robert.wav`). |
	| `stream_chunk_size` | int | No | Chunk size for processing (default: `20`). Lower = faster start, higher = better context. |
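A small client-side check can catch malformed requests before they reach the Space. The sketch below is an assumption about how a caller might do that, not part of the API: `build_payload` and `LISTED_LANGUAGES` are hypothetical names, and the language set only mirrors the examples in the table, so the server may accept codes that this helper rejects.

```python
# Language codes copied from the table above; the server may accept others.
LISTED_LANGUAGES = {
    "en", "es", "de", "fr", "it", "pt", "pl", "tr",
    "ru", "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko",
}


def build_payload(text, language, speaker_id, stream_chunk_size=None):
    """Return a request-body dict, raising ValueError on obviously bad input."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    if language not in LISTED_LANGUAGES:
        raise ValueError(f"language {language!r} is not in the documented list")
    if not isinstance(speaker_id, str) or not speaker_id:
        raise ValueError("speaker_id must be a filename such as 'dave.wav'")

    payload = {"text": text, "language": language, "speaker_id": speaker_id}
    if stream_chunk_size is not None:
        # Optional field: omit it to let the server use its default of 20.
        valid = isinstance(stream_chunk_size, int) and not isinstance(stream_chunk_size, bool)
        if not valid or stream_chunk_size <= 0:
            raise ValueError("stream_chunk_size must be a positive integer")
        payload["stream_chunk_size"] = stream_chunk_size
    return payload
```

For example, `build_payload("Hallo", "de", "robert.wav", stream_chunk_size=10)` returns a dict that can be passed to `requests.post(..., json=payload)`.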

	### Python Example (Streaming)
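	```python
	import requests

	API_URL = "https://loomisgitarrist-xtts-multilingual.hf.space/stream"
	payload = {
	    "text": "Hello, I am streaming this audio directly from the API!",
	    "language": "en",
	    "speaker_id": "dave.wav",
	}

	print("Streaming audio...")
	# Send the payload as a JSON body; stream=True lets us read audio
	# chunks as they arrive instead of waiting for the full response.
	response = requests.post(API_URL, json=payload, stream=True)
	response.raise_for_status()  # fail early on 4xx/5xx (e.g. 503 during cold start)

	with open("stream_output.wav", "wb") as f:
	    for chunk in response.iter_content(chunk_size=1024):
	        if chunk:  # skip empty keep-alive chunks
	            f.write(chunk)
	print("Stream saved to stream_output.wav")
	```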
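Since the streaming endpoint is aimed at low-latency use, it can help to measure how long the first audio chunk takes to arrive. The helper below is a rough sketch under that assumption; `save_stream` is a hypothetical name, and the timer starts when the function is called rather than when the request was sent.

```python
import time


def save_stream(chunks, path):
    """Write an iterable of byte chunks to path.

    Returns (seconds_until_first_chunk, total_bytes); the first value is
    None if no non-empty chunk ever arrived.
    """
    start = time.monotonic()
    first_chunk_delay = None
    total_bytes = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if not chunk:
                continue  # skip empty keep-alive chunks
            if first_chunk_delay is None:
                first_chunk_delay = time.monotonic() - start
            f.write(chunk)
            total_bytes += len(chunk)
    return first_chunk_delay, total_bytes
```

With a streaming response from `requests`, this could be called as `save_stream(response.iter_content(chunk_size=1024), "stream_output.wav")`.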

	---

	## 2. Generate Endpoint (`/generate`)
	- Best for: High-quality, non-real-time generation.
	- Method: `POST`
	- URL: `/generate`

	### Request Body (JSON)
	Same as `/stream`.

	### Python Example (Generate)
	```python
	import requests

	API_URL = "https://loomisgitarrist-xtts-multilingual.hf.space/generate"
	payload = {
	    "text": "This is a full quality generation test.",
	    "language": "en",
	    "speaker_id": "dave.wav",
	}

	# Send the payload as a JSON body and wait for the complete response.
	response = requests.post(API_URL, json=payload)

	if response.status_code == 200:
	    # The response body is the generated audio.
	    with open("output.wav", "wb") as f:
	        f.write(response.content)
	    print("Audio saved to output.wav")
	else:
	    print("Error:", response.status_code, response.text)
	```
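After saving the file, a quick sanity check is to confirm that it actually contains audio. The sketch below assumes the endpoint returns a standard PCM WAV that Python's built-in `wave` module can parse, which this guide does not state explicitly; `wav_duration_seconds` is a hypothetical helper name.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (0.0 if it has no frames)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    return frames / rate if rate else 0.0
```

Calling `wav_duration_seconds("output.wav")` should give a value greater than zero for a successful generation. If the file is not a PCM WAV, `wave.open` raises `wave.Error` instead.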

	---

	## Troubleshooting
	- 503 Model not loaded: The Space is still starting up (cold start). Wait 1-2 minutes, then retry.
	- Empty Audio (44 bytes): A 44-byte file is a WAV header with no audio data, which usually indicates a streaming error on the server. Check the Space logs.
	- Connection Error: Check your internet connection and whether the Space is paused.
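The first two cases can be handled in client code. The following is a minimal sketch, assuming a `requests`-style `post` callable; the function names, the attempt count and the wait time are illustrative choices, not values defined by the API.

```python
import time


def post_with_cold_start_retry(post, url, payload, attempts=6, wait_seconds=20, sleep=time.sleep):
    """Call post(url, json=payload) until the status is not 503 or attempts run out."""
    response = None
    for attempt in range(attempts):
        response = post(url, json=payload)
        if response.status_code != 503:
            break
        if attempt < attempts - 1:
            sleep(wait_seconds)  # the Space is probably still loading the model
    return response


def looks_like_empty_wav(audio_bytes):
    """True if the body is no larger than a bare 44-byte WAV header."""
    return len(audio_bytes) <= 44
```

A call such as `post_with_cold_start_retry(requests.post, API_URL, payload)` waits up to 100 seconds in total with these defaults, which roughly covers the 1-2 minute cold start. `looks_like_empty_wav(response.content)` can then flag a header-only result before it is written to disk.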