---
license: apache-2.0
title: Forge-TTS
sdk: docker
emoji: 🏃
colorFrom: yellow
colorTo: indigo
short_description: TTS
---

# HF Spaces CPU TTS API (Docker)

API-only Space with **separate endpoints** for:

- **XTTS v2** (voice cloning with an uploaded reference clip; Polish by default)
- **Parler-TTS mini multilingual v1.1** (fast, high-quality Polish TTS; style controlled by a text description)
- **Piper** (backup, local voices; bring your own `.onnx` voice files)

Runs on **HF Spaces free CPU (2 vCPU / 16 GB RAM)** with CPU-friendly defaults:

- **Chunking** (sentence-based) to avoid timeouts on long text
- **Streaming** via SSE (each chunk returned as a standalone WAV)
- Optional **torch.compile** and optional **dynamic int8 quantization** hooks

---

## Endpoints

### Health

- `GET /health`

### XTTS v2

- `POST /v1/xtts/synthesize` (multipart/form-data; WAV bytes)
- `POST /v1/xtts/stream` (SSE; base64 WAV chunks)

### Parler

- `POST /v1/parler/synthesize` (JSON; WAV bytes)
- `POST /v1/parler/stream` (JSON; SSE base64 WAV chunks)

### Piper

- `GET /v1/piper/voices`
- `POST /v1/piper/synthesize` (JSON; WAV bytes)

OpenAPI docs:

- `/docs`

---

## Usage examples

### XTTS voice cloning (file upload)

```bash
curl -X POST "http://localhost:7860/v1/xtts/synthesize" \
  -F "text=Cześć! To jest test głosu." \
  -F "language=pl" \
  -F "chunking=true" \
  -F "speaker_wav=@reference.wav" \
  --output out.wav
```

### XTTS streaming (SSE)

This streams **multiple WAV chunks** (base64) as SSE events. Your client should decode each `wav_b64` and play/append it.

```bash
curl -N -X POST "http://localhost:7860/v1/xtts/stream" \
  -H "Content-Type: application/json" \
  -d '{"text":"Cześć! To jest dłuższy tekst. Druga fraza. Trzecia fraza.","language":"pl","chunking":true}'
```

### Parler synth

```bash
curl -X POST "http://localhost:7860/v1/parler/synthesize" \
  -H "Content-Type: application/json" \
  -d '{
    "text":"Cześć! To Parler w języku polskim.",
    "description":"A calm female Polish voice, close-mic, warm tone, subtle smile, studio quality."
  }' \
  --output parler.wav
```

### Piper voices + synth

```bash
curl "http://localhost:7860/v1/piper/voices"

curl -X POST "http://localhost:7860/v1/piper/synthesize" \
  -H "Content-Type: application/json" \
  -d '{"text":"To jest Piper jako kopia zapasowa.","voice_id":"pl_PL-gosia-medium"}' \
  --output piper.wav
```

---

## Environment variables (important knobs)

### XTTS

- `XTTS_MODEL_NAME` (default: `tts_models/multilingual/multi-dataset/xtts_v2`)
- `XTTS_DEFAULT_LANGUAGE` (default: `pl`)
- `XTTS_TORCH_COMPILE=1` to attempt `torch.compile()` (best-effort)
- `XTTS_DYNAMIC_INT8=1` to attempt dynamic int8 quantization (best-effort)

### Parler

- `PARLER_MODEL_NAME` (default: `parler-tts/parler-tts-mini-multilingual-v1.1`)
- `PARLER_DEFAULT_DESCRIPTION` (default is a neutral Polish voice)
- `PARLER_SEED` (default: `0`)
- `PARLER_TORCH_COMPILE=1` (best-effort)
- `PARLER_DYNAMIC_INT8=1` (best-effort)

### Chunking / joining

- `CHUNK_MAX_CHARS` (default: `260`)
- `CHUNK_MAX_WORDS` (default: `40`)
- `CHUNK_MAX_SENTENCES` (default: `8`)
- `JOIN_SILENCE_MS` (default: `60`)

### Piper

Bring your own Piper `.onnx` voices:

- Put voice files in `/data/piper` (auto-scanned), **or**
- Set `PIPER_VOICES_JSON='{"voice_id":"/data/piper/voice.onnx"}'`
- Optionally set `PIPER_VOICES_DIR` (default: `/data/piper`)

---

## Notes on “streaming”

XTTS and Parler “streaming” here is implemented by:

1) **Sentence chunking** (fast + stable on CPU)
2) Returning each chunk as its own **WAV** event over SSE

This avoids needing the full WAV length up front and prevents long-running requests from timing out on the free Spaces CPU.
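The streaming endpoints return one SSE event per chunk, each carrying a base64-encoded WAV in `wav_b64`. A minimal client-side decoder could look like the sketch below; it assumes each event arrives as a `data:` line of JSON with a `wav_b64` field (the exact event framing may differ from this guess) and requires `jq`:

```bash
#!/usr/bin/env bash
# Sketch: decode an SSE stream from /v1/xtts/stream into numbered WAV files.
# Assumption: each "data:" line is JSON with a "wav_b64" field, per the docs
# above; everything else about the framing is a guess.
decode_sse_wavs() {
  local outdir="$1" i=0 b64 line
  mkdir -p "$outdir"
  while IFS= read -r line; do
    case "$line" in
      data:*)
        # Strip the "data:" prefix, extract the base64 payload, decode it.
        b64=$(printf '%s' "${line#data:}" | jq -r '.wav_b64 // empty')
        [ -n "$b64" ] || continue
        printf '%s' "$b64" | base64 -d > "$outdir/chunk_$i.wav"
        i=$((i+1))
        ;;
    esac
  done
}

# Usage against a running Space (writes out/chunk_0.wav, out/chunk_1.wav, ...):
# curl -N -s -X POST "http://localhost:7860/v1/xtts/stream" \
#   -H "Content-Type: application/json" \
#   -d '{"text":"Cześć! Druga fraza.","language":"pl","chunking":true}' \
#   | decode_sse_wavs out
```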
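To illustrate what the chunking knobs control, here is a rough shell sketch of greedy sentence packing capped by character count only. This is an illustration of the idea, not the server's actual splitter, which also honors `CHUNK_MAX_WORDS` and `CHUNK_MAX_SENTENCES` and has smarter sentence detection:

```bash
#!/usr/bin/env bash
# Sketch: greedily pack sentences into chunks of at most $1 characters
# (default 260, matching CHUNK_MAX_CHARS). Hypothetical helper, not the
# server's real implementation.
chunk_text() {
  local max_chars="${1:-260}"
  # Break input into one sentence per line on ". ", "! ", "? " (GNU sed).
  sed 's/\([.!?]\) /\1\n/g' | {
    local buf="" sent
    while IFS= read -r sent || [ -n "$sent" ]; do
      # Flush the current chunk if appending this sentence would overflow.
      if [ -n "$buf" ] && [ $(( ${#buf} + 1 + ${#sent} )) -gt "$max_chars" ]; then
        printf '%s\n' "$buf"
        buf=""
      fi
      buf="${buf:+$buf }$sent"
    done
    if [ -n "$buf" ]; then printf '%s\n' "$buf"; fi
  }
}

# Example: pack sentences into chunks of at most 30 characters.
# printf 'Cześć! To jest dłuższy tekst. Druga fraza. Trzecia fraza.' | chunk_text 30
```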