
metadata

title: XTTSv2 Optimized TTS
emoji: 🐸
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
license: other
tags:
  - tts
  - text-to-speech
  - voice-cloning
  - xtts
  - coqui
suggested_hardware: t4-small

🐸 XTTSv2 Optimized Text-to-Speech

High-quality multilingual voice cloning powered by XTTSv2 with performance optimizations.

Features

- 17 Languages: English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese, Japanese, Hungarian, Korean, Hindi
- Voice Cloning: Clone any voice from ~6 seconds of reference audio
- Streaming Mode: Low-latency streaming for real-time applications
- Optimizations:
  - DeepSpeed acceleration
  - FP16 inference
  - torch.compile() optimization
  - Speaker embedding caching
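
To illustrate the speaker embedding caching idea, here is a minimal sketch of a least-recently-used cache keyed by a hash of the reference audio. It is not the Space's actual implementation: the `SpeakerCache` class and the `compute` callback are hypothetical names, and the callback stands in for whatever function really derives an embedding from audio.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

E = TypeVar("E")  # embedding type; left generic because the real type is not shown here


class SpeakerCache(Generic[E]):
    """LRU cache from reference-audio content to a computed embedding (hypothetical)."""

    def __init__(self, compute: Callable[[bytes], E], max_size: int = 10) -> None:
        self._compute = compute      # assumed stand-in for the real embedding function
        self._max_size = max_size    # mirrors the documented MAX_CACHE_SIZE default of 10
        self._entries: "OrderedDict[str, E]" = OrderedDict()

    def get(self, audio_bytes: bytes) -> E:
        # Key on the audio content so the same clip uploaded twice is a cache hit.
        key = hashlib.sha256(audio_bytes).hexdigest()
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        embedding = self._compute(audio_bytes)
        self._entries[key] = embedding
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return embedding

    def __len__(self) -> int:
        return len(self._entries)
```

With a cache like this, repeated requests that reuse one reference clip skip the embedding step and only pay for synthesis.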

Usage

1. Upload a reference audio file (WAV/MP3, 6-30 seconds recommended)
2. Enter your text
3. Select the language
4. Click "Generate Speech"
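
Before uploading, it can help to check that a reference clip falls in the recommended 6-30 second range. The sketch below does this for WAV files using only the Python standard library. The function names are hypothetical, and MP3 files would need a third-party decoder that is not covered here.

```python
import wave

# Recommended reference length from the usage steps above.
MIN_SECONDS = 6.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(path: str) -> bool:
    """True if the clip is within the recommended 6-30 second range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```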

Performance

| Hardware | Latency (per sentence) |
|----------|------------------------|
| T4       | ~2-3 seconds           |
| A10G     | ~1 second              |
| A100     | ~0.5 seconds           |
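
The per-sentence figures in the table above can be turned into a rough estimate for a longer text. The sketch below is illustrative arithmetic only: it takes the midpoint of the T4 range (an assumption), splits sentences with a naive regular expression, and ignores model loading and other fixed overhead.

```python
import re

# Approximate per-sentence latency in seconds, read from the table above.
# The T4 value is the midpoint of the quoted ~2-3 second range (assumption).
LATENCY_PER_SENTENCE = {"T4": 2.5, "A10G": 1.0, "A100": 0.5}


def count_sentences(text: str) -> int:
    """Naive sentence count: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([part for part in parts if part])


def estimate_seconds(text: str, hardware: str) -> float:
    """Rough total latency: sentence count times per-sentence latency."""
    return count_sentences(text) * LATENCY_PER_SENTENCE[hardware]
```

For example, a three-sentence input on a T4 comes out at about 7.5 seconds under these assumptions.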

Configuration

Environment variables for tuning:

- USE_DEEPSPEED: Enable DeepSpeed (default: true)
- USE_FP16: Enable FP16 inference (default: true)
- USE_TORCH_COMPILE: Enable torch.compile (default: true)
- MAX_CACHE_SIZE: Number of speakers to cache (default: 10)
- STREAMING_CHUNK_SIZE: Streaming chunk size (default: 20)
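
As a sketch of how an app might read these variables, the following parses them with the documented defaults. The `Settings` class and `load_settings` function are hypothetical names, and the set of strings treated as "true" is an assumption, since the accepted values are not specified above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: these strings (case-insensitive) count as "true"; anything else is false.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean variable, falling back to the default when it is unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


@dataclass(frozen=True)
class Settings:
    use_deepspeed: bool
    use_fp16: bool
    use_torch_compile: bool
    max_cache_size: int
    streaming_chunk_size: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables using the documented defaults."""
    return Settings(
        use_deepspeed=_flag(env, "USE_DEEPSPEED", True),
        use_fp16=_flag(env, "USE_FP16", True),
        use_torch_compile=_flag(env, "USE_TORCH_COMPILE", True),
        max_cache_size=int(env.get("MAX_CACHE_SIZE", "10")),
        streaming_chunk_size=int(env.get("STREAMING_CHUNK_SIZE", "20")),
    )
```

Passing the environment as a mapping keeps the parsing easy to test: `load_settings({"USE_FP16": "false"})` disables FP16 and leaves every other setting at its default.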

License

This model uses the Coqui Public Model License.