Upload 3 files
Browse files- Dockerfile +48 -0
- README.md +61 -6
- requirements.txt +25 -0
Dockerfile
ADDED
|
@@ -0,0 +1,48 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# syntax=docker/dockerfile:1
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime

# --- Runtime configuration (all overridable at `docker run -e ...`) ---
# PYTHONUNBUFFERED: stream logs immediately (no stdout buffering).
# TRANSFORMERS_CACHE is deprecated in recent transformers releases in favor
# of HF_HOME; kept alongside it for compatibility with older versions.
ENV PYTHONUNBUFFERED=1 \
    GRADIO_SERVER_NAME=0.0.0.0 \
    GRADIO_SERVER_PORT=7860 \
    USE_DEEPSPEED=true \
    USE_FP16=true \
    USE_TORCH_COMPILE=true \
    MAX_CACHE_SIZE=10 \
    HF_HOME=/app/cache \
    TRANSFORMERS_CACHE=/app/cache

# System dependencies: git (model fetching), ffmpeg + libsndfile1 (audio I/O).
# update+install combined in one layer; --no-install-recommends keeps the
# image small; apt lists removed in the same layer so they never persist.
RUN apt-get update && apt-get install -y --no-install-recommends \
        ffmpeg \
        git \
        libsndfile1 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Copy the manifest alone first so the (slow) pip layer stays cached until
# requirements.txt itself changes.
COPY requirements.txt .

# requirements.txt already pins deepspeed>=0.12.0, so the former separate
# `pip install deepspeed` layer was redundant and has been folded in here.
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

# Application code (changes most often -> last COPY layers for cache reuse)
COPY app.py .
COPY model/ ./model/

# HF Spaces runs the container as a non-root user with UID 1000; create a
# matching user and give it ownership of /app (including the model cache)
# instead of the previous blanket `chmod -R 777`.
RUN useradd -m -u 1000 appuser && \
    mkdir -p /app/cache && \
    chown -R appuser:appuser /app
USER appuser

# Documentation only (does not publish the port): Gradio listens on 7860,
# matching GRADIO_SERVER_PORT above.
EXPOSE 7860

# Exec form so python is PID 1 and receives SIGTERM from `docker stop`.
CMD ["python", "app.py"]
README.md
CHANGED
|
@@ -1,12 +1,67 @@
|
|
| 1 |
---
|
| 2 |
-
title:
|
| 3 |
-
emoji:
|
| 4 |
colorFrom: green
|
| 5 |
-
colorTo:
|
| 6 |
-
sdk:
|
| 7 |
-
sdk_version:
|
| 8 |
app_file: app.py
|
| 9 |
pinned: false
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 10 |
---
|
| 11 |
|
| 12 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
---
|
| 2 |
+
title: XTTSv2 Optimized TTS
|
| 3 |
+
emoji: 🐸
|
| 4 |
colorFrom: green
|
| 5 |
+
colorTo: blue
|
| 6 |
+
sdk: docker
|
| 7 |
+
sdk_version: 4.19.0
|
| 8 |
app_file: app.py
|
| 9 |
pinned: false
|
| 10 |
+
license: other
|
| 11 |
+
tags:
|
| 12 |
+
- tts
|
| 13 |
+
- text-to-speech
|
| 14 |
+
- voice-cloning
|
| 15 |
+
- xtts
|
| 16 |
+
- coqui
|
| 17 |
+
suggested_hardware: t4-small
|
| 18 |
---
|
| 19 |
|
| 20 |
+
# 🐸 XTTSv2 Optimized Text-to-Speech
|
| 21 |
+
|
| 22 |
+
High-quality multilingual voice cloning powered by XTTSv2 with performance optimizations.
|
| 23 |
+
|
| 24 |
+
## Features
|
| 25 |
+
|
| 26 |
+
- **17 Languages**: English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese, Japanese, Hungarian, Korean, Hindi
|
| 27 |
+
- **Voice Cloning**: Clone any voice from ~6 seconds of reference audio
|
| 28 |
+
- **Streaming Mode**: Low-latency streaming for real-time applications
|
| 29 |
+
- **Optimizations**:
|
| 30 |
+
- DeepSpeed acceleration
|
| 31 |
+
- FP16 inference
|
| 32 |
+
- torch.compile() optimization
|
| 33 |
+
- Speaker embedding caching
|
| 34 |
+
|
| 35 |
+
## Usage
|
| 36 |
+
|
| 37 |
+
1. Upload a reference audio file (WAV/MP3, 6-30 seconds recommended)
|
| 38 |
+
2. Enter your text
|
| 39 |
+
3. Select the language
|
| 40 |
+
4. Click "Generate Speech"
|
| 41 |
+
|
| 42 |
+
## Performance
|
| 43 |
+
|
| 44 |
+
| Hardware | Latency (per sentence) |
|
| 45 |
+
|----------|------------------------|
|
| 46 |
+
| T4 | ~2-3 seconds |
|
| 47 |
+
| A10G | ~1 second |
|
| 48 |
+
| A100 | ~0.5 seconds |
|
| 49 |
+
|
| 50 |
+
## Configuration
|
| 51 |
+
|
| 52 |
+
Environment variables for tuning:
|
| 53 |
+
|
| 54 |
+
- `USE_DEEPSPEED`: Enable DeepSpeed (default: true)
|
| 55 |
+
- `USE_FP16`: Enable FP16 inference (default: true)
|
| 56 |
+
- `USE_TORCH_COMPILE`: Enable torch.compile (default: true)
|
| 57 |
+
- `MAX_CACHE_SIZE`: Number of speakers to cache (default: 10)
|
| 58 |
+
- `STREAMING_CHUNK_SIZE`: Streaming chunk size (default: 20)
|
| 59 |
+
|
| 60 |
+
## License
|
| 61 |
+
|
| 62 |
+
This model uses the [Coqui Public Model License](https://coqui.ai/cpml).
|
| 63 |
+
|
| 64 |
+
## Credits
|
| 65 |
+
|
| 66 |
+
- [Coqui TTS](https://github.com/coqui-ai/TTS)
|
| 67 |
+
- [XTTS Paper](https://arxiv.org/abs/2406.04904)
|
requirements.txt
ADDED
|
@@ -0,0 +1,25 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Core TTS
|
| 2 |
+
TTS>=0.22.0
|
| 3 |
+
|
| 4 |
+
# PyTorch with CUDA — the pytorch/pytorch base image already ships torch 2.1.0,
# so pip sees these constraints satisfied and skips them; listed for local (non-Docker) dev
|
| 5 |
+
torch>=2.1.0
|
| 6 |
+
torchaudio>=2.1.0
|
| 7 |
+
|
| 8 |
+
# DeepSpeed for acceleration
|
| 9 |
+
deepspeed>=0.12.0
|
| 10 |
+
|
| 11 |
+
# Gradio UI
|
| 12 |
+
gradio>=4.19.0
|
| 13 |
+
|
| 14 |
+
# Audio processing
|
| 15 |
+
numpy>=1.24.0
|
| 16 |
+
scipy>=1.11.0
|
| 17 |
+
librosa>=0.10.0
|
| 18 |
+
soundfile>=0.12.0
|
| 19 |
+
|
| 20 |
+
# Utilities
|
| 21 |
+
transformers>=4.36.0
|
| 22 |
+
huggingface_hub>=0.20.0
|
| 23 |
+
|
| 24 |
+
# Fast Rust-based tokenization (also pulled in transitively by transformers)
|
| 25 |
+
tokenizers>=0.15.0
|