Last commit by Peter Michael Gits, 390e1c5 (7 months ago): "feat: Add standalone WebSocket-only TTS service v1.0.0"

TTS WebSocket Service v1.0.0

Standalone WebSocket-only Text-to-Speech service for VoiceCal integration.

Features

✅ WebSocket-only TTS interface (/ws/tts)
✅ ZeroGPU Bark TTS integration
✅ FastAPI-based architecture
✅ Multiple voice presets (10 speakers)
✅ Streaming TTS support (unmute.sh methodology)
✅ No Gradio dependencies
✅ No MCP dependencies
✅ Standalone deployment ready
✅ Base64 audio transmission
✅ WAV audio format output

Quick Start

Using the WebSocket Server

```bash
# Install dependencies
pip install -r requirements-websocket.txt

# Run standalone WebSocket server
python3 websocket_tts_server.py
```

Docker Deployment

```bash
# Build WebSocket-only image
docker build -f Dockerfile-websocket -t tts-websocket-service .

# Run container
docker run -p 7860:7860 tts-websocket-service
```

API Endpoints

WebSocket: `/ws/tts`

Connection Confirmation:

```json
{
  "type": "tts_connection_confirmed",
  "client_id": "uuid",
  "service": "TTS WebSocket Service",
  "version": "1.0.0",
  "available_voices": [
    "v2/en_speaker_0", "v2/en_speaker_1", "v2/en_speaker_2",
    "v2/en_speaker_3", "v2/en_speaker_4", "v2/en_speaker_5",
    "v2/en_speaker_6", "v2/en_speaker_7", "v2/en_speaker_8",
    "v2/en_speaker_9"
  ],
  "device": "cuda",
  "message": "TTS WebSocket connected and ready"
}
```
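Every message in this protocol is a JSON object identified by its `type` field, so a client can route incoming frames through a single lookup table. A minimal sketch, assuming one JSON object per WebSocket text frame; `createDispatcher` and `availableVoices` are hypothetical names, not part of the service:

```javascript
// Minimal sketch: route server messages to handlers keyed by their `type`.
// ASSUMPTION: each WebSocket text frame carries exactly one JSON object.
function createDispatcher(handlers) {
  return function dispatch(rawData) {
    let msg;
    try {
      msg = JSON.parse(rawData);
    } catch {
      // Skip frames that are not valid JSON instead of crashing the client
      return { handled: false, reason: "invalid_json" };
    }
    if (msg === null || typeof msg !== "object") {
      return { handled: false, reason: "not_an_object" };
    }
    // Only use handlers defined directly on the table (not inherited ones)
    if (!Object.prototype.hasOwnProperty.call(handlers, msg.type)) {
      return { handled: false, reason: `unhandled_type:${msg.type}` };
    }
    handlers[msg.type](msg);
    return { handled: true, reason: null };
  };
}

// Example handler: remember the voices advertised in tts_connection_confirmed
let availableVoices = [];
const dispatch = createDispatcher({
  tts_connection_confirmed: (msg) => {
    availableVoices = msg.available_voices ?? [];
  },
});

// In the client: ws.addEventListener("message", (event) => dispatch(event.data));
```

Returning a small status object instead of throwing lets the caller log unknown message types without tearing down the connection.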

Single Synthesis Request:

```json
{
  "type": "tts_synthesize",
  "text": "Hello, how are you today?",
  "voice_preset": "v2/en_speaker_6"
}
```
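Building the request in one helper keeps the field names and the default preset in a single place. A minimal sketch; `buildSynthesizeRequest` is a hypothetical name, and rejecting empty text on the client is an assumption, since the README does not say how the server treats it:

```javascript
// Hypothetical helper: build a `tts_synthesize` request as a JSON string.
// The default preset mirrors the README's default voice (v2/en_speaker_6).
const DEFAULT_SYNTH_VOICE = "v2/en_speaker_6";

function buildSynthesizeRequest(text, voicePreset = DEFAULT_SYNTH_VOICE) {
  // ASSUMPTION: empty or whitespace-only text is rejected client-side
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("text must be a non-empty string");
  }
  return JSON.stringify({
    type: "tts_synthesize",
    text,
    voice_preset: voicePreset,
  });
}

// Usage: ws.send(buildSynthesizeRequest("Hello, how are you today?"));
```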

Streaming Synthesis (unmute.sh methodology):

```json
{
  "type": "tts_streaming_text",
  "text_chunks": ["Hello", "how are you", "today?"],
  "voice_preset": "v2/en_speaker_6",
  "is_final": true
}
```
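The README does not specify how text should be split into `text_chunks`. One plausible client-side rule, shown here as an assumption rather than a requirement of the service, is to close a chunk after clause punctuation or after a fixed number of words. `chunkText`, `buildStreamingRequest`, and the `maxWords` limit are hypothetical:

```javascript
// Sketch: split text into short word groups for a `tts_streaming_text` message.
// ASSUMPTION: breaking after , . ! ? ; : or every `maxWords` words is a
// client-side choice; the service README does not prescribe a rule.
function chunkText(text, maxWords = 4) {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks = [];
  let current = [];
  for (const word of words) {
    current.push(word);
    // Close the chunk at clause punctuation or when it reaches maxWords
    if (/[,.!?;:]$/.test(word) || current.length >= maxWords) {
      chunks.push(current.join(" "));
      current = [];
    }
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}

function buildStreamingRequest(text, voicePreset = "v2/en_speaker_6", isFinal = true) {
  return JSON.stringify({
    type: "tts_streaming_text",
    text_chunks: chunkText(text),
    voice_preset: voicePreset,
    is_final: isFinal,
  });
}

// chunkText("Hello, how are you today?")  // ["Hello,", "how are you today?"]
```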

Synthesis Result:

```json
{
  "type": "tts_synthesis_complete",
  "client_id": "uuid",
  "audio_data": "base64_encoded_wav_audio",
  "audio_format": "wav",
  "text": "Hello, how are you today?",
  "voice_preset": "v2/en_speaker_6",
  "audio_size": 12345,
  "timing": {
    "processing_time": 2.34,
    "device": "cuda"
  },
  "status": "success"
}
```
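The `audio_data` field carries a base64-encoded WAV file. A sketch for decoding it and reading the format fields from its header, assuming a standard RIFF/WAVE layout with a `fmt ` chunk before the `data` chunk; `decodeBase64` and `parseWavHeader` are hypothetical helper names, and `atob` is the global available in browsers and Node.js 16+:

```javascript
// Sketch: turn the base64 `audio_data` string into raw bytes.
function decodeBase64(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Sketch: read format info from a RIFF/WAVE byte array (little-endian fields).
function parseWavHeader(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (offset) => String.fromCharCode(...bytes.subarray(offset, offset + 4));
  if (tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  let offset = 12;
  let fmt = null;
  // Each chunk: 4-byte id, 4-byte little-endian size, then `size` bytes of data
  while (offset + 8 <= bytes.length) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    if (id === "fmt ") {
      fmt = {
        audioFormat: view.getUint16(offset + 8, true), // 1 = integer PCM
        channels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true),
      };
    } else if (id === "data") {
      if (!fmt) throw new Error("data chunk before fmt chunk");
      const bytesPerFrame = fmt.channels * (fmt.bitsPerSample / 8);
      return { ...fmt, dataBytes: size, durationSec: size / bytesPerFrame / fmt.sampleRate };
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error("no data chunk found");
}
```

In a browser, the decoded bytes can also be played directly: wrap them in `new Blob([bytes], { type: "audio/wav" })` and assign `URL.createObjectURL(blob)` to an `<audio>` element's `src`.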

HTTP: `/health`

```json
{
  "service": "TTS WebSocket Service",
  "version": "1.0.0",
  "status": "healthy",
  "model_loaded": true,
  "active_connections": 1,
  "available_voices": 10,
  "device": "cuda"
}
```
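The `model_loaded` field suggests the service can answer HTTP requests before the Bark model is ready, so a client could poll `/health` before opening the WebSocket. A sketch, assuming a global `fetch` (browsers, Node.js 18+); the retry count and delay are arbitrary, and `waitForHealthy` and `isServiceReady` are hypothetical names:

```javascript
// Sketch: treat the service as ready only when it is healthy AND the model is loaded.
// Field names come from the /health response above.
function isServiceReady(health) {
  return health.status === "healthy" && health.model_loaded === true;
}

// Poll GET /health until ready. ASSUMPTION: retries/delayMs defaults are arbitrary.
async function waitForHealthy(baseUrl, { retries = 10, delayMs = 2000 } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetch(new URL("/health", baseUrl));
      if (res.ok) {
        const health = await res.json();
        if (isServiceReady(health)) return health;
      }
    } catch {
      // Connection errors are expected while the container is still starting
    }
    if (attempt < retries) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`service at ${baseUrl} not ready after ${retries} attempts`);
}

// Usage: const health = await waitForHealthy("http://localhost:7860");
```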

Port Configuration

Default Port: 7860 (the standard Hugging Face Spaces port)
WebSocket Endpoint: ws://localhost:7860/ws/tts
Health Check: http://localhost:7860/health
Note: Each Hugging Face Space runs as a separate deployment with its own address, so the STT and TTS services can both listen on port 7860 without conflicting.
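The endpoints above point at `localhost`. Reaching a deployed Space over HTTPS instead calls for a `wss://` WebSocket URL. A small sketch that derives both endpoints from one base URL; the `serviceUrls` helper and the `example-tts.hf.space` host in the usage line are hypothetical:

```javascript
// Sketch: derive the WebSocket and health-check URLs from a single base URL.
// http:// maps to ws:// and https:// maps to wss://; the paths come from this README.
function serviceUrls(baseUrl) {
  const url = new URL(baseUrl);
  const wsScheme = url.protocol === "https:" ? "wss:" : "ws:";
  return {
    ws: `${wsScheme}//${url.host}/ws/tts`, // url.host keeps a non-default port
    health: new URL("/health", url).toString(),
  };
}

// serviceUrls("http://localhost:7860").ws          // "ws://localhost:7860/ws/tts"
// serviceUrls("https://example-tts.hf.space").ws   // "wss://example-tts.hf.space/ws/tts"
```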

Voice Presets

Available voice presets:

v2/en_speaker_0 - Voice 0
v2/en_speaker_1 - Voice 1
v2/en_speaker_2 - Voice 2
v2/en_speaker_3 - Voice 3
v2/en_speaker_4 - Voice 4
v2/en_speaker_5 - Voice 5
v2/en_speaker_6 - Voice 6 (default)
v2/en_speaker_7 - Voice 7
v2/en_speaker_8 - Voice 8
v2/en_speaker_9 - Voice 9
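All ten presets share the `v2/en_speaker_N` naming pattern, so a client can generate the list and check a requested preset before sending it. A sketch, assuming the client should fall back to the documented default (`v2/en_speaker_6`) for unknown names; how the server itself handles an invalid preset is not documented here, and `VOICE_PRESETS` and `resolvePreset` are hypothetical names:

```javascript
// Sketch: generate the ten documented presets instead of hard-coding them.
const VOICE_PRESETS = Array.from({ length: 10 }, (_, i) => `v2/en_speaker_${i}`);
const DEFAULT_PRESET = "v2/en_speaker_6";

// ASSUMPTION: unknown presets fall back to the default on the client side.
// `available` can be replaced with the `available_voices` list the server
// sends in its tts_connection_confirmed message.
function resolvePreset(requested, available = VOICE_PRESETS) {
  return available.includes(requested) ? requested : DEFAULT_PRESET;
}

// resolvePreset("v2/en_speaker_3")  // "v2/en_speaker_3"
// resolvePreset("v2/de_speaker_1")  // "v2/en_speaker_6"
```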

Architecture

Compared with the Gradio-based service, this build drops the layers a WebSocket-only deployment does not need and adds WebSocket-focused features:

Removed: Gradio web interface
Removed: MCP protocol support
Removed: Complex routing
Added: Direct FastAPI WebSocket endpoints
Added: Streaming TTS support
Added: ZeroGPU optimized synthesis

Integration

Connect from the VoiceCal WebRTC interface:

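```javascript
const ws = new WebSocket('ws://localhost:7860/ws/tts');

// send() throws an InvalidStateError while the socket is still CONNECTING,
// so wait for the "open" event before sending requests.
ws.addEventListener('open', () => {
  // Send text for synthesis
  ws.send(JSON.stringify({
    type: "tts_synthesize",
    text: "Hello world",
    voice_preset: "v2/en_speaker_6"
  }));

  // Streaming synthesis (unmute.sh pattern)
  ws.send(JSON.stringify({
    type: "tts_streaming_text",
    text_chunks: ["Hello", "world"],
    voice_preset: "v2/en_speaker_6",
    is_final: true
  }));
});

// Server messages (including the connection confirmation) arrive as JSON text frames
ws.addEventListener('message', (event) => {
  const msg = JSON.parse(event.data);
  console.log(msg.type, msg.status ?? '');
});
```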
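Because `tts_synthesis_complete` carries no request identifier, a client that issues several `tts_synthesize` requests has to match results some other way. A minimal sketch, assuming the server answers requests in the order they were sent and reports each one with a `tts_synthesis_complete` message; `createTtsClient` is a hypothetical name:

```javascript
// Sketch: promise-returning synthesize() over an open WebSocket.
// ASSUMPTION: the server answers tts_synthesize requests in order (FIFO),
// since tts_synthesis_complete carries no request id.
// Works with any object exposing send() and addEventListener("message", fn).
function createTtsClient(socket, defaultVoice = "v2/en_speaker_6") {
  const pending = []; // { resolve, reject } for in-flight requests, oldest first

  socket.addEventListener("message", (event) => {
    let msg;
    try {
      msg = JSON.parse(event.data);
    } catch {
      return; // ignore non-JSON frames
    }
    if (!msg || msg.type !== "tts_synthesis_complete") return;
    const next = pending.shift();
    if (!next) return; // result with no outstanding request
    if (msg.status === "success") {
      next.resolve(msg);
    } else {
      next.reject(new Error(`synthesis failed with status: ${msg.status}`));
    }
  });

  return {
    synthesize(text, voicePreset = defaultVoice) {
      return new Promise((resolve, reject) => {
        // Send first: if send() throws, the promise rejects and nothing is queued
        socket.send(JSON.stringify({ type: "tts_synthesize", text, voice_preset: voicePreset }));
        pending.push({ resolve, reject });
      });
    },
  };
}

// Usage (inside an async function, after the socket's "open" event):
// const result = await createTtsClient(ws).synthesize("Hello world");
// const audioBase64 = result.audio_data;
```

The README documents no separate error message type, so a failure reported under a different `type` would leave its promise pending; a production client would add a timeout. The FIFO matching also assumes `tts_streaming_text` requests are not interleaved on the same socket, since their responses could shift the queue.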