# TTS WebSocket Service v1.0.0

Standalone, WebSocket-only Text-to-Speech service for VoiceCal integration.

## Features

- ✅ WebSocket-only TTS interface (`/ws/tts`)
- ✅ ZeroGPU Bark TTS integration
- ✅ FastAPI-based architecture
- ✅ Multiple voice presets (10 speakers)
- ✅ Streaming TTS support (unmute.sh methodology)
- ✅ No Gradio dependencies
- ✅ No MCP dependencies
- ✅ Ready for standalone deployment
- ✅ Base64 audio transmission
- ✅ WAV audio output

## Quick Start

### Using the WebSocket Server

```bash
# Install dependencies
pip install -r requirements-websocket.txt

# Run the standalone WebSocket server
python3 websocket_tts_server.py
```

### Docker Deployment

```bash
# Build the WebSocket-only image
docker build -f Dockerfile-websocket -t tts-websocket-service .

# Run the container, exposing port 7860
docker run -p 7860:7860 tts-websocket-service
```

## API Endpoints

### WebSocket: `/ws/tts`

**Connection Confirmation** (sent by the server when a client connects):

```json
{
  "type": "tts_connection_confirmed",
  "client_id": "uuid",
  "service": "TTS WebSocket Service",
  "version": "1.0.0",
  "available_voices": [
    "v2/en_speaker_0", "v2/en_speaker_1", "v2/en_speaker_2",
    "v2/en_speaker_3", "v2/en_speaker_4", "v2/en_speaker_5",
    "v2/en_speaker_6", "v2/en_speaker_7", "v2/en_speaker_8",
    "v2/en_speaker_9"
  ],
  "device": "cuda",
  "message": "TTS WebSocket connected and ready"
}
```

**Single Synthesis Request:**

```json
{
  "type": "tts_synthesize",
  "text": "Hello, how are you today?",
  "voice_preset": "v2/en_speaker_6"
}
```

**Streaming Synthesis Request (unmute.sh methodology):**

```json
{
  "type": "tts_streaming_text",
  "text_chunks": ["Hello", "how are you", "today?"],
  "voice_preset": "v2/en_speaker_6",
  "is_final": true
}
```

**Synthesis Result:**

```json
{
  "type": "tts_synthesis_complete",
  "client_id": "uuid",
  "audio_data": "base64_encoded_wav_audio",
  "audio_format": "wav",
  "text": "Hello, how are you today?",
  "voice_preset": "v2/en_speaker_6",
  "audio_size": 12345,
  "timing": {
    "processing_time": 2.34,
    "device": "cuda"
  },
  "status": "success"
}
```

### HTTP: `/health`

```json
{
  "service": "TTS WebSocket Service",
  "version": "1.0.0",
  "status": "healthy",
  "model_loaded": true,
  "active_connections": 1,
  "available_voices": 10,
  "device": "cuda"
}
```

## Port Configuration

- **Default Port**: `7860` (the standard HuggingFace Spaces port)
- **WebSocket Endpoint** (local): `ws://localhost:7860/ws/tts`
- **Health Check** (local): `http://localhost:7860/health`
- **Note**: Each HuggingFace Space runs separately with its own address, so the STT and TTS services can both use port 7860.

## Voice Presets

Available voice presets:

- `v2/en_speaker_0` - Voice 0
- `v2/en_speaker_1` - Voice 1
- `v2/en_speaker_2` - Voice 2
- `v2/en_speaker_3` - Voice 3
- `v2/en_speaker_4` - Voice 4
- `v2/en_speaker_5` - Voice 5
- `v2/en_speaker_6` - Voice 6 (default)
- `v2/en_speaker_7` - Voice 7
- `v2/en_speaker_8` - Voice 8
- `v2/en_speaker_9` - Voice 9

## Architecture

This service keeps only what WebSocket TTS needs:

- **Removed**: Gradio web interface
- **Removed**: MCP protocol support
- **Removed**: Complex routing
- **Added**: Direct FastAPI WebSocket endpoints
- **Added**: Streaming TTS support
- **Added**: ZeroGPU-optimized synthesis

## Integration

Connect from the VoiceCal WebRTC interface. Messages can only be sent after the connection has opened; calling `send()` while the socket is still connecting throws an error.

```javascript
const ws = new WebSocket('ws://localhost:7860/ws/tts');

// Wait until the connection is open before sending
ws.addEventListener('open', () => {
  // Send text for single synthesis
  ws.send(JSON.stringify({
    type: "tts_synthesize",
    text: "Hello world",
    voice_preset: "v2/en_speaker_6"
  }));

  // Streaming synthesis (unmute.sh pattern)
  ws.send(JSON.stringify({
    type: "tts_streaming_text",
    text_chunks: ["Hello", "world"],
    voice_preset: "v2/en_speaker_6",
    is_final: true
  }));
});

// Server messages (connection confirmation, synthesis results) arrive as JSON text
ws.addEventListener('message', (event) => {
  const msg = JSON.parse(event.data);
  console.log(msg.type, msg);
});
```
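The `tts_synthesis_complete` message carries the audio as base64 in `audio_data`. Below is a minimal browser-side sketch for decoding and playing it. It assumes `audio_data` holds a complete WAV file (including the `RIFF`/`WAVE` header), as `audio_format: "wav"` suggests; the helper names `decodeSynthesisResult`, `looksLikeWav`, and `playWav` are hypothetical and not part of the service.

```javascript
// Sketch: turn a "tts_synthesis_complete" message into WAV bytes.
// Field names (type, status, audio_data) follow the documented result message;
// the helper functions themselves are hypothetical.
function decodeSynthesisResult(msg) {
  if (msg.type !== "tts_synthesis_complete" || msg.status !== "success") {
    return null; // not a successful synthesis result
  }
  // atob decodes base64 into a binary string (one character per byte)
  const binary = atob(msg.audio_data);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Assumes a standard WAV file: "RIFF" at offset 0 and "WAVE" at offset 8
function looksLikeWav(bytes) {
  const tag = (start) => String.fromCharCode(...bytes.subarray(start, start + 4));
  return bytes.length >= 12 && tag(0) === "RIFF" && tag(8) === "WAVE";
}

// Browser-only: play the bytes via an object URL, freeing it when playback ends
function playWav(bytes) {
  const url = URL.createObjectURL(new Blob([bytes], { type: "audio/wav" }));
  const audio = new Audio(url);
  audio.addEventListener("ended", () => URL.revokeObjectURL(url));
  return audio.play();
}
```

For example, inside a `message` handler: `const bytes = decodeSynthesisResult(JSON.parse(event.data)); if (bytes && looksLikeWav(bytes)) playWav(bytes);`. Browsers may block `play()` until the user has interacted with the page, in which case the returned promise rejects.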
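Because `/health` reports `model_loaded`, a client may want to wait for the model before opening `/ws/tts`. This sketch polls the endpoint until the service looks ready. The readiness rule (`status === "healthy"` and `model_loaded === true`), the helper names `isTtsReady` and `waitForTts`, and the retry defaults are all assumptions rather than documented behavior.

```javascript
// Hypothetical readiness rule based on the documented /health fields
function isTtsReady(health) {
  return health.status === "healthy" && health.model_loaded === true;
}

// Poll <baseUrl>/health until ready or attempts run out.
// fetch is built into browsers and Node 18+.
async function waitForTts(baseUrl, { attempts = 10, delayMs = 2000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(`${baseUrl}/health`);
      if (res.ok && isTtsReady(await res.json())) return true;
    } catch {
      // Server not reachable yet; fall through and retry
    }
    // Sleep between attempts, but not after the last one
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

Usage: `if (await waitForTts("http://localhost:7860")) { /* open the WebSocket */ }`.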