| # TTS WebSocket Service v1.0.0 | |
| Standalone WebSocket-only Text-to-Speech service for VoiceCal integration. | |
| ## Features | |
| - WebSocket-only TTS interface (`/ws/tts`) | |
| - ZeroGPU Bark TTS integration | |
| - FastAPI-based architecture | |
| - Multiple voice presets (10 speakers) | |
| - Streaming TTS support (unmute.sh methodology) | |
| - No Gradio dependencies | |
| - No MCP dependencies | |
| - Standalone deployment ready | |
| - Base64 audio transmission | |
| - WAV audio format output | |
| ## Quick Start | |
| ### Using the WebSocket Server | |
| ```bash | |
| # Install dependencies | |
| pip install -r requirements-websocket.txt | |
| # Run standalone WebSocket server | |
| python3 websocket_tts_server.py | |
| ``` | |
| ### Docker Deployment | |
| ```bash | |
| # Build WebSocket-only image | |
| docker build -f Dockerfile-websocket -t tts-websocket-service . | |
| # Run container | |
| docker run -p 7860:7860 tts-websocket-service | |
| ``` | |
| ## API Endpoints | |
| ### WebSocket: `/ws/tts` | |
| **Connection Confirmation:** | |
| ```json | |
| { | |
| "type": "tts_connection_confirmed", | |
| "client_id": "uuid", | |
| "service": "TTS WebSocket Service", | |
| "version": "1.0.0", | |
| "available_voices": [ | |
| "v2/en_speaker_0", "v2/en_speaker_1", "v2/en_speaker_2", | |
| "v2/en_speaker_3", "v2/en_speaker_4", "v2/en_speaker_5", | |
| "v2/en_speaker_6", "v2/en_speaker_7", "v2/en_speaker_8", | |
| "v2/en_speaker_9" | |
| ], | |
| "device": "cuda", | |
| "message": "TTS WebSocket connected and ready" | |
| } | |
| ``` | |
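A client will usually want to check this first message before sending any request. The following is a minimal sketch, assuming the confirmation arrives as a JSON text frame with the fields shown above; `parse_confirmation` is a hypothetical helper name, not part of the service.

```python
import json


def parse_confirmation(raw: str) -> dict:
    """Validate the first server message and keep the fields a client needs."""
    msg = json.loads(raw)
    if msg.get("type") != "tts_connection_confirmed":
        raise ValueError(f"unexpected first message type: {msg.get('type')!r}")
    return {
        "client_id": msg["client_id"],
        "voices": list(msg.get("available_voices", [])),
        "device": msg.get("device"),
    }


# Abbreviated copy of the documented confirmation message.
sample = json.dumps({
    "type": "tts_connection_confirmed",
    "client_id": "uuid",
    "available_voices": ["v2/en_speaker_0", "v2/en_speaker_6"],
    "device": "cuda",
})
info = parse_confirmation(sample)
print(info["client_id"], info["voices"])
```

Keeping the advertised voice list lets the client reject an unknown `voice_preset` locally instead of waiting for the server to respond.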
| **Single Synthesis Request:** | |
| ```json | |
| { | |
| "type": "tts_synthesize", | |
| "text": "Hello, how are you today?", | |
| "voice_preset": "v2/en_speaker_6" | |
| } | |
| ``` | |
| **Streaming Synthesis (unmute.sh methodology):** | |
| ```json | |
| { | |
| "type": "tts_streaming_text", | |
| "text_chunks": ["Hello", "how are you", "today?"], | |
| "voice_preset": "v2/en_speaker_6", | |
| "is_final": true | |
| } | |
| ``` | |
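One way to produce `text_chunks` is to split the text at punctuation boundaries. The sketch below is only an illustration: the splitting rule and both helper names are assumptions, and how the server interprets `is_final` across several messages is not documented here.

```python
import json
import re


def split_into_chunks(text: str) -> list:
    """Split after commas and sentence punctuation, keeping the punctuation."""
    parts = re.split(r"(?<=[,.!?;:])\s+", text.strip())
    return [p for p in parts if p]


def streaming_message(chunks, voice_preset="v2/en_speaker_6", is_final=True) -> str:
    """Build a `tts_streaming_text` request with the documented fields."""
    return json.dumps({
        "type": "tts_streaming_text",
        "text_chunks": list(chunks),
        "voice_preset": voice_preset,
        "is_final": is_final,
    })


chunks = split_into_chunks("Hello, how are you today? Fine.")
print(streaming_message(chunks))
```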
| **Synthesis Result:** | |
| ```json | |
| { | |
| "type": "tts_synthesis_complete", | |
| "client_id": "uuid", | |
| "audio_data": "base64_encoded_wav_audio", | |
| "audio_format": "wav", | |
| "text": "Hello, how are you today?", | |
| "voice_preset": "v2/en_speaker_6", | |
| "audio_size": 12345, | |
| "timing": { | |
| "processing_time": 2.34, | |
| "device": "cuda" | |
| }, | |
| "status": "success" | |
| } | |
| ``` | |
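Because `audio_data` is base64-encoded WAV, the client has to decode it before playback or saving. The sketch below uses only the standard library and a locally generated silent clip as a stand-in for real output. It assumes 16-bit PCM WAV; Python's `wave` module rejects float-encoded WAV files, so a different reader may be needed if the service emits those.

```python
import base64
import io
import json
import wave


def decode_result(raw: str) -> bytes:
    """Return the WAV bytes carried by a `tts_synthesis_complete` message."""
    msg = json.loads(raw)
    if msg.get("type") != "tts_synthesis_complete" or msg.get("status") != "success":
        raise ValueError("not a successful synthesis result")
    return base64.b64decode(msg["audio_data"])


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Read the WAV header and compute the clip length in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


# Stand-in payload: one second of 16-bit mono silence (sample rate is arbitrary).
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(24000)
    wf.writeframes(b"\x00\x00" * 24000)

fake_message = json.dumps({
    "type": "tts_synthesis_complete",
    "audio_data": base64.b64encode(buf.getvalue()).decode("ascii"),
    "audio_format": "wav",
    "status": "success",
})

wav_bytes = decode_result(fake_message)
print(len(wav_bytes), wav_duration_seconds(wav_bytes))
```

Writing `wav_bytes` to a file opened in binary mode yields a playable `.wav` file.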
| ### HTTP: `/health` | |
| ```json | |
| { | |
| "service": "TTS WebSocket Service", | |
| "version": "1.0.0", | |
| "status": "healthy", | |
| "model_loaded": true, | |
| "active_connections": 1, | |
| "available_voices": 10, | |
| "device": "cuda" | |
| } | |
| ``` | |
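A caller can poll this endpoint before opening a WebSocket. The sketch below treats the service as ready only when `status` is `healthy` and `model_loaded` is true. That rule is an assumption drawn from the fields above, and `is_ready` and `fetch_health` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def is_ready(health: dict) -> bool:
    """Assumed readiness rule: healthy status and a loaded model."""
    return health.get("status") == "healthy" and health.get("model_loaded") is True


def fetch_health(url: str = "http://localhost:7860/health", timeout: float = 5.0) -> dict:
    """GET the health endpoint and parse the JSON body (needs a running server)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Offline check against the documented sample response.
sample_health = {
    "service": "TTS WebSocket Service",
    "version": "1.0.0",
    "status": "healthy",
    "model_loaded": True,
    "active_connections": 1,
    "available_voices": 10,
    "device": "cuda",
}
print(is_ready(sample_health))
```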
| ## Port Configuration | |
| - **Default Port**: `7860` (HuggingFace Spaces standard port) | |
| - **WebSocket Endpoint**: `ws://localhost:7860/ws/tts` | |
| - **Health Check**: `http://localhost:7860/health` | |
| - **Note**: Each HuggingFace Space is served from its own hostname, so the STT and TTS services can both listen on port 7860 without conflict | |
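The two URLs above differ only in scheme and path, so a client can derive both from a host and port. This is a small sketch; the `secure` flag and the example hostname are assumptions for deployments reached over TLS, not settings the service defines.

```python
def service_urls(host: str = "localhost", port: int = 7860, secure: bool = False) -> dict:
    """Build the WebSocket and health-check URLs for one service instance."""
    ws_scheme = "wss" if secure else "ws"
    http_scheme = "https" if secure else "http"
    return {
        "websocket": f"{ws_scheme}://{host}:{port}/ws/tts",
        "health": f"{http_scheme}://{host}:{port}/health",
    }


local = service_urls()
remote = service_urls("tts.example.com", 443, secure=True)  # hypothetical host
print(local["websocket"], remote["websocket"])
```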
| ## Voice Presets | |
| Available voice presets: | |
| - `v2/en_speaker_0` - Voice 0 | |
| - `v2/en_speaker_1` - Voice 1 | |
| - `v2/en_speaker_2` - Voice 2 | |
| - `v2/en_speaker_3` - Voice 3 | |
| - `v2/en_speaker_4` - Voice 4 | |
| - `v2/en_speaker_5` - Voice 5 | |
| - `v2/en_speaker_6` - Voice 6 (default) | |
| - `v2/en_speaker_7` - Voice 7 | |
| - `v2/en_speaker_8` - Voice 8 | |
| - `v2/en_speaker_9` - Voice 9 | |
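The preset names follow a single pattern, so a client can generate the list and validate a choice before sending a request. In this sketch the client always sends `voice_preset` explicitly, because whether the server falls back to the default when the field is omitted is not stated here. `build_synthesize_request` is a hypothetical helper.

```python
import json

VOICE_PRESETS = [f"v2/en_speaker_{i}" for i in range(10)]
DEFAULT_VOICE = "v2/en_speaker_6"


def build_synthesize_request(text: str, voice_preset: str = DEFAULT_VOICE) -> str:
    """Build a `tts_synthesize` request, rejecting unknown presets and empty text."""
    if voice_preset not in VOICE_PRESETS:
        raise ValueError(f"unknown voice preset: {voice_preset!r}")
    if not text.strip():
        raise ValueError("text must not be empty")
    return json.dumps({
        "type": "tts_synthesize",
        "text": text,
        "voice_preset": voice_preset,
    })


request = build_synthesize_request("Hello, how are you today?")
print(request)
```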
| ## Architecture | |
| This service removes dependencies it does not need and adds a direct WebSocket interface: | |
| - **Removed**: Gradio web interface | |
| - **Removed**: MCP protocol support | |
| - **Removed**: Complex routing | |
| - **Added**: Direct FastAPI WebSocket endpoints | |
| - **Added**: Streaming TTS support | |
| - **Added**: ZeroGPU optimized synthesis | |
| ## Integration | |
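| Connect from the VoiceCal WebRTC interface: | |
| ```javascript | |
| const ws = new WebSocket('ws://localhost:7860/ws/tts'); | |
| // Wait for the socket to open; calling send() while it is still connecting throws | |
| ws.addEventListener('open', () => { | |
|   // Send text for synthesis | |
|   ws.send(JSON.stringify({ | |
|     type: "tts_synthesize", | |
|     text: "Hello world", | |
|     voice_preset: "v2/en_speaker_6" | |
|   })); | |
|   // Streaming synthesis (unmute.sh pattern) | |
|   ws.send(JSON.stringify({ | |
|     type: "tts_streaming_text", | |
|     text_chunks: ["Hello", "world"], | |
|     voice_preset: "v2/en_speaker_6", | |
|     is_final: true | |
|   })); | |
| }); | |
| // Log the type of each server message (confirmation, synthesis result) | |
| ws.addEventListener('message', (event) => { | |
|   const msg = JSON.parse(event.data); | |
|   console.log(msg.type); | |
| }); | |
| ``` | |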