Peter Michael Gits
# TTS WebSocket Service v1.0.0
Standalone WebSocket-only Text-to-Speech service for VoiceCal integration.
## Features
- ✅ WebSocket-only TTS interface (`/ws/tts`)
- ✅ Bark TTS running on Hugging Face ZeroGPU
- ✅ FastAPI-based architecture
- ✅ 10 English voice presets
- ✅ Streaming TTS support (text sent in chunks, following the unmute.sh approach)
- ✅ No Gradio dependencies
- ✅ No MCP dependencies
- ✅ Ready for standalone deployment
- ✅ Base64-encoded audio transmission
- ✅ WAV audio output
## Quick Start
### Using the WebSocket Server
```bash
# Install dependencies
pip install -r requirements-websocket.txt
# Run standalone WebSocket server
python3 websocket_tts_server.py
```
### Docker Deployment
```bash
# Build WebSocket-only image
docker build -f Dockerfile-websocket -t tts-websocket-service .
# Run container
docker run -p 7860:7860 tts-websocket-service
```
## API Endpoints
### WebSocket: `/ws/tts`
**Connection Confirmation** (server → client, sent when a client connects):
```json
{
  "type": "tts_connection_confirmed",
  "client_id": "uuid",
  "service": "TTS WebSocket Service",
  "version": "1.0.0",
  "available_voices": [
    "v2/en_speaker_0", "v2/en_speaker_1", "v2/en_speaker_2",
    "v2/en_speaker_3", "v2/en_speaker_4", "v2/en_speaker_5",
    "v2/en_speaker_6", "v2/en_speaker_7", "v2/en_speaker_8",
    "v2/en_speaker_9"
  ],
  "device": "cuda",
  "message": "TTS WebSocket connected and ready"
}
```
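A client may want to hold off on sending requests until this handshake arrives. The following is a minimal sketch that assumes the server always sends `tts_connection_confirmed` as a JSON text frame soon after connecting. `waitForConfirmation` and its 10-second default timeout are hypothetical client-side choices, not part of the service.

```javascript
// Hypothetical helper: resolve with the server's tts_connection_confirmed message.
// `ws` is any object with addEventListener/removeEventListener (e.g. a browser WebSocket).
function waitForConfirmation(ws, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      ws.removeEventListener('message', onMessage);
      reject(new Error('Timed out waiting for tts_connection_confirmed'));
    }, timeoutMs);

    function onMessage(event) {
      let msg;
      try {
        msg = JSON.parse(event.data);
      } catch {
        return; // ignore frames that are not JSON
      }
      if (msg.type === 'tts_connection_confirmed') {
        clearTimeout(timer);
        ws.removeEventListener('message', onMessage);
        resolve(msg);
      }
    }

    ws.addEventListener('message', onMessage);
  });
}
```

Call it right after constructing the `WebSocket` so the listener is in place before any message can arrive. For example, `const info = await waitForConfirmation(ws);` gives access to `info.available_voices` and `info.device`.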
**Single Synthesis Request** (client → server):
```json
{
  "type": "tts_synthesize",
  "text": "Hello, how are you today?",
  "voice_preset": "v2/en_speaker_6"
}
```
**Streaming Synthesis Request** (client → server; text arrives in chunks, following the unmute.sh approach):
```json
{
  "type": "tts_streaming_text",
  "text_chunks": ["Hello", "how are you", "today?"],
  "voice_preset": "v2/en_speaker_6",
  "is_final": true
}
```
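One way to produce `text_chunks` is to split text at punctuation boundaries. This sketch is hypothetical: `splitIntoChunks` and `buildStreamingRequest` are illustrative names, splitting after `. , ! ? ; :` is an assumed heuristic (the service or unmute.sh may chunk differently), and it assumes `is_final: true` marks the last batch of chunks for an utterance.

```javascript
// Hypothetical chunking heuristic: split after punctuation that is followed by
// whitespace, keeping the punctuation attached to the preceding chunk.
function splitIntoChunks(text) {
  return text
    .split(/(?<=[.,!?;:])\s+/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

// Build a tts_streaming_text message in the documented shape.
// isFinal is assumed to mark the last batch of chunks for this utterance.
function buildStreamingRequest(text, voicePreset = 'v2/en_speaker_6', isFinal = true) {
  return JSON.stringify({
    type: 'tts_streaming_text',
    text_chunks: splitIntoChunks(text),
    voice_preset: voicePreset,
    is_final: isFinal,
  });
}
```

For example, `splitIntoChunks('Hello, how are you today?')` returns `["Hello,", "how are you today?"]`.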
**Synthesis Result** (server → client):
```json
{
  "type": "tts_synthesis_complete",
  "client_id": "uuid",
  "audio_data": "base64_encoded_wav_audio",
  "audio_format": "wav",
  "text": "Hello, how are you today?",
  "voice_preset": "v2/en_speaker_6",
  "audio_size": 12345,
  "timing": {
    "processing_time": 2.34,
    "device": "cuda"
  },
  "status": "success"
}
```
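To use the result, decode `audio_data` back into bytes. This minimal sketch uses the standard `atob`, `Uint8Array`, and `DataView` APIs (available in browsers and Node.js 16+). `decodeAudio` and `readWavInfo` are hypothetical helpers. `readWavInfo` assumes the common 44-byte PCM header layout, with the `fmt ` chunk immediately after the RIFF preamble.

```javascript
// Decode the Base64 `audio_data` field into raw WAV bytes.
function decodeAudio(base64) {
  const binary = atob(base64); // binary string, one character per byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Read format fields from a canonical PCM WAV header
// (assumes "fmt " directly follows the 12-byte RIFF/WAVE preamble).
function readWavInfo(bytes) {
  if (bytes.length < 44) throw new Error('Too short to be a WAV file');
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (offset) => String.fromCharCode(...bytes.subarray(offset, offset + 4));
  if (tag(0) !== 'RIFF' || tag(8) !== 'WAVE' || tag(12) !== 'fmt ') {
    throw new Error('Not a canonical RIFF/WAVE header');
  }
  return {
    channels: view.getUint16(22, true), // WAV header fields are little-endian
    sampleRate: view.getUint32(24, true),
    bitsPerSample: view.getUint16(34, true),
  };
}
```

In a browser, the decoded bytes can be played with `new Audio(URL.createObjectURL(new Blob([bytes], { type: 'audio/wav' })))`.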
### HTTP: `/health`
```json
{
  "service": "TTS WebSocket Service",
  "version": "1.0.0",
  "status": "healthy",
  "model_loaded": true,
  "active_connections": 1,
  "available_voices": 10,
  "device": "cuda"
}
```
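A deployment script or client can poll this endpoint before opening the WebSocket. This sketch assumes the endpoint answers a plain `GET` with the JSON above and uses the global `fetch` (browsers, Node.js 18+). `checkHealth` and `isServiceReady` are hypothetical names, and treating `status: "healthy"` plus `model_loaded: true` as the readiness signal is an assumption.

```javascript
// Assumed readiness rule: healthy service with the model loaded.
function isServiceReady(health) {
  return health.status === 'healthy' && health.model_loaded === true;
}

// Fetch /health from a base URL such as "http://localhost:7860".
async function checkHealth(baseUrl) {
  const res = await fetch(new URL('/health', baseUrl));
  if (!res.ok) {
    throw new Error(`Health check failed: HTTP ${res.status}`);
  }
  const health = await res.json();
  return { ready: isServiceReady(health), health };
}
```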
## Port Configuration
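- **Default Port**: `7860` (the default port for Hugging Face Spaces)
- **WebSocket Endpoint**: `ws://localhost:7860/ws/tts`
- **Health Check**: `http://localhost:7860/health`
- **Note**: Each Hugging Face Space runs in its own container with its own hostname, so the STT and TTS services can both use port 7860. When running both on a single machine, map them to different host ports instead (for example, `docker run -p 7861:7860 tts-websocket-service`).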
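Because the WebSocket endpoint shares the host and port of the HTTP API, a client can derive its URL from the service's base URL. This hypothetical helper, `ttsWebSocketUrl`, uses the standard `URL` class and maps `http:` to `ws:` and `https:` to `wss:`. Browsers block plain `ws:` connections from pages served over HTTPS.

```javascript
// Hypothetical helper: derive the /ws/tts URL from an HTTP(S) base URL.
function ttsWebSocketUrl(httpBaseUrl) {
  const url = new URL(httpBaseUrl);
  // Switch between the special schemes http<->ws and https<->wss
  url.protocol = url.protocol === 'https:' ? 'wss:' : 'ws:';
  url.pathname = '/ws/tts';
  return url.toString();
}
```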
## Voice Presets
The service exposes ten Bark English speaker presets:
- `v2/en_speaker_0` - Voice 0
- `v2/en_speaker_1` - Voice 1
- `v2/en_speaker_2` - Voice 2
- `v2/en_speaker_3` - Voice 3
- `v2/en_speaker_4` - Voice 4
- `v2/en_speaker_5` - Voice 5
- `v2/en_speaker_6` - Voice 6 (default)
- `v2/en_speaker_7` - Voice 7
- `v2/en_speaker_8` - Voice 8
- `v2/en_speaker_9` - Voice 9
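The preset names follow a fixed pattern, so a client can generate the list and validate a user's choice before sending a request. In this sketch, `VOICE_PRESETS`, `DEFAULT_VOICE`, and `resolveVoice` are hypothetical client-side names. Falling back to the default is a client-side choice, not documented server behavior.

```javascript
// The ten presets listed above: v2/en_speaker_0 through v2/en_speaker_9.
const VOICE_PRESETS = Array.from({ length: 10 }, (_, i) => `v2/en_speaker_${i}`);
const DEFAULT_VOICE = 'v2/en_speaker_6';

// Client-side validation: fall back to the default for unknown presets.
function resolveVoice(preset) {
  return VOICE_PRESETS.includes(preset) ? preset : DEFAULT_VOICE;
}
```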
## Architecture
This service drops the layers that WebSocket-only TTS does not need and adds direct WebSocket access:
- **Removed**: Gradio web interface
- **Removed**: MCP protocol support
- **Removed**: Complex routing
- **Added**: Direct FastAPI WebSocket endpoints
- **Added**: Streaming TTS support
- **Added**: ZeroGPU-optimized synthesis
## Integration
Connecting from the VoiceCal WebRTC interface:
```javascript
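// Connect to the TTS endpoint (adjust host and port for your deployment)
const ws = new WebSocket('ws://localhost:7860/ws/tts');
// send() throws while the socket is still CONNECTING,
// so wait for the "open" event before sending requests
ws.addEventListener('open', () => {
  // Send text for synthesis
  ws.send(JSON.stringify({
    type: "tts_synthesize",
    text: "Hello world",
    voice_preset: "v2/en_speaker_6"
  }));

  // Streaming synthesis (unmute.sh pattern)
  ws.send(JSON.stringify({
    type: "tts_streaming_text",
    text_chunks: ["Hello", "world"],
    voice_preset: "v2/en_speaker_6",
    is_final: true
  }));
});
// Results arrive as JSON text frames
ws.addEventListener('message', (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "tts_synthesis_complete" && msg.status === "success") {
    // audio_data is Base64-encoded WAV; play it via a data: URL
    new Audio(`data:audio/wav;base64,${msg.audio_data}`).play();
  }
});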
```
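For request/response-style code, it can help to wrap the socket in promises. This minimal sketch makes two assumptions. First, the server answers `tts_synthesize` requests in the order it receives them, since the documented messages carry no request ID. Second, failures come back as a `tts_synthesis_complete` message whose `status` is not `"success"`; any other error message type would leave the matching promise pending. `TtsClient` is a hypothetical name. Use one wrapper per socket and avoid mixing in streaming requests, which could be misattributed.

```javascript
// Hypothetical promise wrapper around an already-open TTS WebSocket.
// Assumes tts_synthesize results arrive in the order requests were sent.
class TtsClient {
  constructor(ws) {
    this.ws = ws;
    this.pending = []; // FIFO queue of { resolve, reject } for in-flight requests
    ws.addEventListener('message', (event) => this._onMessage(event));
  }

  // Send one tts_synthesize request; resolves with the tts_synthesis_complete message.
  synthesize(text, voicePreset = 'v2/en_speaker_6') {
    return new Promise((resolve, reject) => {
      // Send first: if send() throws, the promise rejects and nothing is queued
      this.ws.send(JSON.stringify({ type: 'tts_synthesize', text, voice_preset: voicePreset }));
      this.pending.push({ resolve, reject });
    });
  }

  _onMessage(event) {
    let msg;
    try {
      msg = JSON.parse(event.data);
    } catch {
      return; // ignore frames that are not JSON
    }
    // Other types (e.g. tts_connection_confirmed) are not answers to a request
    if (msg.type !== 'tts_synthesis_complete') return;
    const waiter = this.pending.shift();
    if (!waiter) return; // no request is waiting for this result
    if (msg.status === 'success') {
      waiter.resolve(msg);
    } else {
      waiter.reject(new Error(`TTS synthesis failed with status "${msg.status}"`));
    }
  }
}
```

Once the socket is open, `const result = await new TtsClient(ws).synthesize('Hello world');` yields the full result message, including `audio_data`.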