# TTS WebSocket Service v1.0.0

Standalone WebSocket-only Text-to-Speech service for VoiceCal integration.

## Features

- ✅ WebSocket-only TTS interface (`/ws/tts`)
- ✅ ZeroGPU Bark TTS integration
- ✅ FastAPI-based architecture
- ✅ Multiple voice presets (10 speakers)
- ✅ Streaming TTS support (unmute.sh methodology)
- ✅ No Gradio dependencies
- ✅ No MCP dependencies
- ✅ Standalone deployment ready
- ✅ Base64 audio transmission
- ✅ WAV audio format output

## Quick Start

### Using the WebSocket Server

```bash
# Install dependencies
pip install -r requirements-websocket.txt

# Run standalone WebSocket server
python3 websocket_tts_server.py
```

### Docker Deployment

```bash
# Build WebSocket-only image
docker build -f Dockerfile-websocket -t tts-websocket-service .

# Run container
docker run -p 7860:7860 tts-websocket-service
```

## API Endpoints

### WebSocket: `/ws/tts`

**Connection Confirmation:**
```json
{
  "type": "tts_connection_confirmed",
  "client_id": "uuid",
  "service": "TTS WebSocket Service",
  "version": "1.0.0",
  "available_voices": [
    "v2/en_speaker_0", "v2/en_speaker_1", "v2/en_speaker_2",
    "v2/en_speaker_3", "v2/en_speaker_4", "v2/en_speaker_5",
    "v2/en_speaker_6", "v2/en_speaker_7", "v2/en_speaker_8",
    "v2/en_speaker_9"
  ],
  "device": "cuda",
  "message": "TTS WebSocket connected and ready"
}
```
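
A client can treat this first message as a handshake. The following is a minimal sketch, assuming the field names shown above; `Confirmation` and `parse_confirmation` are hypothetical names used for illustration and are not part of the service.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Confirmation:
    """Fields a client typically needs from the handshake message."""
    client_id: str
    version: str
    device: str
    available_voices: tuple


def parse_confirmation(raw: str) -> Confirmation:
    """Parse the first server message; raise ValueError if it is not the handshake."""
    msg = json.loads(raw)
    if msg.get("type") != "tts_connection_confirmed":
        raise ValueError(f"unexpected first message type: {msg.get('type')!r}")
    return Confirmation(
        client_id=msg["client_id"],
        version=msg["version"],
        device=msg["device"],
        available_voices=tuple(msg["available_voices"]),
    )
```

Keeping the voice list from the handshake lets the client validate a `voice_preset` locally before sending a request.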

**Single Synthesis Request:**
```json
{
  "type": "tts_synthesize",
  "text": "Hello, how are you today?",
  "voice_preset": "v2/en_speaker_6"
}
```

**Streaming Synthesis (unmute.sh methodology):**
```json
{
  "type": "tts_streaming_text",
  "text_chunks": ["Hello", "how are you", "today?"],
  "voice_preset": "v2/en_speaker_6",
  "is_final": true
}
```
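
How the server combines `text_chunks` is not documented here, so the sketch below only covers the client side: it splits text at punctuation and builds a message in the shape shown above. The `chunk_text` and `build_streaming_message` helpers, and the choice of punctuation as chunk boundaries, are assumptions for illustration.

```python
import json
import re


def chunk_text(text: str) -> list[str]:
    """Split text into chunks at whitespace that follows punctuation."""
    parts = re.split(r"(?<=[.!?,;:])\s+", text.strip())
    return [p for p in parts if p]


def build_streaming_message(text: str,
                            voice_preset: str = "v2/en_speaker_6",
                            is_final: bool = True) -> str:
    """Serialize a tts_streaming_text message for the given text."""
    return json.dumps({
        "type": "tts_streaming_text",
        "text_chunks": chunk_text(text),
        "voice_preset": voice_preset,
        "is_final": is_final,
    })
```

A caller that receives text incrementally (for example from an LLM) could send intermediate messages with `is_final=False` and set it to `True` on the last one, assuming the server honors the flag that way.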

**Synthesis Result:**
```json
{
  "type": "tts_synthesis_complete",
  "client_id": "uuid",
  "audio_data": "base64_encoded_wav_audio",
  "audio_format": "wav",
  "text": "Hello, how are you today?",
  "voice_preset": "v2/en_speaker_6",
  "audio_size": 12345,
  "timing": {
    "processing_time": 2.34,
    "device": "cuda"
  },
  "status": "success"
}
```
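
Because `audio_data` arrives as Base64 text, the client has to decode it before playback or storage. The sketch below assumes the field names shown above; `decode_synthesis_result` is a hypothetical helper, and the header check only confirms that the payload looks like a RIFF/WAVE container.

```python
import base64
import io
import json
import wave


def decode_synthesis_result(raw: str) -> bytes:
    """Return the decoded WAV bytes from a tts_synthesis_complete message."""
    msg = json.loads(raw)
    if msg.get("type") != "tts_synthesis_complete" or msg.get("status") != "success":
        raise ValueError(f"not a successful synthesis result: {msg.get('type')!r}")
    if msg.get("audio_format") != "wav":
        raise ValueError(f"unexpected audio format: {msg.get('audio_format')!r}")
    wav_bytes = base64.b64decode(msg["audio_data"])
    # Sanity check: WAV files start with a RIFF header tagged WAVE
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("decoded payload does not look like a WAV file")
    return wav_bytes
```

The returned bytes can be written straight to a `.wav` file or handed to an audio library.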

### HTTP: `/health`

```json
{
  "service": "TTS WebSocket Service",
  "version": "1.0.0",
  "status": "healthy",
  "model_loaded": true,
  "active_connections": 1,
  "available_voices": 10,
  "device": "cuda"
}
```
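
A deployment script or client can poll this endpoint before opening a WebSocket. The sketch below uses only the standard library and assumes the response shape shown above; `fetch_health` and `is_ready` are hypothetical helper names, and treating `model_loaded` as a readiness condition is an assumption.

```python
import json
import urllib.request


def is_ready(health: dict) -> bool:
    """True when the service reports healthy and the model is loaded."""
    return health.get("status") == "healthy" and health.get("model_loaded") is True


def fetch_health(base_url: str = "http://localhost:7860", timeout: float = 5.0) -> dict:
    """GET /health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `is_ready(fetch_health())` can gate the first synthesis request while the model is still loading.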

## Port Configuration

- **Default Port**: `7860` (the default port for Hugging Face Spaces)
- **WebSocket Endpoint**: `ws://localhost:7860/ws/tts`
- **Health Check**: `http://localhost:7860/health`
- **Note**: Each Hugging Face Space runs in its own container behind its own URL, so the STT and TTS services can both listen on port 7860 without conflict
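
Client configuration often starts from a single base URL. The sketch below derives both endpoints from it and switches to `wss://` when the base URL uses HTTPS, which a hosted deployment would likely require. The `service_urls` helper and the `example-space.hf.space` hostname are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def service_urls(base_url: str) -> dict:
    """Derive the WebSocket and health-check URLs from a base HTTP(S) URL."""
    parts = urlsplit(base_url)
    # Secure pages and hosts need wss://; plain HTTP maps to ws://
    ws_scheme = "wss" if parts.scheme == "https" else "ws"
    return {
        "websocket": urlunsplit((ws_scheme, parts.netloc, "/ws/tts", "", "")),
        "health": urlunsplit((parts.scheme, parts.netloc, "/health", "", "")),
    }


local = service_urls("http://localhost:7860")
hosted = service_urls("https://example-space.hf.space")  # hypothetical hostname
```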

## Voice Presets

Available voice presets:
- `v2/en_speaker_0` - Voice 0
- `v2/en_speaker_1` - Voice 1
- `v2/en_speaker_2` - Voice 2
- `v2/en_speaker_3` - Voice 3
- `v2/en_speaker_4` - Voice 4
- `v2/en_speaker_5` - Voice 5
- `v2/en_speaker_6` - Voice 6 (default)
- `v2/en_speaker_7` - Voice 7
- `v2/en_speaker_8` - Voice 8
- `v2/en_speaker_9` - Voice 9
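
How the server reacts to an unknown preset is not specified here, so a client may prefer to validate the name before sending. A minimal sketch, assuming the ten presets and the default listed above; `resolve_voice_preset` is a hypothetical helper.

```python
DEFAULT_VOICE = "v2/en_speaker_6"
VOICE_PRESETS = tuple(f"v2/en_speaker_{i}" for i in range(10))


def resolve_voice_preset(requested=None, available=VOICE_PRESETS) -> str:
    """Return the requested preset, or the default when none is given."""
    if requested is None:
        return DEFAULT_VOICE
    if requested not in available:
        raise ValueError(f"unknown voice preset: {requested!r}")
    return requested
```

Passing the `available_voices` list from the connection confirmation as `available` keeps the check in sync with what the server actually offers.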

## Architecture

This service removes components it does not need and adds a direct WebSocket path:
- **Removed**: Gradio web interface
- **Removed**: MCP protocol support
- **Removed**: Complex routing
- **Added**: Direct FastAPI WebSocket endpoints
- **Added**: Streaming TTS support
- **Added**: ZeroGPU optimized synthesis

## Integration

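Connect from the VoiceCal WebRTC interface:

```javascript
const ws = new WebSocket('ws://localhost:7860/ws/tts');

// Wait for the connection to open; calling send() earlier throws InvalidStateError
ws.addEventListener('open', () => {
  // Send text for synthesis
  ws.send(JSON.stringify({
    type: "tts_synthesize",
    text: "Hello world",
    voice_preset: "v2/en_speaker_6"
  }));

  // Streaming synthesis (unmute.sh pattern)
  ws.send(JSON.stringify({
    type: "tts_streaming_text",
    text_chunks: ["Hello", "world"],
    voice_preset: "v2/en_speaker_6",
    is_final: true
  }));
});

// Play the audio once synthesis completes
// (browsers may block playback without a prior user gesture)
ws.addEventListener('message', (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "tts_synthesis_complete") {
    new Audio(`data:audio/wav;base64,${msg.audio_data}`).play();
  }
});
```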
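
For callers outside the browser, the same exchange can be sketched in Python. This assumes the third-party `websockets` package is installed and that the server replies with a `tts_synthesis_complete` message as documented above; other message types (including the connection confirmation) are skipped, since error message formats are not documented here. `build_request` and `synthesize` are hypothetical names.

```python
import base64
import json


def build_request(text: str, voice_preset: str = "v2/en_speaker_6") -> str:
    """Serialize a tts_synthesize request; reject empty text early."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return json.dumps({
        "type": "tts_synthesize",
        "text": text,
        "voice_preset": voice_preset,
    })


async def synthesize(text: str, uri: str = "ws://localhost:7860/ws/tts") -> bytes:
    """Send one synthesis request and return the decoded WAV bytes."""
    # Third-party dependency, imported lazily so build_request works without it
    import websockets

    # max_size=None lifts the library's default 1 MiB message limit,
    # which a Base64-encoded WAV payload can exceed
    async with websockets.connect(uri, max_size=None) as ws:
        await ws.send(build_request(text))
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("type") == "tts_synthesis_complete":
                return base64.b64decode(msg["audio_data"])
    raise ConnectionError("connection closed before synthesis completed")
```

With a server running locally, `asyncio.run(synthesize("Hello world"))` would return the audio bytes, which can then be written to a `.wav` file.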