
Peter Michael Gits

feat: Add standalone WebSocket-only TTS service v1.0.0

390e1c5 7 months ago


# TTS WebSocket Service v1.0.0

Standalone WebSocket-only Text-to-Speech service for VoiceCal integration.

## Features

- ✅ WebSocket-only TTS interface (`/ws/tts`)
- ✅ ZeroGPU Bark TTS integration
- ✅ FastAPI-based architecture
- ✅ Multiple voice presets (10 speakers)
- ✅ Streaming TTS support (unmute.sh methodology)
- ✅ No Gradio dependencies
- ✅ No MCP dependencies
- ✅ Standalone deployment ready
- ✅ Base64 audio transmission
- ✅ WAV audio format output

## Quick Start

### Using the WebSocket Server

```bash
# Install dependencies
pip install -r requirements-websocket.txt

# Run standalone WebSocket server
python3 websocket_tts_server.py
```

### Docker Deployment

```bash
# Build WebSocket-only image
docker build -f Dockerfile-websocket -t tts-websocket-service .

# Run container
docker run -p 7860:7860 tts-websocket-service
```

## API Endpoints

### WebSocket: `/ws/tts`

Connection Confirmation (server to client, on connect):
```json
{
  "type": "tts_connection_confirmed",
  "client_id": "uuid",
  "service": "TTS WebSocket Service",
  "version": "1.0.0",
  "available_voices": [
    "v2/en_speaker_0", "v2/en_speaker_1", "v2/en_speaker_2",
    "v2/en_speaker_3", "v2/en_speaker_4", "v2/en_speaker_5",
    "v2/en_speaker_6", "v2/en_speaker_7", "v2/en_speaker_8",
    "v2/en_speaker_9"
  ],
  "device": "cuda",
  "message": "TTS WebSocket connected and ready"
}
```
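Every message in this protocol is a JSON object whose `type` field identifies it. The following is a minimal client-side sketch that routes incoming frames by `type`; `createMessageRouter`, `route`, and `advertisedVoices` are hypothetical names, not part of the service.

```javascript
// Hypothetical helper: route server messages to handlers keyed by "type"
function createMessageRouter(handlers) {
  return function route(rawData) {
    const msg = JSON.parse(rawData);
    const handler = handlers[msg.type];
    if (handler) {
      handler(msg);
      return true;
    }
    // Message types without a handler are ignored
    return false;
  };
}

// Example: remember the voices the server advertised on connect
let advertisedVoices = [];
const route = createMessageRouter({
  tts_connection_confirmed: (msg) => {
    advertisedVoices = msg.available_voices;
    console.log(`Connected as ${msg.client_id} on ${msg.device}`);
  },
});
```

In a browser client, pass `event.data` from the socket's `message` event to `route`.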

Single Synthesis Request (client to server):
```json
{
  "type": "tts_synthesize",
  "text": "Hello, how are you today?",
  "voice_preset": "v2/en_speaker_6"
}
```
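Building the request in one place keeps field names consistent. This sketch checks the preset locally against the ten documented presets and uses `v2/en_speaker_6` as the default, as listed under Voice Presets; how the server itself treats an unknown preset is not documented here. `buildSynthesizeRequest` is a hypothetical helper.

```javascript
// The ten documented Bark presets: v2/en_speaker_0 ... v2/en_speaker_9
const VOICE_PRESETS = Array.from({ length: 10 }, (_, i) => `v2/en_speaker_${i}`);
const DEFAULT_VOICE = "v2/en_speaker_6";

// Hypothetical helper: serialize a single-shot synthesis request
function buildSynthesizeRequest(text, voicePreset = DEFAULT_VOICE) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("text must be a non-empty string");
  }
  if (!VOICE_PRESETS.includes(voicePreset)) {
    throw new Error(`unknown voice preset: ${voicePreset}`);
  }
  return JSON.stringify({ type: "tts_synthesize", text, voice_preset: voicePreset });
}
```

Usage: `ws.send(buildSynthesizeRequest("Hello, how are you today?"))`.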

Streaming Synthesis (client to server, unmute.sh methodology):
```json
{
  "type": "tts_streaming_text",
  "text_chunks": ["Hello", "how are you", "today?"],
  "voice_preset": "v2/en_speaker_6",
  "is_final": true
}
```
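Where the chunk boundaries fall is up to the client. The sketch below assumes one simple rule, splitting after clause punctuation (`, . ! ? ; :`), and assumes `is_final: false` marks a batch that more chunks will follow; neither rule is specified by the service, and the JSON example above splits differently. `splitIntoChunks` and `buildStreamingRequest` are hypothetical.

```javascript
// Hypothetical helper: split text after clause punctuation followed by whitespace
function splitIntoChunks(text) {
  return text
    .split(/(?<=[,.!?;:])\s+/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

// Build a tts_streaming_text message; isFinal is assumed to mark the last batch
function buildStreamingRequest(text, voicePreset = "v2/en_speaker_6", isFinal = true) {
  return JSON.stringify({
    type: "tts_streaming_text",
    text_chunks: splitIntoChunks(text),
    voice_preset: voicePreset,
    is_final: isFinal,
  });
}
```

A naive split like this also breaks after abbreviations such as "Dr."; it only illustrates the message shape.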

Synthesis Result (server to client):
```json
{
  "type": "tts_synthesis_complete",
  "client_id": "uuid",
  "audio_data": "base64_encoded_wav_audio",
  "audio_format": "wav",
  "text": "Hello, how are you today?",
  "voice_preset": "v2/en_speaker_6",
  "audio_size": 12345,
  "timing": {
    "processing_time": 2.34,
    "device": "cuda"
  },
  "status": "success"
}
```
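The `audio_data` field carries a base64-encoded WAV file. A sketch for decoding it and reading the sample format from the header follows; it assumes a canonical PCM WAV layout where the `fmt ` chunk directly follows the RIFF header, and `decodeAudio` and `readWavHeader` are hypothetical names.

```javascript
// Decode the base64 audio_data field into raw bytes.
// atob is available in browsers and in Node.js 16+.
function decodeAudio(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Read basic fields from a canonical PCM WAV header (little-endian).
// Assumes the "fmt " chunk starts at byte 12, right after the RIFF header.
function readWavHeader(bytes) {
  if (bytes.length < 44) throw new Error("WAV data too short");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (offset) => String.fromCharCode(...bytes.subarray(offset, offset + 4));
  if (tag(0) !== "RIFF" || tag(8) !== "WAVE" || tag(12) !== "fmt ") {
    throw new Error("not a canonical WAV file");
  }
  return {
    channels: view.getUint16(22, true),
    sampleRate: view.getUint32(24, true),
    bitsPerSample: view.getUint16(34, true),
  };
}
```

In a browser, the decoded bytes can be played by wrapping them in a `Blob` of type `audio/wav` and passing `URL.createObjectURL(blob)` to an `Audio` element.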

### HTTP: `/health`

```json
{
  "service": "TTS WebSocket Service",
  "version": "1.0.0",
  "status": "healthy",
  "model_loaded": true,
  "active_connections": 1,
  "available_voices": 10,
  "device": "cuda"
}
```
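A Space that has been sleeping may take a while to come back up, so a client can poll `/health` before opening the WebSocket. This sketch assumes the service is ready only when `status` is `"healthy"` and `model_loaded` is `true`; the retry count and delay are arbitrary, and `isServiceReady` and `waitUntilReady` are hypothetical helpers.

```javascript
// Hypothetical readiness rule based on the documented /health fields
function isServiceReady(health) {
  return health.status === "healthy" && health.model_loaded === true;
}

// Poll /health until ready (fetch is built into browsers and Node.js 18+)
async function waitUntilReady(baseUrl, { attempts = 30, delayMs = 2000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(`${baseUrl}/health`);
      if (res.ok && isServiceReady(await res.json())) return true;
    } catch {
      // Connection refused or service still starting; retry after a delay
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```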

## Port Configuration

- Default port: `7860` (the standard HuggingFace Spaces port)
- WebSocket endpoint: `ws://localhost:7860/ws/tts`
- Health check: `http://localhost:7860/health`
- Note: each HuggingFace Space runs in its own container with its own address, so the STT and TTS services can both listen on port 7860 without conflicting.
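The endpoints above are for local runs. A page served over HTTPS must connect with `wss://` rather than `ws://`, because browsers block insecure WebSockets from secure pages. This hypothetical `serviceEndpoints` helper derives both URLs from one base URL and picks the WebSocket scheme to match.

```javascript
// Hypothetical helper: derive the WebSocket and health URLs from a base URL.
// An https base yields wss://, an http base yields ws://.
function serviceEndpoints(baseUrl) {
  const url = new URL(baseUrl);
  const wsScheme = url.protocol === "https:" ? "wss:" : "ws:";
  return {
    websocket: `${wsScheme}//${url.host}/ws/tts`,
    health: `${url.protocol}//${url.host}/health`,
  };
}
```

For example, `serviceEndpoints("http://localhost:7860").websocket` is `"ws://localhost:7860/ws/tts"`.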

## Voice Presets

Available voice presets:
- `v2/en_speaker_0` - Voice 0
- `v2/en_speaker_1` - Voice 1
- `v2/en_speaker_2` - Voice 2
- `v2/en_speaker_3` - Voice 3
- `v2/en_speaker_4` - Voice 4
- `v2/en_speaker_5` - Voice 5
- `v2/en_speaker_6` - Voice 6 (default)
- `v2/en_speaker_7` - Voice 7
- `v2/en_speaker_8` - Voice 8
- `v2/en_speaker_9` - Voice 9
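Since the connection confirmation advertises `available_voices`, a client can check a user's choice against that list before sending. This sketch falls back to the documented default, `v2/en_speaker_6`, and then to the first advertised voice; the fallback order is an assumption, and `resolveVoice` is a hypothetical helper.

```javascript
const DEFAULT_VOICE = "v2/en_speaker_6";

// Hypothetical helper: use the requested preset if the server advertised it,
// otherwise fall back to the documented default, then to the first advertised
// voice (undefined if the list is empty)
function resolveVoice(requested, availableVoices) {
  if (availableVoices.includes(requested)) return requested;
  if (availableVoices.includes(DEFAULT_VOICE)) return DEFAULT_VOICE;
  return availableVoices[0];
}
```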

## Architecture

This service keeps only the components a WebSocket-only TTS deployment needs:
- Removed: Gradio web interface
- Removed: MCP protocol support
- Removed: Complex routing
- Added: Direct FastAPI WebSocket endpoints
- Added: Streaming TTS support
- Added: ZeroGPU-optimized synthesis

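## Integration

Connect from the VoiceCal WebRTC interface:

```javascript
const ws = new WebSocket('ws://localhost:7860/ws/tts');

// Wait for the connection to open before sending; calling send()
// while the socket is still CONNECTING throws an InvalidStateError
ws.addEventListener('open', () => {
  // Send text for synthesis
  ws.send(JSON.stringify({
    type: "tts_synthesize",
    text: "Hello world",
    voice_preset: "v2/en_speaker_6"
  }));

  // Streaming synthesis (unmute.sh pattern)
  ws.send(JSON.stringify({
    type: "tts_streaming_text",
    text_chunks: ["Hello", "world"],
    voice_preset: "v2/en_speaker_6",
    is_final: true
  }));
});

// Every server message is a JSON object with a "type" field
ws.addEventListener('message', (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "tts_synthesis_complete") {
    console.log("Synthesis complete:", msg.text);
  }
});
```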
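For request/response style code, the send-and-wait pattern can be wrapped in a promise. This is a sketch under two assumptions: only one request is in flight per socket at a time (the documented result carries `client_id` but no per-request identifier), and any `status` other than `"success"` means failure. `synthesize` and its timeout are hypothetical; call it only after the socket's `open` event.

```javascript
// Hypothetical helper: send one tts_synthesize request and resolve with the
// next tts_synthesis_complete message. Works with any object exposing
// send/addEventListener/removeEventListener, such as a browser WebSocket.
function synthesize(ws, text, voicePreset = "v2/en_speaker_6", timeoutMs = 60000) {
  return new Promise((resolve, reject) => {
    const onMessage = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type !== "tts_synthesis_complete") return;
      cleanup();
      if (msg.status === "success") resolve(msg);
      else reject(new Error(`synthesis failed with status: ${msg.status}`));
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error("synthesis timed out"));
    }, timeoutMs);
    // Stop the timer and detach the listener once the request settles
    function cleanup() {
      clearTimeout(timer);
      ws.removeEventListener("message", onMessage);
    }
    ws.addEventListener("message", onMessage);
    ws.send(JSON.stringify({ type: "tts_synthesize", text, voice_preset: voicePreset }));
  });
}
```

Usage: `const result = await synthesize(ws, "Hello world");` then decode `result.audio_data`.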