Peter Michael Gits and Claude committed
Commit 390e1c5 · Parent: 072c9ef

feat: Add standalone WebSocket-only TTS service v1.0.0

- WebSocket-only interface at /ws/tts
- ZeroGPU Bark TTS integration
- FastAPI-based architecture
- 10 voice presets available
- Streaming TTS with unmute.sh methodology
- No Gradio/MCP dependencies
- Standalone deployment ready
- Port 7860 (HuggingFace Spaces standard)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

Dockerfile-websocket ADDED
@@ -0,0 +1,38 @@
+ # Minimal Dockerfile for WebSocket-only TTS service
+ FROM python:3.11-slim
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install minimal system packages
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     curl \
+     && rm -rf /var/lib/apt/lists/* \
+     && apt-get clean
+
+ # Create non-root user
+ RUN useradd -m -u 1000 user
+
+ # Switch to user
+ USER user
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH
+
+ WORKDIR $HOME/app
+
+ # Copy and install minimal requirements
+ COPY --chown=user requirements-websocket.txt .
+ RUN pip install --user --no-cache-dir -r requirements-websocket.txt
+
+ # Copy WebSocket server
+ COPY --chown=user websocket_tts_server.py .
+
+ # Expose port
+ EXPOSE 7860
+
+ # Environment variables
+ ENV GRADIO_SERVER_NAME="0.0.0.0" \
+     GRADIO_SERVER_PORT=7860
+
+ # Run WebSocket-only TTS service
+ CMD ["python3", "websocket_tts_server.py"]
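The image installs `curl` but defines no `HEALTHCHECK`, so liveness has to be probed from outside the container. A minimal sketch, assuming the container is mapped to `localhost:7860` and the payload fields match the `/health` response documented in this commit (`is_healthy` and `probe` are illustrative names, not part of the service):

```python
# Sketch: probe the service's /health route and parse the JSON status.
# Assumes the container was started with `docker run -p 7860:7860 ...`.
import json
import urllib.request

def is_healthy(payload_text: str) -> bool:
    """Return True when the health payload reports a loaded, healthy service."""
    payload = json.loads(payload_text)
    return payload.get("status") == "healthy" and payload.get("model_loaded", False)

def probe(base_url: str = "http://localhost:7860") -> bool:
    """Fetch /health and evaluate it; raises URLError when the service is down."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return is_healthy(resp.read().decode("utf-8"))
```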
README-websocket.md ADDED
@@ -0,0 +1,165 @@
+ # TTS WebSocket Service v1.0.0
+
+ Standalone WebSocket-only Text-to-Speech service for VoiceCal integration.
+
+ ## Features
+
+ - ✅ WebSocket-only TTS interface (`/ws/tts`)
+ - ✅ ZeroGPU Bark TTS integration
+ - ✅ FastAPI-based architecture
+ - ✅ Multiple voice presets (10 speakers)
+ - ✅ Streaming TTS support (unmute.sh methodology)
+ - ✅ No Gradio dependencies
+ - ✅ No MCP dependencies
+ - ✅ Standalone deployment ready
+ - ✅ Base64 audio transmission
+ - ✅ WAV audio format output
+
+ ## Quick Start
+
+ ### Using the WebSocket Server
+
+ ```bash
+ # Install dependencies
+ pip install -r requirements-websocket.txt
+
+ # Run standalone WebSocket server
+ python3 websocket_tts_server.py
+ ```
+
+ ### Docker Deployment
+
+ ```bash
+ # Build WebSocket-only image
+ docker build -f Dockerfile-websocket -t tts-websocket-service .
+
+ # Run container
+ docker run -p 7860:7860 tts-websocket-service
+ ```
+
+ ## API Endpoints
+
+ ### WebSocket: `/ws/tts`
+
+ **Connection Confirmation:**
+ ```json
+ {
+   "type": "tts_connection_confirmed",
+   "client_id": "uuid",
+   "service": "TTS WebSocket Service",
+   "version": "1.0.0",
+   "available_voices": [
+     "v2/en_speaker_0", "v2/en_speaker_1", "v2/en_speaker_2",
+     "v2/en_speaker_3", "v2/en_speaker_4", "v2/en_speaker_5",
+     "v2/en_speaker_6", "v2/en_speaker_7", "v2/en_speaker_8",
+     "v2/en_speaker_9"
+   ],
+   "device": "cuda",
+   "message": "TTS WebSocket connected and ready"
+ }
+ ```
+
+ **Single Synthesis Request:**
+ ```json
+ {
+   "type": "tts_synthesize",
+   "text": "Hello, how are you today?",
+   "voice_preset": "v2/en_speaker_6"
+ }
+ ```
+
+ **Streaming Synthesis (unmute.sh methodology):**
+ ```json
+ {
+   "type": "tts_streaming_text",
+   "text_chunks": ["Hello", "how are you", "today?"],
+   "voice_preset": "v2/en_speaker_6",
+   "is_final": true
+ }
+ ```
+
+ **Synthesis Result:**
+ ```json
+ {
+   "type": "tts_synthesis_complete",
+   "client_id": "uuid",
+   "audio_data": "base64_encoded_wav_audio",
+   "audio_format": "wav",
+   "text": "Hello, how are you today?",
+   "voice_preset": "v2/en_speaker_6",
+   "audio_size": 12345,
+   "timing": {
+     "processing_time": 2.34,
+     "device": "cuda"
+   },
+   "status": "success"
+ }
+ ```
+
+ ### HTTP: `/health`
+
+ ```json
+ {
+   "service": "TTS WebSocket Service",
+   "version": "1.0.0",
+   "status": "healthy",
+   "model_loaded": true,
+   "active_connections": 1,
+   "available_voices": 10,
+   "device": "cuda"
+ }
+ ```
+
+ ## Port Configuration
+
+ - **Default Port**: `7860` (HuggingFace Spaces standard port)
+ - **WebSocket Endpoint**: `ws://localhost:7860/ws/tts`
+ - **Health Check**: `http://localhost:7860/health`
+ - **Note**: Each HuggingFace Space gets its own IP address, so both STT and TTS can use port 7860
+
+ ## Voice Presets
+
+ Available voice presets:
+ - `v2/en_speaker_0` - Voice 0
+ - `v2/en_speaker_1` - Voice 1
+ - `v2/en_speaker_2` - Voice 2
+ - `v2/en_speaker_3` - Voice 3
+ - `v2/en_speaker_4` - Voice 4
+ - `v2/en_speaker_5` - Voice 5
+ - `v2/en_speaker_6` - Voice 6 (default)
+ - `v2/en_speaker_7` - Voice 7
+ - `v2/en_speaker_8` - Voice 8
+ - `v2/en_speaker_9` - Voice 9
+
+ ## Architecture
+
+ This service eliminates all unnecessary dependencies:
+ - **Removed**: Gradio web interface
+ - **Removed**: MCP protocol support
+ - **Removed**: Complex routing
+ - **Added**: Direct FastAPI WebSocket endpoints
+ - **Added**: Streaming TTS support
+ - **Added**: ZeroGPU optimized synthesis
+
+ ## Integration
+
+ Connect from VoiceCal WebRTC interface:
+
+ ```javascript
+ const ws = new WebSocket('ws://localhost:7860/ws/tts');
+
+ // Send text for synthesis
+ ws.send(JSON.stringify({
+     type: "tts_synthesize",
+     text: "Hello world",
+     voice_preset: "v2/en_speaker_6"
+ }));
+
+ // Streaming synthesis (unmute.sh pattern)
+ ws.send(JSON.stringify({
+     type: "tts_streaming_text",
+     text_chunks: ["Hello", "world"],
+     voice_preset: "v2/en_speaker_6",
+     is_final: true
+ }));
+ ```
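A Python counterpart to the JavaScript client in the README can be sketched with two small helpers for message framing. The helper names (`build_synthesize_request`, `decode_audio`) are illustrative; only the payload fields documented in this README are assumed:

```python
# Sketch of client-side message framing for the /ws/tts endpoint.
# Field names mirror the JSON schemas documented in README-websocket.md.
import base64
import json

def build_synthesize_request(text: str, voice_preset: str = "v2/en_speaker_6") -> str:
    """Frame a single tts_synthesize request as a JSON string."""
    return json.dumps({
        "type": "tts_synthesize",
        "text": text,
        "voice_preset": voice_preset,
    })

def decode_audio(message_text: str) -> bytes:
    """Extract raw WAV bytes from a tts_synthesis_complete message."""
    msg = json.loads(message_text)
    if msg.get("type") != "tts_synthesis_complete":
        raise ValueError(f"unexpected message type: {msg.get('type')}")
    return base64.b64decode(msg["audio_data"])
```

Paired with any asyncio WebSocket client library, the returned bytes can be written straight to a `.wav` file.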
requirements-websocket.txt ADDED
@@ -0,0 +1,11 @@
+ # Minimal requirements for WebSocket-only TTS service
+ torch>=2.1.0
+ torchaudio>=2.1.0
+ transformers>=4.35.0
+ accelerate>=0.24.0
+ spaces>=0.19.0
+ numpy>=1.21.0
+ soundfile>=0.12.0
+ fastapi>=0.104.0
+ uvicorn>=0.24.0
+ python-multipart>=0.0.6
version.py ADDED
@@ -0,0 +1,33 @@
+ #!/usr/bin/env python3
+ """
+ Version information for TTS WebSocket Service
+ Major version 1.0.0 - Standalone WebSocket-only service
+ """
+
+ __version__ = "1.0.0"
+ __build_date__ = "2025-08-25T04:30:00"
+ __service__ = "TTS WebSocket Service"
+ __description__ = "Standalone WebSocket-only Text-to-Speech service without Gradio or MCP dependencies"
+
+ def get_version_info():
+     """Get complete version information"""
+     return {
+         "version": __version__,
+         "service": __service__,
+         "description": __description__,
+         "build_date": __build_date__,
+         "major_features": [
+             "WebSocket-only TTS interface",
+             "ZeroGPU Bark TTS integration",
+             "FastAPI-based architecture",
+             "Multiple voice presets",
+             "Streaming TTS support",
+             "No Gradio dependencies",
+             "No MCP dependencies",
+             "Standalone deployment ready"
+         ]
+     }
+
+ if __name__ == "__main__":
+     import json
+     print(json.dumps(get_version_info(), indent=2))
websocket_tts_server.py ADDED
@@ -0,0 +1,425 @@
+ #!/usr/bin/env python3
+ """
+ Standalone WebSocket-only TTS Service
+ Simplified service without Gradio, MCP, or web interfaces
+ Following unmute.sh WebRTC pattern for HuggingFace Spaces
+ """
+
+ import asyncio
+ import json
+ import uuid
+ import base64
+ import tempfile
+ import os
+ import logging
+ import time
+ from datetime import datetime
+ from typing import Optional, Dict, Any
+ import torch
+ from transformers import AutoProcessor, BarkModel
+ import soundfile as sf
+ import numpy as np
+ from fastapi import FastAPI, WebSocket, WebSocketDisconnect
+ from fastapi.middleware.cors import CORSMiddleware
+ import spaces
+ import uvicorn
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Version info
+ __version__ = "1.0.0"
+ __service__ = "TTS WebSocket Service"
+
+ class TTSWebSocketService:
+     """Standalone TTS service with WebSocket-only interface"""
+
+     def __init__(self):
+         self.model = None
+         self.processor = None
+         self.device = "cuda" if torch.cuda.is_available() else "cpu"
+         self.active_connections: Dict[str, WebSocket] = {}
+         self.available_voices = [
+             "v2/en_speaker_0", "v2/en_speaker_1", "v2/en_speaker_2", "v2/en_speaker_3",
+             "v2/en_speaker_4", "v2/en_speaker_5", "v2/en_speaker_6", "v2/en_speaker_7",
+             "v2/en_speaker_8", "v2/en_speaker_9"
+         ]
+
+         logger.info(f"🔊 {__service__} v{__version__} initializing...")
+         logger.info(f"Device: {self.device}")
+         logger.info(f"Available voices: {len(self.available_voices)}")
+
+     async def load_model(self):
+         """Load Bark TTS model with ZeroGPU compatibility"""
+         if self.model is None:
+             logger.info("Loading Bark TTS model...")
+
+             self.processor = AutoProcessor.from_pretrained("suno/bark")
+             self.model = BarkModel.from_pretrained("suno/bark")
+
+             if self.device == "cuda":
+                 self.model = self.model.to(self.device)
+
+             logger.info(f"✅ Bark model loaded on {self.device}")
+
+     @spaces.GPU(duration=30)
+     async def synthesize_speech(
+         self,
+         text: str,
+         voice_preset: str = "v2/en_speaker_6",
+         sample_rate: int = 24000
+     ) -> tuple[Optional[str], str, Dict[str, Any]]:
+         """Synthesize speech from text using Bark with ZeroGPU"""
+
+         try:
+             if not text.strip():
+                 return None, "error", {"error": "Empty text provided"}
+
+             start_time = time.time()
+
+             # Ensure model is loaded
+             if self.model is None:
+                 await self.load_model()
+
+             logger.info(f"Synthesizing: '{text[:50]}...' with {voice_preset}")
+
+             # Process text with voice preset
+             inputs = self.processor(
+                 text,
+                 voice_preset=voice_preset,
+                 return_tensors="pt"
+             )
+
+             if self.device == "cuda":
+                 inputs = {k: v.to(self.device) for k, v in inputs.items()}
+
+             # Generate audio
+             with torch.no_grad():
+                 audio_array = self.model.generate(**inputs)
+                 audio_array = audio_array.cpu().numpy().squeeze()
+
+             # Save to temporary WAV file
+             with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_file:
+                 sf.write(tmp_file.name, audio_array, sample_rate)
+                 temp_path = tmp_file.name
+
+             # Calculate timing
+             processing_time = time.time() - start_time
+
+             timing_info = {
+                 "processing_time": processing_time,
+                 "start_time": datetime.fromtimestamp(start_time).isoformat(),
+                 "end_time": datetime.now().isoformat(),
+                 "voice_preset": voice_preset,
+                 "device": self.device,
+                 "text_length": len(text),
+                 "sample_rate": sample_rate
+             }
+
+             logger.info(f"Speech synthesis completed in {processing_time:.2f}s")
+
+             return temp_path, "success", timing_info
+
+         except Exception as e:
+             logger.error(f"TTS synthesis error: {str(e)}")
+             return None, "error", {"error": str(e)}
+
+     async def connect_websocket(self, websocket: WebSocket) -> str:
+         """Accept WebSocket connection and return client ID"""
+         client_id = str(uuid.uuid4())
+         await websocket.accept()
+         self.active_connections[client_id] = websocket
+
+         # Send connection confirmation
+         await websocket.send_text(json.dumps({
+             "type": "tts_connection_confirmed",
+             "client_id": client_id,
+             "service": __service__,
+             "version": __version__,
+             "available_voices": self.available_voices,
+             "device": self.device,
+             "message": "TTS WebSocket connected and ready"
+         }))
+
+         logger.info(f"Client {client_id} connected")
+         return client_id
+
+     async def disconnect_websocket(self, client_id: str):
+         """Clean up WebSocket connection"""
+         if client_id in self.active_connections:
+             del self.active_connections[client_id]
+             logger.info(f"Client {client_id} disconnected")
+
+     async def process_tts_message(self, client_id: str, message: Dict[str, Any]):
+         """Process incoming TTS request from WebSocket"""
+         try:
+             websocket = self.active_connections[client_id]
+
+             # Extract text and voice preset
+             text = message.get("text", "").strip()
+             voice_preset = message.get("voice_preset", "v2/en_speaker_6")
+
+             if not text:
+                 await websocket.send_text(json.dumps({
+                     "type": "tts_synthesis_error",
+                     "client_id": client_id,
+                     "error": "No text provided for synthesis"
+                 }))
+                 return
+
+             # Validate voice preset
+             if voice_preset not in self.available_voices:
+                 voice_preset = "v2/en_speaker_6"  # Default fallback
+
+             # Synthesize speech
+             audio_path, status, timing = await self.synthesize_speech(
+                 text,
+                 voice_preset
+             )
+
+             if status == "success" and audio_path:
+                 try:
+                     # Read generated audio file
+                     with open(audio_path, 'rb') as audio_file:
+                         audio_data = audio_file.read()
+
+                     # Encode as base64 for WebSocket transmission
+                     audio_b64 = base64.b64encode(audio_data).decode('utf-8')
+
+                     # Send result back
+                     await websocket.send_text(json.dumps({
+                         "type": "tts_synthesis_complete",
+                         "client_id": client_id,
+                         "audio_data": audio_b64,
+                         "audio_format": "wav",
+                         "text": text,
+                         "voice_preset": voice_preset,
+                         "audio_size": len(audio_data),
+                         "timing": timing,
+                         "status": "success"
+                     }))
+
+                     logger.info(f"TTS synthesis sent to {client_id} ({len(audio_data)} bytes)")
+
+                 finally:
+                     # Clean up temp file
+                     if os.path.exists(audio_path):
+                         os.unlink(audio_path)
+             else:
+                 await websocket.send_text(json.dumps({
+                     "type": "tts_synthesis_error",
+                     "client_id": client_id,
+                     "error": "Speech synthesis failed",
+                     "timing": timing
+                 }))
+
+         except Exception as e:
+             logger.error(f"Error processing TTS for {client_id}: {str(e)}")
+             if client_id in self.active_connections:
+                 websocket = self.active_connections[client_id]
+                 await websocket.send_text(json.dumps({
+                     "type": "tts_synthesis_error",
+                     "client_id": client_id,
+                     "error": f"Processing error: {str(e)}"
+                 }))
+
+     async def process_streaming_tts_message(self, client_id: str, message: Dict[str, Any]):
+         """Process streaming TTS request (unmute.sh methodology)"""
+         try:
+             websocket = self.active_connections[client_id]
+
+             # Extract streaming data
+             text_chunks = message.get("text_chunks", [])
+             voice_preset = message.get("voice_preset", "v2/en_speaker_6")
+             is_final = message.get("is_final", True)
+
+             if is_final and text_chunks:
+                 # UNMUTE.SH FLUSH TRICK: Process all accumulated text at once
+                 complete_text = " ".join(text_chunks).strip()
+                 logger.info(f"🔊 TTS STREAMING: Final synthesis for {client_id}: '{complete_text[:50]}...'")
+
+                 # Synthesize complete text
+                 audio_path, status, timing = await self.synthesize_speech(
+                     complete_text,
+                     voice_preset
+                 )
+
+                 if status == "success" and audio_path:
+                     try:
+                         # Read generated audio
+                         with open(audio_path, 'rb') as audio_file:
+                             audio_data = audio_file.read()
+
+                         # Encode as base64
+                         audio_b64 = base64.b64encode(audio_data).decode('utf-8')
+
+                         # Send streaming response
+                         await websocket.send_text(json.dumps({
+                             "type": "tts_streaming_response",
+                             "client_id": client_id,
+                             "audio_data": audio_b64,
+                             "audio_format": "wav",
+                             "text": complete_text,
+                             "text_chunks": text_chunks,
+                             "voice_preset": voice_preset,
+                             "audio_size": len(audio_data),
+                             "timing": timing,
+                             "is_final": is_final,
+                             "streaming_method": "unmute.sh_flush_trick",
+                             "status": "success"
+                         }))
+
+                         logger.info(f"🔊 TTS STREAMING: Final audio sent to {client_id} ({len(audio_data)} bytes)")
+
+                     finally:
+                         # Clean up
+                         if os.path.exists(audio_path):
+                             os.unlink(audio_path)
+                 else:
+                     await websocket.send_text(json.dumps({
+                         "type": "tts_streaming_error",
+                         "client_id": client_id,
+                         "message": f"TTS streaming synthesis failed: {status}",
+                         "text": complete_text,
+                         "is_final": is_final
+                     }))
+             else:
+                 # Send partial progress update (no audio yet)
+                 await websocket.send_text(json.dumps({
+                     "type": "tts_streaming_progress",
+                     "client_id": client_id,
+                     "text_chunks": text_chunks,
+                     "is_final": is_final,
+                     "message": f"Accumulating text chunks: {len(text_chunks)}"
+                 }))
+
+         except Exception as e:
+             logger.error(f"Error processing streaming TTS for {client_id}: {str(e)}")
+             if client_id in self.active_connections:
+                 websocket = self.active_connections[client_id]
+                 await websocket.send_text(json.dumps({
+                     "type": "tts_streaming_error",
+                     "client_id": client_id,
+                     "error": f"Streaming processing error: {str(e)}"
+                 }))
+
+ # Initialize service
+ tts_service = TTSWebSocketService()
+
+ # Create FastAPI app
+ app = FastAPI(
+     title="TTS WebSocket Service",
+     description="Standalone WebSocket-only Text-to-Speech service",
+     version=__version__
+ )
+
+ # Add CORS middleware
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ @app.on_event("startup")
+ async def startup_event():
+     """Initialize service on startup"""
+     logger.info(f"🚀 {__service__} v{__version__} starting...")
+     logger.info("Pre-loading Bark TTS model for optimal performance...")
+     await tts_service.load_model()
+     logger.info("✅ Service ready for WebSocket connections")
+
+ @app.get("/")
+ async def root():
+     """Health check endpoint"""
+     return {
+         "service": __service__,
+         "version": __version__,
+         "status": "ready",
+         "endpoints": {
+             "websocket": "/ws/tts",
+             "health": "/health"
+         },
+         "available_voices": tts_service.available_voices,
+         "device": tts_service.device
+     }
+
+ @app.get("/health")
+ async def health_check():
+     """Detailed health check"""
+     return {
+         "service": __service__,
+         "version": __version__,
+         "status": "healthy",
+         "model_loaded": tts_service.model is not None,
+         "active_connections": len(tts_service.active_connections),
+         "available_voices": len(tts_service.available_voices),
+         "device": tts_service.device,
+         "timestamp": datetime.now().isoformat()
+     }
+
+ @app.websocket("/ws/tts")
+ async def websocket_tts_endpoint(websocket: WebSocket):
+     """Main TTS WebSocket endpoint"""
+     client_id = None
+
+     try:
+         # Accept connection
+         client_id = await tts_service.connect_websocket(websocket)
+
+         # Handle messages
+         while True:
+             try:
+                 # Receive message
+                 data = await websocket.receive_text()
+                 message = json.loads(data)
+
+                 # Process based on message type
+                 message_type = message.get("type", "unknown")
+
+                 if message_type == "tts_synthesize":
+                     await tts_service.process_tts_message(client_id, message)
+                 elif message_type == "tts_streaming_text":
+                     await tts_service.process_streaming_tts_message(client_id, message)
+                 elif message_type == "ping":
+                     # Respond to ping
+                     await websocket.send_text(json.dumps({
+                         "type": "pong",
+                         "client_id": client_id,
+                         "timestamp": datetime.now().isoformat()
+                     }))
+                 else:
+                     logger.warning(f"Unknown message type from {client_id}: {message_type}")
+
+             except WebSocketDisconnect:
+                 break
+             except json.JSONDecodeError:
+                 await websocket.send_text(json.dumps({
+                     "type": "tts_synthesis_error",
+                     "client_id": client_id,
+                     "error": "Invalid JSON message format"
+                 }))
+             except Exception as e:
+                 logger.error(f"Error handling message from {client_id}: {str(e)}")
+                 break
+
+     except WebSocketDisconnect:
+         logger.info(f"Client {client_id} disconnected normally")
+     except Exception as e:
+         logger.error(f"WebSocket error for {client_id}: {str(e)}")
+     finally:
+         if client_id:
+             await tts_service.disconnect_websocket(client_id)
+
+ if __name__ == "__main__":
+     port = int(os.environ.get("PORT", 7860))  # HuggingFace Spaces standard port
+     logger.info(f"🔊 Starting {__service__} v{__version__} on port {port}")
+
+     uvicorn.run(
+         app,
+         host="0.0.0.0",
+         port=port,
+         log_level="info"
+     )
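The streaming handler in `websocket_tts_server.py` joins all accumulated chunks once `is_final` is set (the unmute.sh flush trick). A client that mirrors that contract can be sketched as a small buffer; `StreamingTextBuffer` is a hypothetical helper name, and only the `tts_streaming_text` fields defined in this commit are assumed:

```python
# Sketch: client-side accumulator matching the server's flush-on-final contract.
import json

class StreamingTextBuffer:
    """Collect text chunks, then emit one tts_streaming_text frame when final."""

    def __init__(self, voice_preset: str = "v2/en_speaker_6"):
        self.voice_preset = voice_preset
        self.chunks = []

    def add(self, chunk: str) -> None:
        """Buffer one chunk of text (e.g. as it arrives from an LLM stream)."""
        self.chunks.append(chunk)

    def flush(self) -> str:
        """Frame the accumulated chunks as a final streaming request and reset."""
        frame = json.dumps({
            "type": "tts_streaming_text",
            "text_chunks": self.chunks,
            "voice_preset": self.voice_preset,
            "is_final": True,
        })
        self.chunks = []
        return frame
```

The frame returned by `flush()` can be passed directly to a WebSocket `send`, after which the server responds with a single `tts_streaming_response` containing the synthesized audio.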