# 🚀 WebSocket Streaming for TTSFM Real-time audio streaming for text-to-speech generation using WebSockets. ## Overview The WebSocket streaming feature provides: - **Real-time audio chunk delivery** as they're generated - **Progress tracking** with live updates - **Lower perceived latency** - start receiving audio before complete generation - **Cancellable operations** - stop mid-generation if needed ## Quick Start ### 1. Docker Deployment (Recommended) ```bash # Build with WebSocket support docker build -t ttsfm-websocket . # Run with WebSocket enabled docker run -p 8000:8000 \ -e DEBUG=false \ ttsfm-websocket ``` ### 2. Test WebSocket Connection Visit `http://localhost:8000/websocket-demo` for an interactive demo. ### 3. Client Usage ```javascript // Initialize WebSocket client const client = new WebSocketTTSClient({ socketUrl: 'http://localhost:8000', debug: true }); // Generate speech with streaming const result = await client.generateSpeech('Hello, WebSocket world!', { voice: 'alloy', format: 'mp3', onProgress: (progress) => { console.log(`Progress: ${progress.progress}%`); }, onChunk: (chunk) => { console.log(`Received chunk ${chunk.chunkIndex + 1}`); // Process audio chunk in real-time }, onComplete: (result) => { console.log('Generation complete!'); // Play or download the combined audio } }); ``` ## API Reference ### WebSocket Events #### Client → Server **`generate_stream`** ```javascript { text: string, // Text to convert voice: string, // Voice ID (alloy, echo, etc.) format: string, // Audio format (mp3, wav, opus) chunk_size: number // Optional, default 1024 } ``` **`cancel_stream`** ```javascript { request_id: string // Request ID to cancel } ``` #### Server → Client **`stream_started`** ```javascript { request_id: string, timestamp: number } ``` **`audio_chunk`** ```javascript { request_id: string, chunk_index: number, total_chunks: number, audio_data: string, // Hex-encoded audio data format: string, duration: number, generation_time: number, chunk_text: string // Preview of chunk text } ``` **`stream_progress`** ```javascript { request_id: string, progress: number, // 0-100 total_chunks: number, chunks_completed: number, status: string } ``` **`stream_complete`** ```javascript { request_id: string, total_chunks: number, status: 'completed', timestamp: number } ``` **`stream_error`** ```javascript { request_id: string, error: string, timestamp: number } ``` ## Performance Considerations 1. **Chunk Size**: Smaller chunks (512-1024 chars) provide more frequent updates but increase overhead 2. **Network Latency**: WebSocket reduces latency compared to HTTP polling 3. **Audio Buffering**: Client should buffer chunks for smooth playback 4. **Concurrent Streams**: Server supports multiple concurrent streaming sessions ## Browser Support - Chrome/Edge: Full support - Firefox: Full support - Safari: Full support (iOS 11.3+) - IE11: Not supported (use polling fallback) ## Troubleshooting ### Connection Issues ```javascript // Check WebSocket status fetch('/api/websocket/status') .then(res => res.json()) .then(data => console.log('WebSocket status:', data)); ``` ### Debug Mode ```javascript const client = new WebSocketTTSClient({ debug: true // Enable console logging }); ``` ### Common Issues 1. **"WebSocket connection failed"** - Check if port 8000 is accessible - Ensure eventlet is installed: `pip install eventlet>=0.33.3` - Try polling transport as fallback 2. **"Chunks arriving out of order"** - Client automatically sorts chunks by index - Check network stability 3. **"Audio playback stuttering"** - Increase chunk size for better buffering - Check client-side audio buffer implementation ## Advanced Usage ### Custom Chunk Processing ```javascript client.generateSpeech(text, { onChunk: async (chunk) => { // Custom processing per chunk const processed = await processAudioChunk(chunk.audioData); audioQueue.push(processed); // Start playback after first chunk if (chunk.chunkIndex === 0) { startStreamingPlayback(audioQueue); } } }); ``` ### Progress Visualization ```javascript client.generateSpeech(text, { onProgress: (progress) => { // Update UI progress bar progressBar.style.width = `${progress.progress}%`; statusText.textContent = `Processing chunk ${progress.chunksCompleted}/${progress.totalChunks}`; } }); ``` ## Security - WebSocket connections respect API key authentication if enabled - CORS is configured for cross-origin requests - SSL/TLS recommended for production deployments ## Deployment Notes For production deployment with your existing setup: ```bash # Build new image with WebSocket support docker build -t ttsfm-websocket:latest . # Deploy to your server (192.168.1.150) docker stop ttsfm-container docker rm ttsfm-container docker run -d \ --name ttsfm-container \ -p 8000:8000 \ -e REQUIRE_API_KEY=true \ -e TTSFM_API_KEY=your-secret-key \ -e DEBUG=false \ ttsfm-websocket:latest ``` ## Performance Metrics Based on testing with openai.fm backend: - First chunk delivery: ~0.5-1s - Streaming overhead: ~10-15% vs batch processing - Concurrent connections: 100+ (limited by server resources) - Memory usage: ~50MB per active stream *Built by a grumpy senior engineer who thinks HTTP was good enough*