Spaces:
Runtime error
Runtime error
| # π WebSocket Streaming for TTSFM | |
| Real-time audio streaming for text-to-speech generation using WebSockets. | |
| ## Overview | |
| The WebSocket streaming feature provides: | |
| - **Real-time audio chunk delivery** as they're generated | |
| - **Progress tracking** with live updates | |
| - **Lower perceived latency** - start receiving audio before complete generation | |
| - **Cancellable operations** - stop mid-generation if needed | |
| ## Quick Start | |
| ### 1. Docker Deployment (Recommended) | |
| ```bash | |
| # Build with WebSocket support | |
| docker build -t ttsfm-websocket . | |
| # Run with WebSocket enabled | |
| docker run -p 8000:8000 \ | |
| -e DEBUG=false \ | |
| ttsfm-websocket | |
| ``` | |
| ### 2. Test WebSocket Connection | |
| Visit `http://localhost:8000/websocket-demo` for an interactive demo. | |
| ### 3. Client Usage | |
| ```javascript | |
| // Initialize WebSocket client | |
| const client = new WebSocketTTSClient({ | |
| socketUrl: 'http://localhost:8000', | |
| debug: true | |
| }); | |
| // Generate speech with streaming | |
| const result = await client.generateSpeech('Hello, WebSocket world!', { | |
| voice: 'alloy', | |
| format: 'mp3', | |
| onProgress: (progress) => { | |
| console.log(`Progress: ${progress.progress}%`); | |
| }, | |
| onChunk: (chunk) => { | |
| console.log(`Received chunk ${chunk.chunkIndex + 1}`); | |
| // Process audio chunk in real-time | |
| }, | |
| onComplete: (result) => { | |
| console.log('Generation complete!'); | |
| // Play or download the combined audio | |
| } | |
| }); | |
| ``` | |
| ## API Reference | |
| ### WebSocket Events | |
| #### Client β Server | |
| **`generate_stream`** | |
| ```javascript | |
| { | |
| text: string, // Text to convert | |
| voice: string, // Voice ID (alloy, echo, etc.) | |
| format: string, // Audio format (mp3, wav, opus) | |
| chunk_size: number // Optional, default 1024 | |
| } | |
| ``` | |
| **`cancel_stream`** | |
| ```javascript | |
| { | |
| request_id: string // Request ID to cancel | |
| } | |
| ``` | |
| #### Server β Client | |
| **`stream_started`** | |
| ```javascript | |
| { | |
| request_id: string, | |
| timestamp: number | |
| } | |
| ``` | |
| **`audio_chunk`** | |
| ```javascript | |
| { | |
| request_id: string, | |
| chunk_index: number, | |
| total_chunks: number, | |
| audio_data: string, // Hex-encoded audio data | |
| format: string, | |
| duration: number, | |
| generation_time: number, | |
| chunk_text: string // Preview of chunk text | |
| } | |
| ``` | |
| **`stream_progress`** | |
| ```javascript | |
| { | |
| request_id: string, | |
| progress: number, // 0-100 | |
| total_chunks: number, | |
| chunks_completed: number, | |
| status: string | |
| } | |
| ``` | |
| **`stream_complete`** | |
| ```javascript | |
| { | |
| request_id: string, | |
| total_chunks: number, | |
| status: 'completed', | |
| timestamp: number | |
| } | |
| ``` | |
| **`stream_error`** | |
| ```javascript | |
| { | |
| request_id: string, | |
| error: string, | |
| timestamp: number | |
| } | |
| ``` | |
| ## Performance Considerations | |
| 1. **Chunk Size**: Smaller chunks (512-1024 chars) provide more frequent updates but increase overhead | |
| 2. **Network Latency**: WebSocket reduces latency compared to HTTP polling | |
| 3. **Audio Buffering**: Client should buffer chunks for smooth playback | |
| 4. **Concurrent Streams**: Server supports multiple concurrent streaming sessions | |
| ## Browser Support | |
| - Chrome/Edge: Full support | |
| - Firefox: Full support | |
| - Safari: Full support (iOS 11.3+) | |
| - IE11: Not supported (use polling fallback) | |
| ## Troubleshooting | |
| ### Connection Issues | |
| ```javascript | |
| // Check WebSocket status | |
| fetch('/api/websocket/status') | |
| .then(res => res.json()) | |
| .then(data => console.log('WebSocket status:', data)); | |
| ``` | |
| ### Debug Mode | |
| ```javascript | |
| const client = new WebSocketTTSClient({ | |
| debug: true // Enable console logging | |
| }); | |
| ``` | |
| ### Common Issues | |
| 1. **"WebSocket connection failed"** | |
| - Check if port 8000 is accessible | |
| - Ensure eventlet is installed: `pip install eventlet>=0.33.3` | |
| - Try polling transport as fallback | |
| 2. **"Chunks arriving out of order"** | |
| - Client automatically sorts chunks by index | |
| - Check network stability | |
| 3. **"Audio playback stuttering"** | |
| - Increase chunk size for better buffering | |
| - Check client-side audio buffer implementation | |
| ## Advanced Usage | |
| ### Custom Chunk Processing | |
| ```javascript | |
| client.generateSpeech(text, { | |
| onChunk: async (chunk) => { | |
| // Custom processing per chunk | |
| const processed = await processAudioChunk(chunk.audioData); | |
| audioQueue.push(processed); | |
| // Start playback after first chunk | |
| if (chunk.chunkIndex === 0) { | |
| startStreamingPlayback(audioQueue); | |
| } | |
| } | |
| }); | |
| ``` | |
| ### Progress Visualization | |
| ```javascript | |
| client.generateSpeech(text, { | |
| onProgress: (progress) => { | |
| // Update UI progress bar | |
| progressBar.style.width = `${progress.progress}%`; | |
| statusText.textContent = `Processing chunk ${progress.chunksCompleted}/${progress.totalChunks}`; | |
| } | |
| }); | |
| ``` | |
| ## Security | |
| - WebSocket connections respect API key authentication if enabled | |
| - CORS is configured for cross-origin requests | |
| - SSL/TLS recommended for production deployments | |
| ## Deployment Notes | |
| For production deployment with your existing setup: | |
| ```bash | |
| # Build new image with WebSocket support | |
| docker build -t ttsfm-websocket:latest . | |
| # Deploy to your server (192.168.1.150) | |
| docker stop ttsfm-container | |
| docker rm ttsfm-container | |
| docker run -d \ | |
| --name ttsfm-container \ | |
| -p 8000:8000 \ | |
| -e REQUIRE_API_KEY=true \ | |
| -e TTSFM_API_KEY=your-secret-key \ | |
| -e DEBUG=false \ | |
| ttsfm-websocket:latest | |
| ``` | |
| ## Performance Metrics | |
| Based on testing with openai.fm backend: | |
| - First chunk delivery: ~0.5-1s | |
| - Streaming overhead: ~10-15% vs batch processing | |
| - Concurrent connections: 100+ (limited by server resources) | |
| - Memory usage: ~50MB per active stream | |
| *Built by a grumpy senior engineer who thinks HTTP was good enough* |