πŸš€ WebSocket Streaming for TTSFM

Real-time audio streaming for text-to-speech generation using WebSockets.

Overview

The WebSocket streaming feature provides:

  • Real-time delivery of audio chunks as they're generated
  • Progress tracking with live updates
  • Lower perceived latency - start receiving audio before generation completes
  • Cancellable operations - stop mid-generation if needed

Quick Start

1. Docker Deployment (Recommended)

# Build with WebSocket support
docker build -t ttsfm-websocket .

# Run with WebSocket enabled
docker run -p 8000:8000 \
  -e DEBUG=false \
  ttsfm-websocket

2. Test WebSocket Connection

Visit http://localhost:8000/websocket-demo for an interactive demo.

3. Client Usage

// Initialize WebSocket client
const client = new WebSocketTTSClient({
    socketUrl: 'http://localhost:8000',
    debug: true
});

// Generate speech with streaming
const result = await client.generateSpeech('Hello, WebSocket world!', {
    voice: 'alloy',
    format: 'mp3',
    onProgress: (progress) => {
        console.log(`Progress: ${progress.progress}%`);
    },
    onChunk: (chunk) => {
        console.log(`Received chunk ${chunk.chunkIndex + 1}`);
        // Process audio chunk in real-time
    },
    onComplete: (result) => {
        console.log('Generation complete!');
        // Play or download the combined audio
    }
});

API Reference

WebSocket Events

Client β†’ Server

generate_stream

{
    text: string,          // Text to convert
    voice: string,         // Voice ID (alloy, echo, etc.)
    format: string,        // Audio format (mp3, wav, opus)
    chunk_size: number     // Optional, default 1024
}
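
If you want to drive the protocol without the bundled WebSocketTTSClient wrapper, the same event can be emitted over a plain socket.io-client connection. This is a minimal sketch, assuming the server speaks standard Socket.IO (which the eventlet requirement and polling fallback under Troubleshooting suggest):

// Raw socket.io-client usage (npm install socket.io-client, or the browser <script> build)
import { io } from 'socket.io-client';

const socket = io('http://localhost:8000');

socket.on('connect', () => {
    socket.emit('generate_stream', {
        text: 'Hello, WebSocket world!',
        voice: 'alloy',
        format: 'mp3',
        chunk_size: 1024
    });
});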

cancel_stream

{
    request_id: string     // Request ID to cancel
}
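
The request_id to cancel comes from the stream_started event below. A sketch of a cancel flow on the raw socket from the previous example:

// Remember the active request ID, then cancel it on demand
let activeRequestId = null;

socket.on('stream_started', (msg) => {
    activeRequestId = msg.request_id;
});

function cancelActiveStream() {
    if (activeRequestId !== null) {
        socket.emit('cancel_stream', { request_id: activeRequestId });
    }
}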

Server β†’ Client

stream_started

{
    request_id: string,
    timestamp: number
}

audio_chunk

{
    request_id: string,
    chunk_index: number,
    total_chunks: number,
    audio_data: string,    // Hex-encoded audio data
    format: string,
    duration: number,
    generation_time: number,
    chunk_text: string     // Preview of chunk text
}
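
Because audio_data arrives hex-encoded, the client must decode it back to raw bytes before it can be buffered or played. A small helper in plain JavaScript (no assumptions beyond the payload shape above):

// Decode a hex string into a Uint8Array of raw audio bytes
function hexToBytes(hex) {
    const bytes = new Uint8Array(hex.length / 2);
    for (let i = 0; i < bytes.length; i++) {
        bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
    }
    return bytes;
}

socket.on('audio_chunk', (chunk) => {
    const bytes = hexToBytes(chunk.audio_data);
    // bytes can now be queued for buffered or streaming playback
});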

stream_progress

{
    request_id: string,
    progress: number,      // 0-100
    total_chunks: number,
    chunks_completed: number,
    status: string
}

stream_complete

{
    request_id: string,
    total_chunks: number,
    status: 'completed',
    timestamp: number
}
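
Once stream_complete fires, the collected chunks can be stitched into a single file and played. A sketch reusing the hexToBytes helper above, assuming mp3 output:

// Index by chunk_index so ordering survives out-of-order delivery
const parts = [];

socket.on('audio_chunk', (chunk) => {
    parts[chunk.chunk_index] = hexToBytes(chunk.audio_data);
});

socket.on('stream_complete', () => {
    const blob = new Blob(parts, { type: 'audio/mpeg' });
    new Audio(URL.createObjectURL(blob)).play();  // play() may need a prior user gesture
});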

stream_error

{
    request_id: string,
    error: string,
    timestamp: number
}

Performance Considerations

  1. Chunk Size: Smaller chunks (512-1024 chars) provide more frequent updates but increase overhead
  2. Network Latency: WebSocket reduces latency compared to HTTP polling
  3. Audio Buffering: Clients should buffer chunks for smooth playback (see the playback sketch under Advanced Usage)
  4. Concurrent Streams: Server supports multiple concurrent streaming sessions

Browser Support

  • Chrome/Edge: Full support
  • Firefox: Full support
  • Safari: Full support (iOS 11.3+)
  • IE11: Not supported (use polling fallback)

Troubleshooting

Connection Issues

// Check WebSocket status
fetch('/api/websocket/status')
    .then(res => res.json())
    .then(data => console.log('WebSocket status:', data));

Debug Mode

const client = new WebSocketTTSClient({
    debug: true  // Enable console logging
});

Common Issues

  1. "WebSocket connection failed"

    • Check if port 8000 is accessible
    • Ensure eventlet is installed: pip install "eventlet>=0.33.3" (quote the version spec so the shell doesn't interpret >=)
    • Try the polling transport as a fallback (see the snippet after this list)
  2. "Chunks arriving out of order"

    • Client automatically sorts chunks by index
    • Check network stability
  3. "Audio playback stuttering"

    • Increase chunk size for better buffering
    • Check client-side audio buffer implementation
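
For the polling fallback mentioned in the first item, socket.io-client can be forced onto HTTP long-polling. A minimal sketch, assuming a standard Socket.IO server:

// Skip the WebSocket upgrade entirely (useful behind proxies that block it)
import { io } from 'socket.io-client';

const socket = io('http://localhost:8000', {
    transports: ['polling']
});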

Advanced Usage

Custom Chunk Processing

client.generateSpeech(text, {
    onChunk: async (chunk) => {
        // Custom processing per chunk
        const processed = await processAudioChunk(chunk.audioData);
        audioQueue.push(processed);
        
        // Start playback after first chunk
        if (chunk.chunkIndex === 0) {
            startStreamingPlayback(audioQueue);
        }
    }
});
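
Note that processAudioChunk, audioQueue, and startStreamingPlayback above are user-supplied and don't ship with the client. One possible sketch of startStreamingPlayback using the MediaSource API, assuming the queue holds decoded MP3 bytes (Uint8Array) and MediaSource.isTypeSupported('audio/mpeg') returns true in your browser:

// Feed queued chunks into a MediaSource-backed <audio> element as they arrive
function startStreamingPlayback(audioQueue) {
    const mediaSource = new MediaSource();
    const audio = new Audio(URL.createObjectURL(mediaSource));

    mediaSource.addEventListener('sourceopen', () => {
        const buffer = mediaSource.addSourceBuffer('audio/mpeg');

        // Append the next chunk whenever the buffer is idle
        const pump = () => {
            if (audioQueue.length > 0 && !buffer.updating) {
                buffer.appendBuffer(audioQueue.shift());
            }
        };
        buffer.addEventListener('updateend', pump);
        setInterval(pump, 100);  // keep draining as new chunks arrive
        pump();
    });

    audio.play();  // may require a prior user gesture under autoplay policies
    // A full implementation would call mediaSource.endOfStream() after the last chunk
}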

Progress Visualization

client.generateSpeech(text, {
    onProgress: (progress) => {
        // Update UI progress bar
        progressBar.style.width = `${progress.progress}%`;
        statusText.textContent = `Processing chunk ${progress.chunksCompleted}/${progress.totalChunks}`;
    }
});

Security

  • WebSocket connections respect API key authentication if enabled
  • CORS is configured for cross-origin requests
  • SSL/TLS recommended for production deployments

Deployment Notes

For production deployment with your existing setup:

# Build new image with WebSocket support
docker build -t ttsfm-websocket:latest .

# Deploy to your server (192.168.1.150)
docker stop ttsfm-container
docker rm ttsfm-container
docker run -d \
  --name ttsfm-container \
  -p 8000:8000 \
  -e REQUIRE_API_KEY=true \
  -e TTSFM_API_KEY=your-secret-key \
  -e DEBUG=false \
  ttsfm-websocket:latest

Performance Metrics

Based on testing with the openai.fm backend:

  • First chunk delivery: ~0.5-1s
  • Streaming overhead: ~10-15% vs batch processing
  • Concurrent connections: 100+ (limited by server resources)
  • Memory usage: ~50MB per active stream

Built by a grumpy senior engineer who thinks HTTP was good enough