# STT WebSocket Service v1.0.0

Standalone WebSocket-only Speech-to-Text service for VoiceCal integration.

## Features

- ✅ WebSocket-only STT interface (`/ws/stt`)
- ✅ ZeroGPU Whisper integration
- ✅ FastAPI-based architecture
- ✅ No Gradio dependencies
- ✅ No MCP dependencies
- ✅ Standalone deployment ready
- ✅ Real-time audio transcription
- ✅ Base64 audio transmission
- ✅ Multiple Whisper model sizes

## Quick Start

### Using the WebSocket Server

```bash
# Install dependencies
pip install -r requirements-websocket.txt

# Run standalone WebSocket server
python3 websocket_stt_server.py
```

### Docker Deployment

```bash
# Build WebSocket-only image
docker build -f Dockerfile-websocket -t stt-websocket-service .

# Run container
docker run -p 7860:7860 stt-websocket-service
```

## API Endpoints

### WebSocket: `/ws/stt`

**Connection Confirmation:**

```json
{
  "type": "stt_connection_confirmed",
  "client_id": "uuid",
  "service": "STT WebSocket Service",
  "version": "1.0.0",
  "model": "whisper-base",
  "device": "cuda",
  "message": "STT WebSocket connected and ready"
}
```

**Send Audio for Transcription:**

```json
{
  "type": "stt_audio_chunk",
  "audio_data": "base64_encoded_webm_audio",
  "language": "auto",
  "model_size": "base"
}
```

**Transcription Result:**

```json
{
  "type": "stt_transcription_complete",
  "client_id": "uuid",
  "transcription": "Hello world",
  "timing": {
    "processing_time": 1.23,
    "model_size": "base",
    "device": "cuda"
  },
  "status": "success"
}
```

### HTTP: `/health`

```json
{
  "service": "STT WebSocket Service",
  "version": "1.0.0",
  "status": "healthy",
  "model_loaded": true,
  "active_connections": 2,
  "device": "cuda"
}
```

## Port Configuration

- **Default Port**: `7860`
- **WebSocket Endpoint**: `ws://localhost:7860/ws/stt`
- **Health Check**: `http://localhost:7860/health`

## Architecture

This service eliminates all unnecessary dependencies:

- **Removed**: Gradio web interface
- **Removed**: MCP protocol support
- **Removed**: Complex routing
- **Added**: Direct FastAPI WebSocket endpoints
- **Added**: Simplified audio processing
- **Added**: ZeroGPU optimized transcription

## Integration

Connect from the VoiceCal WebRTC interface:

```javascript
const ws = new WebSocket('ws://localhost:7860/ws/stt');

// Send audio data once the connection is open; calling send() while the
// socket is still connecting throws an InvalidStateError.
ws.addEventListener('open', () => {
  ws.send(JSON.stringify({
    type: "stt_audio_chunk",
    audio_data: base64AudioData, // base64-encoded WebM audio
    language: "auto",
    model_size: "base"
  }));
});
```
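
The message shapes documented under `/ws/stt` can be wrapped in two small helpers: one that builds an `stt_audio_chunk` payload from raw audio bytes, and one that dispatches incoming messages by their `type` field. This is a minimal sketch, not part of the service itself. The function names `buildAudioChunkMessage` and `handleServerMessage`, the handler names, and the treatment of any `status` other than `"success"` as an error are assumptions made for illustration.

```javascript
// Hypothetical client-side helpers; names are illustrative, not from the service.

// Encode raw bytes (for example a recorded WebM blob read into a Uint8Array)
// as base64. Uses Buffer under Node.js and btoa in browsers.
function toBase64(bytes) {
  if (typeof Buffer !== "undefined") {
    return Buffer.from(bytes).toString("base64");
  }
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Build the JSON text of an stt_audio_chunk message as documented above.
function buildAudioChunkMessage(audioBytes, options = {}) {
  const { language = "auto", modelSize = "base" } = options;
  return JSON.stringify({
    type: "stt_audio_chunk",
    audio_data: toBase64(audioBytes),
    language: language,
    model_size: modelSize,
  });
}

// Parse one server message and call the matching handler, if provided.
// Returns the message type so callers can log or branch on it.
function handleServerMessage(rawText, handlers = {}) {
  const msg = JSON.parse(rawText);
  switch (msg.type) {
    case "stt_connection_confirmed":
      if (handlers.onReady) handlers.onReady(msg);
      break;
    case "stt_transcription_complete":
      // Assumption: anything other than status "success" is a failure.
      if (msg.status === "success") {
        if (handlers.onTranscription) handlers.onTranscription(msg.transcription, msg.timing);
      } else if (handlers.onError) {
        handlers.onError(msg);
      }
      break;
    default:
      if (handlers.onUnknown) handlers.onUnknown(msg);
  }
  return msg.type;
}
```

To wire these into a connection, pass `event.data` from the socket's `message` event to `handleServerMessage`, and call `ws.send(buildAudioChunkMessage(bytes))` from the `onReady` handler so that audio is only sent after the server has confirmed the connection.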