# STT WebSocket Service v1.0.0
Standalone WebSocket-only Speech-to-Text service for VoiceCal integration.
## Features
- WebSocket-only STT interface (`/ws/stt`)
- ZeroGPU Whisper integration
- FastAPI-based architecture
- No Gradio dependencies
- No MCP dependencies
- Standalone deployment ready
- Real-time audio transcription
- Base64 audio transmission
- Multiple Whisper model sizes
## Quick Start
### Using the WebSocket Server
```bash
# Install dependencies
pip install -r requirements-websocket.txt
# Run standalone WebSocket server
python3 websocket_stt_server.py
```
### Docker Deployment
```bash
# Build WebSocket-only image
docker build -f Dockerfile-websocket -t stt-websocket-service .
# Run container
docker run -p 7860:7860 stt-websocket-service
```
## API Endpoints
### WebSocket: `/ws/stt`
**Connection Confirmation:**
```json
{
  "type": "stt_connection_confirmed",
  "client_id": "uuid",
  "service": "STT WebSocket Service",
  "version": "1.0.0",
  "model": "whisper-base",
  "device": "cuda",
  "message": "STT WebSocket connected and ready"
}
```
**Send Audio for Transcription:**
```json
{
  "type": "stt_audio_chunk",
  "audio_data": "base64_encoded_webm_audio",
  "language": "auto",
  "model_size": "base"
}
```
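As a rough sketch of how a Python client might assemble this message: the helper name `build_audio_chunk_message` is hypothetical, and the example assumes the server expects standard Base64 encoding of the raw WebM bytes, as the `audio_data` placeholder above suggests.

```python
import base64
import json


def build_audio_chunk_message(audio_bytes: bytes,
                              language: str = "auto",
                              model_size: str = "base") -> str:
    """Serialize raw audio bytes into an `stt_audio_chunk` JSON message.

    Hypothetical helper: the field names follow the message format above.
    """
    return json.dumps({
        "type": "stt_audio_chunk",
        # Base64 keeps the binary payload safe inside a JSON text frame.
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
        "model_size": model_size,
    })
```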
**Transcription Result:**
```json
{
  "type": "stt_transcription_complete",
  "client_id": "uuid",
  "transcription": "Hello world",
  "timing": {
    "processing_time": 1.23,
    "model_size": "base",
    "device": "cuda"
  },
  "status": "success"
}
```
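A client has to tell this result apart from other messages on the same socket, such as the connection confirmation. The sketch below shows one way to do that in Python. `extract_transcription` is a hypothetical helper, and because the shape of failure messages is not documented here, it simply treats any `status` other than `"success"` as "no result".

```python
import json
from typing import Optional


def extract_transcription(raw_message: str) -> Optional[str]:
    """Return the transcribed text, or None if this is not a successful result."""
    message = json.loads(raw_message)
    if message.get("type") != "stt_transcription_complete":
        return None  # some other message type, e.g. the connection confirmation
    if message.get("status") != "success":
        return None  # assumption: anything but "success" carries no usable text
    return message.get("transcription")
```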
### HTTP: `/health`
```json
{
  "service": "STT WebSocket Service",
  "version": "1.0.0",
  "status": "healthy",
  "model_loaded": true,
  "active_connections": 2,
  "device": "cuda"
}
```
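A deployment script might poll this endpoint before routing traffic to the service. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, the base URL assumes the default port, and treating `status == "healthy"` plus `model_loaded == true` as "ready" is an interpretation of the payload above rather than a documented contract.

```python
import json
import urllib.request


def is_ready(health: dict) -> bool:
    """Interpret a /health payload: healthy status and a loaded model."""
    return health.get("status") == "healthy" and health.get("model_loaded") is True


def fetch_health(base_url: str = "http://localhost:7860", timeout: float = 5.0) -> dict:
    """GET /health and decode the JSON body (assumes the default port)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
        return json.load(response)
```

With a server running locally, `is_ready(fetch_health())` combines the two.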
## Port Configuration
- **Default Port**: `7860`
- **WebSocket Endpoint**: `ws://localhost:7860/ws/stt`
- **Health Check**: `http://localhost:7860/health`
## Architecture
This service drops the dependencies that a WebSocket-only deployment does not need and replaces them with a minimal FastAPI stack:
- **Removed**: Gradio web interface
- **Removed**: MCP protocol support
- **Removed**: Complex routing
- **Added**: Direct FastAPI WebSocket endpoints
- **Added**: Simplified audio processing
- **Added**: ZeroGPU optimized transcription
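## Integration
Connect from the VoiceCal WebRTC interface:
```javascript
const ws = new WebSocket('ws://localhost:7860/ws/stt');
// Wait for the connection to open; calling send() before that throws.
ws.addEventListener('open', () => {
  // Send audio data (base64AudioData holds the Base64-encoded WebM audio)
  ws.send(JSON.stringify({
    type: "stt_audio_chunk",
    audio_data: base64AudioData,
    language: "auto",
    model_size: "base"
  }));
});
// Log server messages (connection confirmation, transcription results)
ws.addEventListener('message', (event) => console.log(JSON.parse(event.data)));
```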
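For non-browser clients, the same exchange could look roughly like the Python sketch below. It relies on the third-party `websockets` package (an assumption; it is not necessarily listed in `requirements-websocket.txt`), and the `transcribe` function is hypothetical. It also assumes the server sends the connection confirmation first and that the next message is the result for the clip just sent.

```python
import asyncio
import base64
import json


async def transcribe(audio_bytes: bytes,
                     url: str = "ws://localhost:7860/ws/stt") -> str:
    """Send one audio clip over the WebSocket and return its transcription."""
    import websockets  # third-party: pip install websockets

    async with websockets.connect(url) as ws:
        # Assumption: the server greets each client with a confirmation first.
        confirmation = json.loads(await ws.recv())
        if confirmation.get("type") != "stt_connection_confirmed":
            raise RuntimeError(f"unexpected first message: {confirmation}")

        await ws.send(json.dumps({
            "type": "stt_audio_chunk",
            "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
            "language": "auto",
            "model_size": "base",
        }))

        # Assumption: the next message is the result for this clip.
        result = json.loads(await ws.recv())
        if result.get("status") != "success":
            raise RuntimeError(f"transcription failed: {result}")
        return result["transcription"]
```

With a server running, `asyncio.run(transcribe(open("sample.webm", "rb").read()))` would return the transcribed text; `sample.webm` is a placeholder path.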