Peter Michael Gits
# STT WebSocket Service v1.0.0
Standalone WebSocket-only Speech-to-Text service for VoiceCal integration.
## Features
- ✅ WebSocket-only STT interface (`/ws/stt`)
- ✅ ZeroGPU Whisper integration
- ✅ FastAPI-based architecture
- ✅ No Gradio dependencies
- ✅ No MCP dependencies
- ✅ Standalone deployment ready
- ✅ Real-time audio transcription
- ✅ Base64 audio transmission
- ✅ Multiple Whisper model sizes
## Quick Start
### Using the WebSocket Server
```bash
# Install dependencies
pip install -r requirements-websocket.txt
# Run standalone WebSocket server
python3 websocket_stt_server.py
```
### Docker Deployment
```bash
# Build WebSocket-only image
docker build -f Dockerfile-websocket -t stt-websocket-service .
# Run container
docker run -p 7860:7860 stt-websocket-service
```
## API Endpoints
### WebSocket: `/ws/stt`
**Connection Confirmation:**
```json
{
  "type": "stt_connection_confirmed",
  "client_id": "uuid",
  "service": "STT WebSocket Service",
  "version": "1.0.0",
  "model": "whisper-base",
  "device": "cuda",
  "message": "STT WebSocket connected and ready"
}
```
**Send Audio for Transcription:**
```json
{
  "type": "stt_audio_chunk",
  "audio_data": "base64_encoded_webm_audio",
  "language": "auto",
  "model_size": "base"
}
```
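A minimal Python sketch of how a client might build this message from raw audio bytes. The helper name `build_audio_chunk_message` is hypothetical and not part of the service; only the four JSON fields come from the message format above.

```python
import base64
import json


def build_audio_chunk_message(
    audio_bytes: bytes,
    language: str = "auto",
    model_size: str = "base",
) -> str:
    """Serialize raw audio bytes into an `stt_audio_chunk` JSON message."""
    return json.dumps({
        "type": "stt_audio_chunk",
        # The audio travels as base64 text inside the JSON payload.
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
        "model_size": model_size,
    })


if __name__ == "__main__":
    # Placeholder bytes; a real client would pass recorded WebM audio.
    print(build_audio_chunk_message(b"\x1a\x45\xdf\xa3"))
```

The returned string can be sent as a single text frame over the `/ws/stt` connection.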
**Transcription Result:**
```json
{
  "type": "stt_transcription_complete",
  "client_id": "uuid",
  "transcription": "Hello world",
  "timing": {
    "processing_time": 1.23,
    "model_size": "base",
    "device": "cuda"
  },
  "status": "success"
}
```
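One way a Python client could read this result is sketched below. The `Transcription` class and `parse_transcription` function are hypothetical names introduced for illustration. Because the format of failure messages is not documented here, the sketch returns `None` for anything that is not a successful `stt_transcription_complete` message.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcription:
    text: str
    processing_time: Optional[float]
    model_size: Optional[str]
    device: Optional[str]


def parse_transcription(raw: str) -> Optional[Transcription]:
    """Return a Transcription for a successful result message, else None."""
    msg = json.loads(raw)
    if msg.get("type") != "stt_transcription_complete":
        return None  # some other message, e.g. the connection confirmation
    if msg.get("status") != "success":
        return None  # failure payloads are not documented, so they are not parsed
    timing = msg.get("timing") or {}
    return Transcription(
        text=msg.get("transcription", ""),
        processing_time=timing.get("processing_time"),
        model_size=timing.get("model_size"),
        device=timing.get("device"),
    )
```

Checking `type` before anything else lets the same receive loop handle the connection confirmation and the transcription result.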
### HTTP: `/health`
```json
{
  "service": "STT WebSocket Service",
  "version": "1.0.0",
  "status": "healthy",
  "model_loaded": true,
  "active_connections": 2,
  "device": "cuda"
}
```
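The health response can be polled before opening a WebSocket, for example from a deployment script. The sketch below uses only the Python standard library. The function names are hypothetical, and treating the service as ready only when `status` is `"healthy"` and `model_loaded` is `true` is an assumption of this example, not a rule stated by the service.

```python
import json
from urllib.request import urlopen


def is_ready(health: dict) -> bool:
    """Assumed readiness rule: healthy status and a loaded model."""
    return health.get("status") == "healthy" and health.get("model_loaded") is True


def fetch_health(url: str = "http://localhost:7860/health", timeout: float = 5.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running service on the default port.
    print("ready" if is_ready(fetch_health()) else "not ready")
```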
## Port Configuration
- **Default Port**: `7860`
- **WebSocket Endpoint**: `ws://localhost:7860/ws/stt`
- **Health Check**: `http://localhost:7860/health`
## Architecture
This service removes the components that a WebSocket-only deployment does not need and replaces them with direct FastAPI endpoints:
- **Removed**: Gradio web interface
- **Removed**: MCP protocol support
- **Removed**: Complex routing
- **Added**: Direct FastAPI WebSocket endpoints
- **Added**: Simplified audio processing
- **Added**: ZeroGPU optimized transcription
## Integration
Connect from the VoiceCal WebRTC interface:
```javascript
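const ws = new WebSocket('ws://localhost:7860/ws/stt');

// Send audio only once the socket is open; calling send() while it is
// still connecting throws an InvalidStateError.
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: "stt_audio_chunk",
    audio_data: base64AudioData, // base64-encoded WebM audio
    language: "auto",
    model_size: "base"
  }));
};

// Log each server message (connection confirmation, transcription result).
ws.onmessage = (event) => console.log(JSON.parse(event.data));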
```
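For non-browser clients, the same exchange can be sketched in Python. This example assumes the third-party `websockets` package (`pip install websockets`), which is not mentioned in this README. The function name `transcribe_once` and the 30-second timeout are also choices made for illustration. The loop skips every message other than `stt_transcription_complete`, so it does not depend on when the connection confirmation arrives.

```python
import asyncio
import base64
import json


async def transcribe_once(
    audio_bytes: bytes,
    uri: str = "ws://localhost:7860/ws/stt",
    timeout: float = 30.0,
) -> str:
    """Send one audio clip and return the transcription text."""
    # Imported lazily so this module loads even without the package installed.
    import websockets  # third-party, assumed for this sketch

    async def _exchange() -> str:
        async with websockets.connect(uri) as ws:
            await ws.send(json.dumps({
                "type": "stt_audio_chunk",
                "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
                "language": "auto",
                "model_size": "base",
            }))
            # Ignore other messages (such as the connection confirmation)
            # until the transcription result arrives.
            while True:
                reply = json.loads(await ws.recv())
                if reply.get("type") == "stt_transcription_complete":
                    return reply.get("transcription", "")

    # Bound the whole exchange so an undocumented error reply cannot hang the caller.
    return await asyncio.wait_for(_exchange(), timeout=timeout)


# Usage (with a running service and a local WebM file):
# text = asyncio.run(transcribe_once(open("clip.webm", "rb").read()))
```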