Peter Michael Gits

STT WebSocket Service v1.0.0

Standalone WebSocket-only Speech-to-Text service for VoiceCal integration.

Features

  • ✅ WebSocket-only STT interface (/ws/stt)
  • ✅ ZeroGPU Whisper integration
  • ✅ FastAPI-based architecture
  • ✅ No Gradio dependencies
  • ✅ No MCP dependencies
  • ✅ Standalone deployment ready
  • ✅ Real-time audio transcription
  • ✅ Base64 audio transmission
  • ✅ Multiple Whisper model sizes

Quick Start

Using the WebSocket Server

# Install dependencies
pip install -r requirements-websocket.txt

# Run standalone WebSocket server
python3 websocket_stt_server.py

Docker Deployment

# Build WebSocket-only image
docker build -f Dockerfile-websocket -t stt-websocket-service .

# Run container
docker run -p 7860:7860 stt-websocket-service

API Endpoints

WebSocket: /ws/stt

Connection Confirmation:

{
  "type": "stt_connection_confirmed",
  "client_id": "uuid",
  "service": "STT WebSocket Service",
  "version": "1.0.0",
  "model": "whisper-base",
  "device": "cuda",
  "message": "STT WebSocket connected and ready"
}

Send Audio for Transcription:

{
  "type": "stt_audio_chunk",
  "audio_data": "base64_encoded_webm_audio",
  "language": "auto",
  "model_size": "base"
}
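
A minimal Python sketch of building this message, assuming the audio is already available as WebM bytes. The helper name `build_audio_chunk` is hypothetical and not part of the service; only the field names come from the message format above.

```python
import base64
import json


def build_audio_chunk(audio_bytes: bytes, language: str = "auto",
                      model_size: str = "base") -> str:
    """Return the JSON text for one stt_audio_chunk message (hypothetical helper)."""
    return json.dumps({
        "type": "stt_audio_chunk",
        # The message carries the audio as base64 text, not raw bytes.
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
        "model_size": model_size,
    })
```

The returned string can be sent as a single text frame over the `/ws/stt` connection.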

Transcription Result:

{
  "type": "stt_transcription_complete",
  "client_id": "uuid",
  "transcription": "Hello world",
  "timing": {
    "processing_time": 1.23,
    "model_size": "base",
    "device": "cuda"
  },
  "status": "success"
}
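
On the client side, incoming messages can be told apart by their `type` field. The sketch below is a hypothetical helper, not code from the service. It assumes that a failed transcription carries a `status` other than `"success"`, which this README does not document.

```python
import json


def handle_message(raw: str):
    """Return the transcription text for a successful result, else None."""
    msg = json.loads(raw)
    if msg.get("type") != "stt_transcription_complete":
        # Other message types, e.g. stt_connection_confirmed, carry no text.
        return None
    if msg.get("status") != "success":
        # Assumption: any other status means the transcription failed.
        return None
    return msg.get("transcription")
```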

HTTP: /health

{
  "service": "STT WebSocket Service",
  "version": "1.0.0",
  "status": "healthy",
  "model_loaded": true,
  "active_connections": 2,
  "device": "cuda"
}
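
A readiness probe could poll this endpoint before opening a WebSocket. The following is a sketch using only the Python standard library; `is_ready` and `fetch_health` are hypothetical names, and treating "healthy with the model loaded" as the readiness condition is an assumption rather than documented behaviour.

```python
import json
import urllib.request


def is_ready(health: dict) -> bool:
    """True when the payload reports a healthy service with the model loaded."""
    return health.get("status") == "healthy" and health.get("model_loaded") is True


def fetch_health(url: str = "http://localhost:7860/health",
                 timeout: float = 5.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running locally, `is_ready(fetch_health())` would report whether it is safe to start sending audio.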

Port Configuration

  • Default Port: 7860
  • WebSocket Endpoint: ws://localhost:7860/ws/stt
  • Health Check: http://localhost:7860/health
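
For deployments on another host or port, both URLs can be derived from the same settings. This is an illustrative sketch: `service_urls` is a hypothetical helper, and the `secure` flag (switching to `wss`/`https`) is an assumption, since this README does not describe TLS.

```python
def service_urls(host: str = "localhost", port: int = 7860,
                 secure: bool = False) -> dict:
    """Build the WebSocket and health-check URLs for a given host and port."""
    ws_scheme = "wss" if secure else "ws"
    http_scheme = "https" if secure else "http"
    return {
        "websocket": f"{ws_scheme}://{host}:{port}/ws/stt",
        "health": f"{http_scheme}://{host}:{port}/health",
    }
```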

Architecture

This service drops the components it does not need and replaces them with direct FastAPI endpoints:

  • Removed: Gradio web interface
  • Removed: MCP protocol support
  • Removed: Complex routing
  • Added: Direct FastAPI WebSocket endpoints
  • Added: Simplified audio processing
  • Added: ZeroGPU optimized transcription

Integration

Connect from the VoiceCal WebRTC interface:

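const ws = new WebSocket('ws://localhost:7860/ws/stt');

// Send audio only once the socket is open; calling send() earlier throws
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: "stt_audio_chunk",
    audio_data: base64AudioData, // base64-encoded WebM audio
    language: "auto",
    model_size: "base"
  }));
};

// Log the text of each completed transcription
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "stt_transcription_complete") console.log(msg.transcription);
};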