Peter Michael Gits · commit 69f7704 (7 months ago): feat: Add standalone WebSocket-only STT service v1.0.0

preview code

raw

history blame contribute delete

2.49 kB

A newer version of the Gradio SDK is available: 6.11.0

Upgrade

STT WebSocket Service v1.0.0

Standalone WebSocket-only Speech-to-Text service for VoiceCal integration.

Features

✅ WebSocket-only STT interface (/ws/stt)
✅ ZeroGPU Whisper integration
✅ FastAPI-based architecture
✅ No Gradio dependencies
✅ No MCP dependencies
✅ Standalone deployment ready
✅ Real-time audio transcription
✅ Base64 audio transmission
✅ Multiple Whisper model sizes

Quick Start

Using the WebSocket Server

```bash
# Install dependencies
pip install -r requirements-websocket.txt

# Run standalone WebSocket server
python3 websocket_stt_server.py
```

Docker Deployment

```bash
# Build WebSocket-only image
docker build -f Dockerfile-websocket -t stt-websocket-service .

# Run container
docker run -p 7860:7860 stt-websocket-service
```

API Endpoints

WebSocket: `/ws/stt`

Connection confirmation (sent by the server after a client connects):

```json
{
  "type": "stt_connection_confirmed",
  "client_id": "uuid",
  "service": "STT WebSocket Service",
  "version": "1.0.0",
  "model": "whisper-base",
  "device": "cuda",
  "message": "STT WebSocket connected and ready"
}
```
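A client will usually want to read this confirmation before sending any audio. The sketch below assumes the server sends exactly this object as its first text frame; it parses the frame and returns the assigned `client_id`. The helper name `read_confirmation` is hypothetical and not part of the service.

```python
import json


def read_confirmation(raw: str) -> str:
    """Parse the server's first frame and return the assigned client_id.

    Hypothetical helper: assumes the first text frame is the
    stt_connection_confirmed object documented above.
    """
    msg = json.loads(raw)
    if msg.get("type") != "stt_connection_confirmed":
        # Anything else means the handshake did not go as documented
        raise ValueError(f"unexpected first message type: {msg.get('type')!r}")
    return msg["client_id"]
```

Failing loudly on an unexpected first frame keeps a client from streaming audio into a connection that never reported itself ready.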

Send audio for transcription (client to server); `audio_data` carries base64-encoded WebM audio:

```json
{
  "type": "stt_audio_chunk",
  "audio_data": "base64_encoded_webm_audio",
  "language": "auto",
  "model_size": "base"
}
```
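On the client side, building this frame amounts to base64-encoding the raw audio bytes and wrapping them in the JSON fields above. A minimal sketch, assuming the audio is already available as WebM bytes; `build_audio_chunk` is a hypothetical name, and the defaults simply mirror the example values.

```python
import base64
import json


def build_audio_chunk(audio_bytes: bytes, language: str = "auto",
                      model_size: str = "base") -> str:
    """Wrap raw WebM audio bytes in an stt_audio_chunk JSON frame.

    Hypothetical helper; field names follow the message format above.
    """
    return json.dumps({
        "type": "stt_audio_chunk",
        # Base64 turns binary audio into ASCII so it fits in a JSON text frame
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
        "model_size": model_size,
    })
```

Base64 produces 4 output characters for every 3 input bytes, so each frame is roughly a third larger than the raw audio; keeping chunks short keeps individual frames small.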

Transcription result (server to client):

```json
{
  "type": "stt_transcription_complete",
  "client_id": "uuid",
  "transcription": "Hello world",
  "timing": {
    "processing_time": 1.23,
    "model_size": "base",
    "device": "cuda"
  },
  "status": "success"
}
```
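A client can turn this frame into a typed value before using it. The sketch below is hypothetical: the `Transcription` class and `parse_transcription` are illustrative names, `processing_time` is assumed to be in seconds (the unit is not stated here), and any `status` other than `"success"` is treated as a failure because only that value is documented.

```python
import json
from dataclasses import dataclass


@dataclass
class Transcription:
    text: str
    processing_time: float  # unit not documented; seconds assumed
    model_size: str
    device: str


def parse_transcription(raw: str) -> Transcription:
    """Extract text and timing from an stt_transcription_complete frame."""
    msg = json.loads(raw)
    if msg.get("type") != "stt_transcription_complete":
        raise ValueError(f"not a transcription result: {msg.get('type')!r}")
    if msg.get("status") != "success":
        # Only "success" appears in the README; other values are assumed to be errors
        raise RuntimeError(f"transcription failed: status={msg.get('status')!r}")
    timing = msg.get("timing", {})
    return Transcription(
        text=msg["transcription"],
        processing_time=float(timing.get("processing_time", 0.0)),
        model_size=timing.get("model_size", ""),
        device=timing.get("device", ""),
    )
```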

HTTP: `/health` (returns the service status as JSON)

```json
{
  "service": "STT WebSocket Service",
  "version": "1.0.0",
  "status": "healthy",
  "model_loaded": true,
  "active_connections": 2,
  "device": "cuda"
}
```
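A deployment script or client can poll this endpoint before opening the WebSocket, so that no audio is sent before the Whisper model is loaded. A minimal sketch using only the standard library; `fetch_health` and `is_ready` are hypothetical helpers, and treating "healthy plus `model_loaded`" as ready is an assumption based on the fields shown above.

```python
import json
import urllib.request


def fetch_health(url: str = "http://localhost:7860/health",
                 timeout: float = 5.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_ready(health: dict) -> bool:
    """Assumed readiness rule: service reports healthy AND the model is loaded."""
    return health.get("status") == "healthy" and health.get("model_loaded") is True
```

Usage: `is_ready(fetch_health())` returns `True` once the service can accept transcription requests; `fetch_health` raises `urllib.error.URLError` if the server is not reachable yet.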

Port Configuration

Default Port: 7860
WebSocket Endpoint: ws://localhost:7860/ws/stt
Health Check: http://localhost:7860/health
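Both URLs share one host and port and differ only in scheme and path, so a client can derive them from a single setting. A minimal sketch; `stt_endpoints` and its `secure` flag are hypothetical, and `wss`/`https` assume a TLS-terminating proxy in front of the service, which this README does not describe.

```python
def stt_endpoints(host: str = "localhost", port: int = 7860,
                  secure: bool = False) -> dict:
    """Build the WebSocket and health-check URLs for one host/port pair."""
    # Plain ws/http matches the defaults above; wss/https assumes a TLS proxy
    ws_scheme, http_scheme = ("wss", "https") if secure else ("ws", "http")
    base = f"{host}:{port}"
    return {
        "websocket": f"{ws_scheme}://{base}/ws/stt",
        "health": f"{http_scheme}://{base}/health",
    }
```

If the container is started with a different host port, for example `docker run -p 8080:7860 stt-websocket-service`, clients would pass `port=8080`, because `-p` maps host port 8080 to container port 7860.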

Architecture

This service strips out the components that a WebSocket-only deployment does not need:

Removed: Gradio web interface
Removed: MCP protocol support
Removed: Complex routing
Added: Direct FastAPI WebSocket endpoints
Added: Simplified audio processing
Added: ZeroGPU-optimized transcription
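With direct WebSocket endpoints, the per-message logic can stay independent of the web framework. The sketch below shows one way a server could turn an `stt_audio_chunk` frame into an `stt_transcription_complete` reply; it is not the service's actual code. `handle_audio_chunk` is hypothetical, and `transcribe(audio_bytes, language, model_size)` is an assumed stand-in for the Whisper call. In a FastAPI endpoint it would sit between `await websocket.receive_text()` and `await websocket.send_text(...)`.

```python
import base64
import json
import time
from typing import Callable


def handle_audio_chunk(raw: str, client_id: str,
                       transcribe: Callable[[bytes, str, str], str],
                       device: str = "cuda") -> str:
    """Turn one stt_audio_chunk frame into an stt_transcription_complete reply.

    Hypothetical sketch: `transcribe` stands in for the Whisper model call.
    """
    msg = json.loads(raw)
    if msg.get("type") != "stt_audio_chunk":
        raise ValueError(f"unsupported message type: {msg.get('type')!r}")
    audio = base64.b64decode(msg["audio_data"])  # back to raw WebM bytes
    language = msg.get("language", "auto")
    model_size = msg.get("model_size", "base")

    # Time only the transcription step, as reported in "timing"
    start = time.perf_counter()
    text = transcribe(audio, language, model_size)
    elapsed = time.perf_counter() - start

    return json.dumps({
        "type": "stt_transcription_complete",
        "client_id": client_id,
        "transcription": text,
        "timing": {
            "processing_time": round(elapsed, 2),
            "model_size": model_size,
            "device": device,
        },
        "status": "success",
    })
```

Passing `transcribe` in as a callable keeps the protocol handling testable without a GPU or a loaded model.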

Integration

Connect from the VoiceCal WebRTC interface:

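```javascript
const ws = new WebSocket('ws://localhost:7860/ws/stt');

// Send only after the socket is open; send() throws while still CONNECTING.
ws.addEventListener('open', () => {
  ws.send(JSON.stringify({
    type: "stt_audio_chunk",
    audio_data: base64AudioData, // base64-encoded WebM audio
    language: "auto",
    model_size: "base"
  }));
});

// Receive the connection confirmation and transcription results
ws.addEventListener('message', (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "stt_transcription_complete") console.log(msg.transcription);
});
```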