STT WebSocket Service v1.0.0
Standalone WebSocket-only Speech-to-Text service for VoiceCal integration.
Features
- WebSocket-only STT interface (/ws/stt)
- ZeroGPU Whisper integration
- FastAPI-based architecture
- No Gradio dependencies
- No MCP dependencies
- Standalone deployment ready
- Real-time audio transcription
- Base64 audio transmission
- Multiple Whisper model sizes
Quick Start
Using the WebSocket Server
# Install dependencies
pip install -r requirements-websocket.txt
# Run standalone WebSocket server
python3 websocket_stt_server.py
Docker Deployment
# Build WebSocket-only image
docker build -f Dockerfile-websocket -t stt-websocket-service .
# Run container
docker run -p 7860:7860 stt-websocket-service
API Endpoints
WebSocket: /ws/stt
Connection Confirmation:
{
  "type": "stt_connection_confirmed",
  "client_id": "uuid",
  "service": "STT WebSocket Service",
  "version": "1.0.0",
  "model": "whisper-base",
  "device": "cuda",
  "message": "STT WebSocket connected and ready"
}
Send Audio for Transcription:
{
  "type": "stt_audio_chunk",
  "audio_data": "base64_encoded_webm_audio",
  "language": "auto",
  "model_size": "base"
}
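The audio_data field carries base64 text, so a client has to encode the raw audio bytes before sending. A minimal Python sketch of building this message, assuming the recorded WebM bytes are already in memory. The helper name build_audio_chunk_message is hypothetical and not part of the service.

```python
import base64
import json


def build_audio_chunk_message(audio_bytes: bytes, language: str = "auto",
                              model_size: str = "base") -> str:
    """Return the JSON text for one stt_audio_chunk message (hypothetical helper)."""
    return json.dumps({
        "type": "stt_audio_chunk",
        # base64 output is pure ASCII, so decoding it to str is safe
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
        "model_size": model_size,
    })
```

The returned string can be passed directly to the send call of whichever WebSocket client is in use.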
Transcription Result:
{
  "type": "stt_transcription_complete",
  "client_id": "uuid",
  "transcription": "Hello world",
  "timing": {
    "processing_time": 1.23,
    "model_size": "base",
    "device": "cuda"
  },
  "status": "success"
}
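A client usually needs only the transcription text from this payload. The sketch below shows one way to pull it out in Python. The function name extract_transcription is hypothetical, and because failure payloads are not documented here, anything other than a successful result is simply mapped to None.

```python
import json
from typing import Optional


def extract_transcription(raw_message: str) -> Optional[str]:
    """Return the transcript from a successful result message, else None."""
    message = json.loads(raw_message)
    if message.get("type") != "stt_transcription_complete":
        return None  # some other message, e.g. the connection confirmation
    if message.get("status") != "success":
        return None  # assumption: any other status means no usable transcript
    return message.get("transcription")


sample = json.dumps({
    "type": "stt_transcription_complete",
    "client_id": "uuid",
    "transcription": "Hello world",
    "timing": {"processing_time": 1.23, "model_size": "base", "device": "cuda"},
    "status": "success",
})
print(extract_transcription(sample))  # prints: Hello world
```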
HTTP: /health
{
  "service": "STT WebSocket Service",
  "version": "1.0.0",
  "status": "healthy",
  "model_loaded": true,
  "active_connections": 2,
  "device": "cuda"
}
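The health payload can be polled before opening a WebSocket, for example in a deployment readiness check. A small Python sketch using only the standard library follows. The names fetch_health and is_ready are hypothetical, and treating "healthy" plus "model_loaded": true as the readiness condition is an assumption, not something the service defines.

```python
import json
import urllib.request


def is_ready(health: dict) -> bool:
    """Assumed readiness rule: status is healthy and the model is loaded."""
    return health.get("status") == "healthy" and health.get("model_loaded") is True


def fetch_health(url: str = "http://localhost:7860/health",
                 timeout: float = 5.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the server running locally, is_ready(fetch_health()) gives a single boolean to gate on.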
Port Configuration
- Default Port: 7860
- WebSocket Endpoint: ws://localhost:7860/ws/stt
- Health Check: http://localhost:7860/health
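When the container is mapped to a different host or port, both URLs change together. The following Python sketch derives them from one host and port pair. The function service_urls and its secure flag are hypothetical; TLS support is not described in this README, so the wss/https branch is an assumption for deployments behind a TLS-terminating proxy.

```python
def service_urls(host: str = "localhost", port: int = 7860,
                 secure: bool = False) -> dict:
    """Build the WebSocket and health-check URLs for one deployment."""
    ws_scheme = "wss" if secure else "ws"
    http_scheme = "https" if secure else "http"
    return {
        "websocket": f"{ws_scheme}://{host}:{port}/ws/stt",
        "health": f"{http_scheme}://{host}:{port}/health",
    }
```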
Architecture
This service drops the components a WebSocket-only deployment does not need and replaces them with a smaller FastAPI core:
- Removed: Gradio web interface
- Removed: MCP protocol support
- Removed: Complex routing
- Added: Direct FastAPI WebSocket endpoints
- Added: Simplified audio processing
- Added: ZeroGPU optimized transcription
Integration
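Connect from the VoiceCal WebRTC interface:
const ws = new WebSocket('ws://localhost:7860/ws/stt');
// Send audio data once the connection is open;
// calling send() while the socket is still connecting throws an error
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: "stt_audio_chunk",
    audio_data: base64AudioData, // base64-encoded WebM audio
    language: "auto",
    model_size: "base"
  }));
};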
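For non-browser callers, the same exchange can be sketched in Python. This is a rough outline, not the VoiceCal client: it assumes the server sends the stt_connection_confirmed message first on every new connection, and it relies on the third-party websockets package for the real connection. The names transcribe_over and transcribe are hypothetical.

```python
import asyncio
import base64
import json


async def transcribe_over(ws, audio_bytes: bytes) -> str:
    """Run one confirm/send/receive exchange on an already open connection.

    `ws` is any object with async send(str) and recv() methods.
    """
    # Assumption: the first message on a new connection is the confirmation.
    confirmation = json.loads(await ws.recv())
    if confirmation.get("type") != "stt_connection_confirmed":
        raise RuntimeError(f"unexpected first message: {confirmation.get('type')}")

    await ws.send(json.dumps({
        "type": "stt_audio_chunk",
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        "language": "auto",
        "model_size": "base",
    }))

    # Skip unrelated messages until the transcription result arrives.
    while True:
        reply = json.loads(await ws.recv())
        if reply.get("type") == "stt_transcription_complete":
            if reply.get("status") != "success":
                raise RuntimeError(f"transcription failed: {reply.get('status')}")
            return reply["transcription"]


async def transcribe(audio_bytes: bytes,
                     url: str = "ws://localhost:7860/ws/stt") -> str:
    """Open a connection with the `websockets` package and transcribe once."""
    import websockets  # third-party: pip install websockets

    async with websockets.connect(url) as ws:
        return await transcribe_over(ws, audio_bytes)
```

With the server running, asyncio.run(transcribe(webm_bytes)) returns the transcript, where webm_bytes stands for the recorded WebM audio. Keeping the protocol logic in transcribe_over separate from the connection setup makes it possible to exercise the message flow with a stand-in connection object.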