pgits/stt-gpu-service · README-websocket.md · Peter Michael Gits · commit 69f7704 ("feat: Add standalone WebSocket-only STT service v1.0.0"), 7 months ago

	# STT WebSocket Service v1.0.0

	Standalone WebSocket-only Speech-to-Text service for VoiceCal integration.

	## Features

	- ✅ WebSocket-only STT interface (`/ws/stt`)
	- ✅ ZeroGPU Whisper integration
	- ✅ FastAPI-based architecture
	- ✅ No Gradio dependencies
	- ✅ No MCP dependencies
	- ✅ Standalone deployment ready
	- ✅ Real-time audio transcription
	- ✅ Base64 audio transmission
	- ✅ Multiple Whisper model sizes

	## Quick Start

	### Using the WebSocket Server

	```bash
	# Install dependencies
	pip install -r requirements-websocket.txt

	# Run standalone WebSocket server
	python3 websocket_stt_server.py
	```

	### Docker Deployment

	```bash
	# Build WebSocket-only image
	docker build -f Dockerfile-websocket -t stt-websocket-service .

	# Run container
	docker run -p 7860:7860 stt-websocket-service
	```

	## API Endpoints

	### WebSocket: `/ws/stt`

	Connection confirmation (sent by the server when a client connects):
	```json
	{
	  "type": "stt_connection_confirmed",
	  "client_id": "uuid",
	  "service": "STT WebSocket Service",
	  "version": "1.0.0",
	  "model": "whisper-base",
	  "device": "cuda",
	  "message": "STT WebSocket connected and ready"
	}
	```
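A client can validate this first frame before it starts streaming audio. The sketch below is a minimal illustration, assuming the confirmation arrives as a JSON text frame with the fields shown above; `parseConfirmation` is a hypothetical helper, not part of the service.

```javascript
// Hypothetical helper: parse the server's first frame and keep the fields
// a client typically needs. Field names follow the example payload above.
function parseConfirmation(raw) {
  const msg = typeof raw === "string" ? JSON.parse(raw) : raw;
  if (msg.type !== "stt_connection_confirmed") {
    throw new Error(`Expected stt_connection_confirmed, got ${msg.type}`);
  }
  return {
    clientId: msg.client_id,
    model: msg.model,
    device: msg.device,
    version: msg.version,
  };
}

// Usage with the example payload from this README:
const info = parseConfirmation(JSON.stringify({
  type: "stt_connection_confirmed",
  client_id: "uuid",
  service: "STT WebSocket Service",
  version: "1.0.0",
  model: "whisper-base",
  device: "cuda",
  message: "STT WebSocket connected and ready",
}));
console.log(info.clientId, info.device); // uuid cuda
```

Throwing on an unexpected `type` makes a misrouted connection (for example, to a different service on the same port) fail loudly instead of silently.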

	Send audio for transcription (client to server):
	```json
	{
	  "type": "stt_audio_chunk",
	  "audio_data": "base64_encoded_webm_audio",
	  "language": "auto",
	  "model_size": "base"
	}
	```
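To build this frame, the raw audio bytes have to be base64-encoded first. The sketch below assumes the caller already holds the WebM bytes as a `Uint8Array` (for example, from a recorded `Blob`); `bytesToBase64` and `buildAudioChunkMessage` are hypothetical names, and the `"auto"`/`"base"` defaults are copied from the example above, not from documented server-side defaults.

```javascript
// Hypothetical helper: encode raw bytes (a Uint8Array) as base64.
// btoa() expects a "binary string", so build it in slices to avoid
// passing too many arguments to String.fromCharCode at once.
function bytesToBase64(bytes) {
  const SLICE = 0x8000;
  let binary = "";
  for (let i = 0; i < bytes.length; i += SLICE) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + SLICE));
  }
  return btoa(binary);
}

// Hypothetical helper: wrap the encoded audio in an stt_audio_chunk frame.
function buildAudioChunkMessage(bytes, { language = "auto", modelSize = "base" } = {}) {
  return JSON.stringify({
    type: "stt_audio_chunk",
    audio_data: bytesToBase64(bytes),
    language,
    model_size: modelSize,
  });
}

// Usage: the two bytes below are ASCII "Hi", which encodes to "SGk=".
const frame = buildAudioChunkMessage(new Uint8Array([72, 105]));
console.log(frame);
// {"type":"stt_audio_chunk","audio_data":"SGk=","language":"auto","model_size":"base"}
```

`btoa` is available globally in browsers and in Node.js 16 and later, so the same helper works on both sides.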

	Transcription result (server to client):
	```json
	{
	  "type": "stt_transcription_complete",
	  "client_id": "uuid",
	  "transcription": "Hello world",
	  "timing": {
	    "processing_time": 1.23,
	    "model_size": "base",
	    "device": "cuda"
	  },
	  "status": "success"
	}
	```
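On the client, incoming frames can be routed by their `type` field. This is a sketch, not the VoiceCal implementation: `handleServerMessage` is hypothetical, and since this README documents only the success shape, any non-`"success"` status or unrecognised type is passed through untouched rather than interpreted.

```javascript
// Hypothetical dispatcher for server frames. Only the message types shown
// in this README are recognised; error shapes are not documented, so
// anything else is returned as-is for the caller to inspect.
function handleServerMessage(raw) {
  const msg = JSON.parse(raw);
  switch (msg.type) {
    case "stt_connection_confirmed":
      return { kind: "ready", clientId: msg.client_id };
    case "stt_transcription_complete":
      if (msg.status !== "success") {
        return { kind: "failed", message: msg };
      }
      return {
        kind: "transcript",
        text: msg.transcription,
        seconds: msg.timing?.processing_time,
      };
    default:
      return { kind: "unknown", message: msg };
  }
}

// Usage with a trimmed version of the example result:
const result = handleServerMessage(
  '{"type":"stt_transcription_complete","transcription":"Hello world","timing":{"processing_time":1.23},"status":"success"}'
);
console.log(result.text, result.seconds); // Hello world 1.23
```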

	### HTTP: `/health`

	```json
	{
	  "service": "STT WebSocket Service",
	  "version": "1.0.0",
	  "status": "healthy",
	  "model_loaded": true,
	  "active_connections": 2,
	  "device": "cuda"
	}
	```
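A deployment script or client can poll this endpoint until the model is loaded before opening the WebSocket. A minimal sketch, assuming "ready" means `status === "healthy"` and `model_loaded === true` (the README does not define readiness explicitly) and that the endpoint answers a plain GET; `isServiceReady` and `waitForService` are hypothetical names.

```javascript
// Hypothetical readiness rule based on the /health payload above.
function isServiceReady(health) {
  return health.status === "healthy" && health.model_loaded === true;
}

// Poll /health with the global fetch() (browsers, Node.js 18+).
// Returns true once the service reports ready, false after `attempts` tries.
async function waitForService(baseUrl, { attempts = 10, delayMs = 2000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(`${baseUrl}/health`);
      if (res.ok && isServiceReady(await res.json())) return true;
    } catch {
      // Server not reachable yet (e.g. still starting); retry.
    }
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}

// Usage:
// waitForService("http://localhost:7860").then((ok) => console.log("ready:", ok));
```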

	## Port Configuration

	- Default Port: `7860`
	- WebSocket Endpoint: `ws://localhost:7860/ws/stt`
	- Health Check: `http://localhost:7860/health`
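When the service runs somewhere other than `localhost:7860`, both URLs can be derived from a single base address. This sketch assumes the endpoint paths stay `/ws/stt` and `/health` and switches to `wss://` when the base is `https://`, because browsers block insecure `ws://` connections from pages served over HTTPS; `sttEndpoints` is a hypothetical helper.

```javascript
// Hypothetical helper: derive the health-check and WebSocket URLs
// from one base address. https:// bases get a wss:// socket URL.
function sttEndpoints(base = "http://localhost:7860") {
  const url = new URL(base);
  const wsProtocol = url.protocol === "https:" ? "wss:" : "ws:";
  return {
    health: new URL("/health", url).href,
    websocket: `${wsProtocol}//${url.host}/ws/stt`,
  };
}

const local = sttEndpoints();
console.log(local.websocket); // ws://localhost:7860/ws/stt
console.log(local.health);    // http://localhost:7860/health
```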

	## Architecture

	This build drops dependencies that a WebSocket-only deployment does not need and replaces them with a leaner FastAPI path:
	- Removed: Gradio web interface
	- Removed: MCP protocol support
	- Removed: Complex routing
	- Added: Direct FastAPI WebSocket endpoints
	- Added: Simplified audio processing
	- Added: ZeroGPU optimized transcription

	## Integration

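	Connecting from the VoiceCal WebRTC interface:

	```javascript
	const ws = new WebSocket('ws://localhost:7860/ws/stt');

	// Wait for the socket to open: send() throws while it is still CONNECTING
	ws.addEventListener('open', () => {
	  // base64AudioData: base64-encoded WebM audio captured by the caller
	  ws.send(JSON.stringify({
	    type: "stt_audio_chunk",
	    audio_data: base64AudioData,
	    language: "auto",
	    model_size: "base"
	  }));
	});

	// Server replies (confirmation, transcription results) arrive as JSON text frames
	ws.addEventListener('message', (event) => {
	  const msg = JSON.parse(event.data);
	  if (msg.type === "stt_transcription_complete") {
	    console.log(msg.transcription);
	  }
	});
	```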
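For longer sessions it can help to wrap the socket so audio captured before the server is ready is not lost. The sketch below is one conservative option, not VoiceCal's actual client: it queues frames until the `stt_connection_confirmed` message arrives (the README does not say whether audio sent earlier is accepted), then flushes them. `createSttClient` is a hypothetical name, and the socket parameter only needs `send()` and `addEventListener()`, so a browser `WebSocket` or a test double both work.

```javascript
// Hypothetical client wrapper: queue outgoing audio frames until the
// server confirms the connection, then flush them in order, and pass
// each successful transcription to onTranscript(text, fullMessage).
function createSttClient(socket, onTranscript) {
  const pending = [];
  let confirmed = false;

  socket.addEventListener("message", (event) => {
    const msg = JSON.parse(event.data);
    if (msg.type === "stt_connection_confirmed") {
      confirmed = true;
      while (pending.length) socket.send(pending.shift());
    } else if (msg.type === "stt_transcription_complete" && msg.status === "success") {
      onTranscript(msg.transcription, msg);
    }
  });

  return {
    sendAudio(base64Audio, { language = "auto", modelSize = "base" } = {}) {
      const frame = JSON.stringify({
        type: "stt_audio_chunk",
        audio_data: base64Audio,
        language,
        model_size: modelSize,
      });
      if (confirmed) socket.send(frame);
      else pending.push(frame);
    },
  };
}

// Usage in the browser (base64AudioData produced elsewhere):
// const client = createSttClient(new WebSocket("ws://localhost:7860/ws/stt"),
//   (text) => console.log("Transcript:", text));
// client.sendAudio(base64AudioData);
```

Injecting the socket rather than constructing it inside the wrapper keeps the queuing logic testable without a running server.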