TTS WebSocket Service v1.0.0
Standalone WebSocket-only Text-to-Speech service for VoiceCal integration.
Features
- WebSocket-only TTS interface (/ws/tts)
- ZeroGPU Bark TTS integration
- FastAPI-based architecture
- Multiple voice presets (10 speakers)
- Streaming TTS support (unmute.sh methodology)
- No Gradio dependencies
- No MCP dependencies
- Standalone deployment ready
- Base64 audio transmission
- WAV audio format output
Quick Start
Using the WebSocket Server
# Install dependencies
pip install -r requirements-websocket.txt
# Run standalone WebSocket server
python3 websocket_tts_server.py
Docker Deployment
# Build WebSocket-only image
docker build -f Dockerfile-websocket -t tts-websocket-service .
# Run container
docker run -p 7860:7860 tts-websocket-service
API Endpoints
WebSocket: /ws/tts
Connection Confirmation:
{
  "type": "tts_connection_confirmed",
  "client_id": "uuid",
  "service": "TTS WebSocket Service",
  "version": "1.0.0",
  "available_voices": [
    "v2/en_speaker_0", "v2/en_speaker_1", "v2/en_speaker_2",
    "v2/en_speaker_3", "v2/en_speaker_4", "v2/en_speaker_5",
    "v2/en_speaker_6", "v2/en_speaker_7", "v2/en_speaker_8",
    "v2/en_speaker_9"
  ],
  "device": "cuda",
  "message": "TTS WebSocket connected and ready"
}
Single Synthesis Request:
{
  "type": "tts_synthesize",
  "text": "Hello, how are you today?",
  "voice_preset": "v2/en_speaker_6"
}
Streaming Synthesis (unmute.sh methodology):
{
  "type": "tts_streaming_text",
  "text_chunks": ["Hello", "how are you", "today?"],
  "voice_preset": "v2/en_speaker_6",
  "is_final": true
}
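The two request shapes above can also be built programmatically. The following is a minimal Python sketch, assuming only the field names shown in this README. The helper names `build_synthesize_request` and `build_streaming_request` are hypothetical and not part of the service, and the preset check is a client-side convenience, not documented server behavior.

```python
import json

# Voice presets as listed in the connection confirmation message above.
AVAILABLE_VOICES = [f"v2/en_speaker_{i}" for i in range(10)]
DEFAULT_VOICE = "v2/en_speaker_6"


def _check_voice(voice_preset):
    """Reject presets this README does not list (client-side check only)."""
    if voice_preset not in AVAILABLE_VOICES:
        raise ValueError(f"unknown voice preset: {voice_preset!r}")


def build_synthesize_request(text, voice_preset=DEFAULT_VOICE):
    """Serialize a single 'tts_synthesize' request (hypothetical helper)."""
    _check_voice(voice_preset)
    return json.dumps({
        "type": "tts_synthesize",
        "text": text,
        "voice_preset": voice_preset,
    })


def build_streaming_request(text_chunks, voice_preset=DEFAULT_VOICE, is_final=True):
    """Serialize a 'tts_streaming_text' request (hypothetical helper)."""
    _check_voice(voice_preset)
    return json.dumps({
        "type": "tts_streaming_text",
        "text_chunks": list(text_chunks),
        "voice_preset": voice_preset,
        "is_final": is_final,
    })
```

Each helper returns a JSON string that can be passed directly to a WebSocket client's send method.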
Synthesis Result:
{
  "type": "tts_synthesis_complete",
  "client_id": "uuid",
  "audio_data": "base64_encoded_wav_audio",
  "audio_format": "wav",
  "text": "Hello, how are you today?",
  "voice_preset": "v2/en_speaker_6",
  "audio_size": 12345,
  "timing": {
    "processing_time": 2.34,
    "device": "cuda"
  },
  "status": "success"
}
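Because the audio arrives as a Base64 string, a client has to decode it before playback or saving. A minimal Python sketch of that step follows, assuming the result fields shown above. The function names are hypothetical, and `wav_duration_seconds` additionally assumes the server emits integer PCM WAV data, since Python's `wave` module does not read float-encoded WAV files.

```python
import base64
import io
import wave


def decode_synthesis_result(message):
    """Return raw WAV bytes from a 'tts_synthesis_complete' message."""
    if message.get("type") != "tts_synthesis_complete":
        raise ValueError(f"unexpected message type: {message.get('type')!r}")
    if message.get("status") != "success":
        raise RuntimeError(f"synthesis status was {message.get('status')!r}")
    return base64.b64decode(message["audio_data"])


def wav_duration_seconds(wav_bytes):
    """Compute clip length from the WAV header (assumes PCM encoding)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def save_wav(wav_bytes, path):
    """Write the decoded bytes to disk unchanged; they are already a WAV file."""
    with open(path, "wb") as handle:
        handle.write(wav_bytes)
```

The decoded bytes are a complete WAV file, so they can be written to disk as-is or handed to any audio player that accepts WAV input.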
HTTP: /health
{
  "service": "TTS WebSocket Service",
  "version": "1.0.0",
  "status": "healthy",
  "model_loaded": true,
  "active_connections": 1,
  "available_voices": 10,
  "device": "cuda"
}
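A client or deployment script might poll this endpoint before opening a WebSocket. The sketch below is one way to do that in Python with only the standard library. The readiness rule (status is "healthy" and the model is loaded) is an assumption based on the fields shown above, and both function names are hypothetical.

```python
import json
import urllib.request


def is_ready(health):
    """Treat the service as ready when it is healthy and the model is loaded."""
    return health.get("status") == "healthy" and bool(health.get("model_loaded"))


def fetch_health(base_url="http://localhost:7860", timeout=5.0):
    """GET /health and parse the JSON body (requires a running server)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running locally, `is_ready(fetch_health())` gives a single boolean suitable for a startup check.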
Port Configuration
- Default Port: 7860 (HuggingFace Spaces standard port)
- WebSocket Endpoint: ws://localhost:7860/ws/tts
- Health Check: http://localhost:7860/health
- Note: Each HuggingFace Space gets its own IP address, so both STT and TTS can use port 7860
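To keep both URLs consistent when the host or port changes, they can be derived from one place. This is a small Python sketch with a hypothetical `service_urls` helper. The `secure` flag, which switches to `wss`/`https`, is an assumption for TLS-terminated deployments and is not described in this README.

```python
def service_urls(host="localhost", port=7860, secure=False):
    """Build the WebSocket and health-check URLs from a host and port."""
    ws_scheme = "wss" if secure else "ws"
    http_scheme = "https" if secure else "http"
    return {
        "websocket": f"{ws_scheme}://{host}:{port}/ws/tts",
        "health": f"{http_scheme}://{host}:{port}/health",
    }
```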
Voice Presets
Available voice presets:
- v2/en_speaker_0 - Voice 0
- v2/en_speaker_1 - Voice 1
- v2/en_speaker_2 - Voice 2
- v2/en_speaker_3 - Voice 3
- v2/en_speaker_4 - Voice 4
- v2/en_speaker_5 - Voice 5
- v2/en_speaker_6 - Voice 6 (default)
- v2/en_speaker_7 - Voice 7
- v2/en_speaker_8 - Voice 8
- v2/en_speaker_9 - Voice 9
Architecture
This service eliminates all unnecessary dependencies:
- Removed: Gradio web interface
- Removed: MCP protocol support
- Removed: Complex routing
- Added: Direct FastAPI WebSocket endpoints
- Added: Streaming TTS support
- Added: ZeroGPU optimized synthesis
Integration
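Connect from the VoiceCal WebRTC interface:
const ws = new WebSocket('ws://localhost:7860/ws/tts');

// send() throws if called before the connection is open, so wait for 'open'
ws.addEventListener('open', () => {
  // Send text for synthesis
  ws.send(JSON.stringify({
    type: "tts_synthesize",
    text: "Hello world",
    voice_preset: "v2/en_speaker_6"
  }));

  // Streaming synthesis (unmute.sh pattern)
  ws.send(JSON.stringify({
    type: "tts_streaming_text",
    text_chunks: ["Hello", "world"],
    voice_preset: "v2/en_speaker_6",
    is_final: true
  }));
});

// Log the type of each server message (e.g. tts_synthesis_complete)
ws.addEventListener('message', (event) => {
  const msg = JSON.parse(event.data);
  console.log(msg.type);
});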