---
title: STT WebSocket Service v1.0.0
emoji: 🤖
colorFrom: red
colorTo: red
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
license: mit
suggested_hardware: zero-a10g
---
# ZeroGPU Speech-to-Text Service

High-performance speech recognition with Whisper models, powered by Hugging Face ZeroGPU on Nvidia H200 hardware.
## Features

- **ZeroGPU Acceleration**: Dynamic H200 GPU allocation
- **Multi-language Support**: 100+ languages with auto-detection
- **Real-time Processing**: Often faster than audio duration
- **Timestamp Precision**: Word-level timing information
- **Batch Processing**: Multiple files in parallel
- **Live Transcription**: Real-time microphone input
- **WebRTC Ready**: Integration with live audio streams
- **MCP Protocol**: Model Context Protocol for direct integration
- **Cost Efficient**: No idle costs with a Pro subscription
## Architecture

- **Backend**: Whisper (OpenAI) with PyTorch optimization
- **Frontend**: Gradio with an enhanced multi-tab UI
- **GPU**: ZeroGPU with H200 dynamic scaling
- **Models**: Whisper tiny/base/small/medium/large-v2
## Performance

- **Real-time Factor**: 0.1x - 0.5x (processing takes only 10-50% of the audio's duration)
- **Languages**: 100+ with auto-detection
- **Accuracy**: State-of-the-art with Whisper models
- **Batch Processing**: Parallel execution on H200
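The real-time factor quoted above is simply processing time divided by audio duration; a minimal sketch:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; values below 1.0 are faster than real time."""
    return processing_seconds / audio_seconds

# A 60-second clip transcribed in 12 seconds:
rtf = real_time_factor(12.0, 60.0)  # 0.2, within the 0.1x-0.5x range above
```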
## API Usage

### Python Client (Gradio)
```python
from gradio_client import Client

client = Client("YOUR_USERNAME/stt-gpu-service")
result = client.predict(
    "audio.wav",  # audio file
    "auto",       # language
    "base",       # model size
    True,         # timestamps
    api_name="/predict",
)
status, transcription, timestamps = result
```
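The shape of the returned timestamps is not documented here; as a sketch, assuming they arrive as Whisper-style chunk dicts (an assumption — verify against the Space's actual response), they could be rendered like this:

```python
# Hypothetical timestamp structure, modeled on Whisper's chunk output;
# check the Space's real response before relying on this shape.
sample_chunks = [
    {"text": " Hello", "timestamp": (0.0, 0.4)},
    {"text": " world", "timestamp": (0.5, 0.9)},
]

def format_chunks(chunks):
    """Render word/segment chunks as 'start-end<tab>text' lines."""
    return [
        f"{c['timestamp'][0]:.1f}-{c['timestamp'][1]:.1f}\t{c['text'].strip()}"
        for c in chunks
    ]

lines = format_chunks(sample_chunks)  # ["0.0-0.4\tHello", "0.5-0.9\tworld"]
```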
### MCP Client (Model Context Protocol)
```python
# Using an MCP client for direct integration
import asyncio
import json

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch app.py as an MCP server over stdio
server_params = StdioServerParameters(
    command="python",
    args=["app.py", "--mcp-only"],
)

async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Transcribe a single audio file
            result = await session.call_tool(
                "stt_transcribe",
                {
                    "audio_path": "/path/to/audio.wav",
                    "language": "auto",
                    "model_size": "base",
                    "return_timestamps": True,
                },
            )
            transcription_data = json.loads(result.content[0].text)
            print(f"Transcription: {transcription_data['transcription']}")

asyncio.run(main())
```
### Dual Protocol Support

This service supports both the Gradio HTTP API and the MCP protocol simultaneously:

- **Gradio Interface**: Traditional web UI and HTTP API (port 7860)
- **MCP Protocol**: Direct tool integration via stdio

**MCP Tools Available:**

- `stt_transcribe`: Transcribe a single audio file
- `stt_batch_transcribe`: Batch transcribe multiple files
- `stt_get_info`: Get system and service information
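As a sketch of the batch tool, a request to `stt_batch_transcribe` might be assembled and its response parsed as below; the argument names (`audio_paths`, etc.) and the response shape are assumptions, not taken from the service's documentation:

```python
import json

# Assumed arguments for stt_batch_transcribe; verify against the actual tool schema.
batch_args = {
    "audio_paths": ["/path/a.wav", "/path/b.wav"],
    "language": "auto",
    "model_size": "base",
}
# In a live session: result = await session.call_tool("stt_batch_transcribe", batch_args)

# Parsing a hypothetical JSON payload, offline, for illustration:
sample_payload = json.dumps({"results": [
    {"audio_path": "/path/a.wav", "transcription": "hello"},
    {"audio_path": "/path/b.wav", "transcription": "world"},
]})
results = json.loads(sample_payload)["results"]
texts = [r["transcription"] for r in results]
```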
**Running Modes:**

```bash
# Dual mode (default) - both Gradio + MCP
python app.py

# MCP-only mode - just the MCP server
python app.py --mcp-only
```
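The mode switch above could be handled with a standard argparse flag; this is a sketch of how `app.py` might dispatch, not its actual implementation (only the `--mcp-only` flag itself is documented):

```python
import argparse

# Hypothetical mode dispatch for app.py; the real code may differ.
parser = argparse.ArgumentParser(description="STT service")
parser.add_argument(
    "--mcp-only",
    action="store_true",
    help="run only the MCP stdio server, skipping the Gradio UI",
)
args = parser.parse_args(["--mcp-only"])  # example invocation

mode = "mcp-only" if args.mcp_only else "dual"
```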
## MCP Integration Benefits

### Direct Tool Integration

- **No HTTP overhead**: Direct protocol communication
- **Type-safe interactions**: Structured tool definitions
- **Streaming support**: Real-time tool communication
- **Auto-discovery**: Tools are automatically discoverable
### Use Cases

- **LLM Agent Integration**: Direct STT capability for AI agents
- **Workflow Automation**: Seamless audio processing in pipelines
- **Development Tools**: IDE extensions with voice transcription
- **Multi-modal Applications**: Combine with other MCP services
### Integration with ChatCal Voice

```python
import json

# Example: voice-enabled calendar scheduling
async def voice_calendar_integration():
    # STT: convert voice to text
    stt_result = await stt_session.call_tool("stt_transcribe", {
        "audio_path": "user_voice_request.wav",
        "language": "auto",
    })
    transcription = json.loads(stt_result.content[0].text)["transcription"]

    # Process the calendar request with an LLM
    calendar_action = process_calendar_request(transcription)

    # TTS: convert the response back to voice (if a TTS MCP service is integrated)
    # tts_result = await tts_session.call_tool("tts_generate", {...})
```