# HuggingFace Spaces Deployment Guide
## Space Configuration

- **Space Name:** `stt-gpu-service-v3`
- **Space Type:** Docker
- **Hardware:** GPU (T4 Small recommended for cost optimization)
- **Visibility:** Public
## Deployment Steps

### 1. Create HuggingFace Space

- Go to HuggingFace Spaces
- Click "Create new Space"
- Name: `stt-gpu-service-v3`
- Select "Docker" as the SDK
- Choose GPU hardware (T4 Small for cost efficiency)
- Set visibility to Public
### 2. Upload Files

Upload all files from this repository to the HuggingFace Space:

```bash
# Clone your Space repository
git clone https://huggingface.co/spaces/{your_username}/stt-gpu-service-v3
cd stt-gpu-service-v3

# Copy all files from this project
cp -r /path/to/kyutai-rustServer/* .

# Add, commit, and push
git add .
git commit -m "Initial deployment of Kyutai STT Server v3

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>"
git push
```
### 3. Space Settings

In your HuggingFace Space settings:

- **Hardware:** T4 Small GPU (~$0.50/hour)
- **Auto-sleep:** Enable (sleeps after 30-60 minutes of inactivity)
- **Environment Variables:** None required (everything is configured in code)
### 4. Monitoring Deployment

Watch the build logs in the HuggingFace Space. The build process includes:

- Rust compilation with CUDA support
- Model downloading and caching
- Server startup and health checks
### 5. Testing the Deployment

Once deployed, the Space will provide:

- **Web Interface:** the main Space URL (Gradio + FastAPI)
- **WebSocket Endpoint:** `wss://your-space-url/ws`
- **Health Check:** `https://your-space-url/health`
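As a quick smoke test, the health endpoint can be polled from Python using only the standard library. This is a sketch: `your-space-url` is a placeholder, and the shape of the health response body is not specified by this guide, so it is returned raw.

```python
import urllib.request


def health_url(base_url):
    # Build the health-check URL from the Space's base URL.
    return base_url.rstrip("/") + "/health"


def check_health(base_url, timeout=10):
    # Fetch /health and return (HTTP status, raw response body).
    with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


# Example (uncomment with your real Space URL):
# status, body = check_health("https://your-space-url")
# print(status, body)
```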
## Expected Build Time

- **Initial build:** 10-15 minutes (downloading models)
- **Subsequent builds:** 5-8 minutes (cached models)
- **Cold start:** 30-90 seconds (preloaded models)
## Cost Optimization

### Auto-Sleep Configuration

- The Space automatically sleeps after 30-60 minutes of inactivity
- Wake-up time: 30-90 seconds (vs. 10-15 minutes without model preloading)
- No GPU charges while the Space sleeps
### Manual Control

```bash
# Pause the Space (API call)
curl -X POST "https://huggingface.co/api/spaces/{username}/stt-gpu-service-v3/pause" \
  -H "Authorization: Bearer {your-hf-token}"

# Resume the Space (API call)
curl -X POST "https://huggingface.co/api/spaces/{username}/stt-gpu-service-v3/restart" \
  -H "Authorization: Bearer {your-hf-token}"
```
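The same pause/resume calls can be scripted. Below is a minimal Python sketch that only builds the request from the endpoint pattern above; the actual POST is commented out so nothing fires without a real token.

```python
API_BASE = "https://huggingface.co/api/spaces"


def space_action(username, space, action, token):
    # Build the (url, headers) pair for a pause/restart POST,
    # mirroring the curl commands above.
    if action not in ("pause", "restart"):
        raise ValueError(f"unsupported action: {action!r}")
    url = f"{API_BASE}/{username}/{space}/{action}"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers


# import requests
# url, headers = space_action("me", "stt-gpu-service-v3", "pause", "hf_xxx")
# requests.post(url, headers=headers).raise_for_status()
```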
### Cost Examples

- On-demand (10 hours/week): ~$29/month
- Business hours (40 hours/week): ~$89/month
- Daily use (4 hours/day): ~$69/month
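These figures can be sanity-checked with a rough estimator. This sketch assumes the ~$0.50/hour T4 Small rate quoted above; the listed amounts appear to include some overhead or rounding, so they will not match to the dollar.

```python
WEEKS_PER_MONTH = 52 / 12  # ~4.33


def monthly_gpu_cost(hours_per_week, hourly_rate=0.50):
    # Estimated monthly GPU spend for a Space that is only billed while awake.
    return hours_per_week * WEEKS_PER_MONTH * hourly_rate


for hours in (10, 40):
    print(f"{hours} h/week -> ${monthly_gpu_cost(hours):.2f}/month")
```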
## Troubleshooting

### Build Failures

- Check Rust compilation errors in the build logs
- Ensure CUDA dependencies are correctly installed
- Verify model paths in config files
### Runtime Issues

- Check GPU availability with `nvidia-smi` in the Space terminal
- Monitor memory usage for OOM errors
- Verify WebSocket connections in the browser developer tools
### Model Loading Issues

- Check HuggingFace model access permissions
- Verify internet connectivity for model downloads
- Monitor disk space for model caching
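For the disk-space point, `df -h` from the Space terminal works; from Python, a standard-library sketch:

```python
import shutil


def free_gib(path="."):
    # Free space in GiB on the filesystem holding `path`
    # (e.g. the model cache directory).
    return shutil.disk_usage(path).free / 2**30


print(f"Free space: {free_gib():.1f} GiB")
```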
## API Usage Examples

### JavaScript WebSocket Client

```javascript
const ws = new WebSocket('wss://your-space-url/ws');

ws.onopen = () => {
  // Start streaming once the connection is open
  ws.send(JSON.stringify({
    "type": "start",
    "config": {"enable_timestamps": true}
  }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'transcription') {
    console.log('Transcription:', data.result.text);
  }
};

// Send audio data (base64 encoded); call only after the socket is open
function sendAudio(audioBase64) {
  ws.send(JSON.stringify({
    "type": "audio",
    "data": audioBase64,
    "sample_rate": 16000,
    "channels": 1,
    "timestamp": Date.now()
  }));
}
```
### Python Client Example

```python
import asyncio
import json

import websockets


async def test_stt():
    uri = "wss://your-space-url/ws"
    async with websockets.connect(uri) as websocket:
        # Start streaming
        await websocket.send(json.dumps({
            "type": "start",
            "config": {"enable_timestamps": True}
        }))
        # Listen for responses
        async for message in websocket:
            data = json.loads(message)
            print(f"Received: {data}")


asyncio.run(test_stt())
```