Peter Michael Gits

HuggingFace Spaces Deployment Guide

Space Configuration

  • Space Name: stt-gpu-service-v3
  • Space Type: Docker
  • Hardware: GPU (T4 Small recommended for cost optimization)
  • Visibility: Public

Deployment Steps

1. Create HuggingFace Space

  1. Go to HuggingFace Spaces
  2. Click "Create new Space"
  3. Name: stt-gpu-service-v3
  4. Select "Docker" as the SDK
  5. Choose GPU hardware (T4 Small for cost efficiency)
  6. Set visibility to Public

2. Upload Files

Upload all files from this repository to the HuggingFace Space:

# Clone your space repository
git clone https://huggingface.co/spaces/{your_username}/stt-gpu-service-v3
cd stt-gpu-service-v3

# Copy all files from this project
cp -r /path/to/kyutai-rustServer/* .

# Add and commit
git add .
git commit -m "Initial deployment of Kyutai STT Server v3"

git push

3. Space Settings

In your HuggingFace Space settings:

  • Hardware: T4 Small GPU (~$0.50/hour)
  • Auto-sleep: Enable (sleeps after 30-60 minutes of inactivity)
  • Environment Variables: None required (all configured in code)

4. Monitoring Deployment

  1. Watch the build logs in the HuggingFace Space
  2. Build process includes:
    • Rust compilation with CUDA support
    • Model downloading and caching
    • Server startup and health checks

5. Testing the Deployment

Once deployed, the Space will provide:

  • Web Interface: Main Space URL (Gradio + FastAPI)
  • WebSocket Endpoint: wss://your-space-url/ws
  • Health Check: https://your-space-url/health
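The endpoints above can be exercised from Python with only the standard library. This is a minimal sketch: `your-space-url` is a placeholder for your actual Space hostname, and the shape of the `/health` response body is defined by the server, not assumed here.

```python
import json
import urllib.request

def endpoints(base):
    """Map a Space's base hostname to the endpoints listed above."""
    return {
        "web": f"https://{base}/",
        "websocket": f"wss://{base}/ws",
        "health": f"https://{base}/health",
    }

def check_health(base, timeout=10):
    """GET /health and return the decoded JSON body (shape depends on the server)."""
    with urllib.request.urlopen(endpoints(base)["health"], timeout=timeout) as resp:
        return json.loads(resp.read())
```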

Expected Build Time

  • Initial Build: 10-15 minutes (downloading models)
  • Subsequent Builds: 5-8 minutes (cached models)
  • Cold Start: 30-90 seconds (preloaded models)

Cost Optimization

Auto-Sleep Configuration

  • Space automatically sleeps after 30-60 minutes of inactivity
  • Wake-up time: 30-90 seconds (vs 10-15 minutes without model preloading)
  • No GPU charges during sleep

Manual Control

# Pause space (API call)
curl -X POST "https://huggingface.co/api/spaces/{username}/stt-gpu-service-v3/pause" \
  -H "Authorization: Bearer {your-hf-token}"

# Resume space (API call)  
curl -X POST "https://huggingface.co/api/spaces/{username}/stt-gpu-service-v3/restart" \
  -H "Authorization: Bearer {your-hf-token}"
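The same pause/restart calls can be issued from Python. This sketch mirrors the curl commands above using only the standard library; `username`, the Space name, and the token are placeholders you supply, and the token needs write access to the Space.

```python
import urllib.request

API = "https://huggingface.co/api/spaces"

def space_request(username, space, action, token):
    """Build the POST request used above to pause ('pause') or resume ('restart') a Space."""
    return urllib.request.Request(
        f"{API}/{username}/{space}/{action}",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )

# Sending it (requires a valid write token):
# urllib.request.urlopen(space_request("user", "stt-gpu-service-v3", "pause", "hf_..."))
```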

Cost Examples

  • On-demand (10 hours/week): ~$29/month
  • Business hours (40 hours/week): ~$89/month
  • Daily use (4 hours/day): ~$69/month
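These figures can be sanity-checked against the ~$0.50/hour T4 rate. The raw GPU-hours formula below comes out somewhat under the table above (which presumably also accounts for wake-up time and other overhead), so treat it as a lower bound, not a billing prediction.

```python
def monthly_gpu_cost(hours_per_week, rate_per_hour=0.50, weeks_per_month=52 / 12):
    """Raw GPU-hours cost: billed hours per week, averaged over a month."""
    return hours_per_week * weeks_per_month * rate_per_hour

# Business hours (40 hours/week) at $0.50/hour:
# monthly_gpu_cost(40) ≈ 86.7 (the table above lists ~$89/month)
```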

Troubleshooting

Build Failures

  1. Check Rust compilation errors in build logs
  2. Ensure CUDA dependencies are correctly installed
  3. Verify model paths in config files

Runtime Issues

  1. Check GPU availability: nvidia-smi in Space terminal
  2. Monitor memory usage for OOM errors
  3. Verify WebSocket connections in browser developer tools
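For steps 1 and 2, nvidia-smi's CSV query mode gives machine-readable numbers that are easier to monitor than the default table. A small sketch (the query flags are standard nvidia-smi options; this only runs on GPU hardware):

```python
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB pairs, one line per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        used, total = (int(x) for x in line.split(","))
        gpus.append({"used_mib": used, "total_mib": total})
    return gpus

def gpu_memory():
    """Run nvidia-smi (GPU Spaces only) and return per-GPU memory usage."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)
```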

Model Loading Issues

  1. Check HuggingFace model access permissions
  2. Verify internet connectivity for model downloads
  3. Monitor disk space for model caching

API Usage Examples

JavaScript WebSocket Client

const ws = new WebSocket('wss://your-space-url/ws');

ws.onopen = () => {
    // Start streaming
    ws.send(JSON.stringify({
        "type": "start",
        "config": {"enable_timestamps": true}
    }));
};

ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    if (data.type === 'transcription') {
        console.log('Transcription:', data.result.text);
    }
};

// Send audio data (base64 encoded) — only after the socket is open,
// otherwise the send will fail
function sendAudio(audioBase64) {
    ws.send(JSON.stringify({
        "type": "audio",
        "data": audioBase64,
        "sample_rate": 16000,
        "channels": 1,
        "timestamp": Date.now()
    }));
}

Python Client Example

import asyncio
import json
import websockets

async def test_stt():
    uri = "wss://your-space-url/ws"
    async with websockets.connect(uri) as websocket:
        # Start streaming
        await websocket.send(json.dumps({
            "type": "start",
            "config": {"enable_timestamps": True}
        }))
        
        # Listen for responses
        async for message in websocket:
            data = json.loads(message)
            print(f"Received: {data}")

asyncio.run(test_stt())
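The client above only starts a session and listens. An audio frame matching the JavaScript example's message shape could be built like this; the field names are taken from that example, the exact schema is defined by the server, and `pcm_bytes` stands in for a raw audio chunk you capture yourself.

```python
import base64
import json
import time

def audio_message(pcm_bytes, sample_rate=16000, channels=1):
    """Wrap raw audio bytes in the JSON envelope shown in the JavaScript example."""
    return json.dumps({
        "type": "audio",
        "data": base64.b64encode(pcm_bytes).decode("ascii"),
        "sample_rate": sample_rate,
        "channels": channels,
        "timestamp": int(time.time() * 1000),
    })

# Inside test_stt(), after the "start" message:
# await websocket.send(audio_message(chunk))
```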