# HuggingFace Spaces Deployment Guide
## Overview
This application is configured to run on **HuggingFace Spaces** using local model inference (no external API calls required).
---
## Quick Setup
### 1. Create a New Space
1. Go to https://huggingface.co/new-space
2. Choose **Gradio** as the SDK
3. Select **GPU** hardware (T4 or better recommended)
4. Name your Space (e.g., `transcriptor-ai`)
### 2. Upload Your Code
Upload all files from this directory to your Space, or connect a Git repository.
### 3. Configure Space Settings (Optional)
Go to **Settings → Variables** in your Space and add:
| Variable | Value | Description |
|----------|-------|-------------|
| `DEBUG_MODE` | `True` or `False` | Enable detailed logging |
| `LLM_TEMPERATURE` | `0.7` | Model creativity (0.0-1.0) |
| `LLM_TIMEOUT` | `120` | Timeout in seconds |
| `LOCAL_MODEL` | `microsoft/Phi-3-mini-4k-instruct` | Model to use |
**Note:** All settings have sensible defaults - you don't need to set these unless you want to customize.
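Spaces Variables arrive in the app as environment variables, so each setting can fall back to its default when unset. The helper below is an illustrative sketch of that pattern (the name `load_settings` and the exact wiring are assumptions, not the app's actual code):

```python
import os

def load_settings() -> dict:
    """Read Spaces Variables from the environment, falling back to
    the defaults shown in the table above."""
    return {
        "debug_mode": os.getenv("DEBUG_MODE", "False").lower() == "true",
        "temperature": float(os.getenv("LLM_TEMPERATURE", "0.7")),
        "timeout": int(os.getenv("LLM_TIMEOUT", "120")),
        "model": os.getenv("LOCAL_MODEL", "microsoft/Phi-3-mini-4k-instruct"),
    }
```

With no Variables set, this returns the documented defaults; setting `DEBUG_MODE=True` in the Space flips `debug_mode` without a code change.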
---
## Hardware Requirements
### Recommended: GPU (T4 or better)
- **Phi-3-mini-4k-instruct**: 3.8B params, ~8GB GPU RAM
- Processing speed: ~30-60 seconds per transcript chunk
- **Best for:** Production use with multiple users
### Alternative: CPU (not recommended)
- Works, but very slowly (5-10 minutes per chunk)
- Only suitable for testing
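The ~8GB figure follows from the parameter count: fp16 weights take 2 bytes per parameter, and activations plus KV cache add overhead on top. A back-of-envelope helper (illustrative only):

```python
def fp16_weight_gb(params_billions: float) -> float:
    """GiB needed for model weights alone in fp16 (2 bytes/param).
    Real GPU usage runs higher: activations and KV cache are extra."""
    return params_billions * 1e9 * 2 / 1024**3

# Phi-3-mini (3.8B params) -> ~7.1 GiB of weights, consistent with
# the ~8GB GPU RAM figure above once overhead is included.
```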
---
## Supported Models
You can change the model by setting the `LOCAL_MODEL` variable:
### Small & Fast (Recommended for Free Tier)
```
LOCAL_MODEL=microsoft/Phi-3-mini-4k-instruct (Default - 3.8B params)
```
### Medium (Better quality, needs more GPU)
```
LOCAL_MODEL=mistralai/Mistral-7B-Instruct-v0.3 (7B params)
```
### Alternatives
```
LOCAL_MODEL=HuggingFaceH4/zephyr-7b-beta (7B params, good instruction following)
LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0 (1.1B params, very fast but lower quality)
```
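Swapping models is typically just a matter of passing `LOCAL_MODEL` through to a `transformers` text-generation pipeline. A hedged sketch of that wiring (the `pipeline_kwargs` helper is hypothetical, not the app's code):

```python
import os

def pipeline_kwargs() -> dict:
    """Build transformers pipeline() arguments from the LOCAL_MODEL
    variable, defaulting to Phi-3-mini as documented above."""
    return {
        "task": "text-generation",
        "model": os.getenv("LOCAL_MODEL", "microsoft/Phi-3-mini-4k-instruct"),
        "torch_dtype": "auto",  # fp16 on GPU, fp32 on CPU
        "device_map": "auto",   # place weights on GPU when available
    }

# Usage (downloads the model on first run):
# from transformers import pipeline
# generator = pipeline(**pipeline_kwargs())
```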
---
## Configuration Files
### ✅ Required Files
- `app.py` - Main application
- `requirements.txt` - Python dependencies
- `llm.py`, `extractors.py`, etc. - Core modules
### ⚠️ NOT Needed for Spaces
- `.env` file - Use Spaces Variables instead
- Local database files
- API keys (unless using external APIs)
---
## Environment Configuration
The app automatically detects if it's running on HuggingFace Spaces and uses local model inference by default.
**Default Configuration (no .env needed):**
```python
USE_HF_API = False # Don't use HF Inference API
USE_LMSTUDIO = False # Don't use LM Studio
LLM_BACKEND = "local"  # Use local transformers
DEBUG_MODE = False # Disable debug logs
```
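For reference, HuggingFace Spaces sets environment variables such as `SPACE_ID` inside a running Space, so runtime detection can be as simple as the sketch below (illustrative; the app's actual detection logic may differ):

```python
import os

def running_on_spaces() -> bool:
    """Spaces injects SPACE_ID (e.g. 'user/space-name') into the
    environment of a running Space; locally it is absent."""
    return os.getenv("SPACE_ID") is not None

def default_backend() -> str:
    # Local transformers inference unless overridden by a Spaces Variable.
    return os.getenv("LLM_BACKEND", "local")
```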
**To override:** Set Spaces Variables (Settings → Variables)
---
## Troubleshooting
### Issue: "Out of Memory" Error
**Solution:** Switch to a smaller model
```
LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0
```
### Issue: Very Slow Processing
**Solution:**
1. Make sure you selected **GPU** hardware (not CPU)
2. Check Space logs for "Model loaded on cuda" confirmation
3. If on CPU, upgrade to GPU tier
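To confirm which device inference will actually use, a quick check you can run inside the Space (assumes `torch` is installed, as the `transformers` backend requires; degrades gracefully if it is missing):

```python
# Report the inference device: "cuda" confirms the model can load
# on GPU; "cpu" means you are on the slow path described above.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"  # no torch installed: certainly no GPU inference

print(f"Inference device: {device}")
```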
### Issue: Quality Score 0.00
**Causes:**
1. Model not loaded properly (check logs for "[Local Model] Loading...")
2. GPU out of memory (model falls back to CPU)
3. Timeout too short (increase `LLM_TIMEOUT`)
**Debug Steps:**
1. Set `DEBUG_MODE=True` in Spaces Variables
2. Check logs for detailed error messages
3. Look for "[Local Model] ✅ Generated X characters"
### Issue: Model Downloads Every Time
**Solution:** HuggingFace Spaces caches models automatically, but the first load takes 2-5 minutes.
- Subsequent starts are faster (~30 seconds)
- Don't restart Space unnecessarily
---
## Performance Optimization
### 1. Reduce Context Window
Edit `llm.py` line 399:
```python
max_length=2000 # Reduce from 3500 for faster processing
```
### 2. Lower Token Limit
Set Spaces Variable:
```
MAX_TOKENS_PER_REQUEST=800 # Default is 1500
```
### 3. Use Smaller Model
```
LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0
```
### 4. Disable Debug Mode
```
DEBUG_MODE=False
```
---
## Monitoring
### View Logs
1. Go to your Space
2. Click **Logs** tab at the top
3. Look for startup messages:
```
✅ Configuration loaded for HuggingFace Spaces
🚀 TranscriptorAI Enterprise - LLM Backend: local
[Local Model] Loading microsoft/Phi-3-mini-4k-instruct...
[Local Model] ✅ Model loaded on cuda:0
```
### Check Processing
During analysis, you should see:
```
[Local Model] Generating (1500 max tokens, temp=0.7)...
[Local Model] ✅ Generated 1247 characters
[LLM Debug] ✅ Successfully extracted JSON with 7 fields
```
---
## Cost Estimation
### Free Tier (CPU)
- ⚠️ Very slow but free
- ~5-10 minutes per transcript
### GPU (T4) - ~$0.60/hour
- ⚡ Fast processing
- ~30-60 seconds per transcript
- Space sleeps after inactivity (saves money)
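The per-transcript cost follows directly from these numbers; a sketch of the arithmetic at the quoted $0.60/hour rate:

```python
# Back-of-envelope cost per transcript on a $0.60/hour T4,
# using the 30-60 second processing times quoted above.
rate_per_sec = 0.60 / 3600  # dollars per second of GPU time

for secs in (30, 60):
    # 30s -> ~$0.005, 60s -> ~$0.01 per transcript
    print(f"{secs}s transcript: ~${rate_per_sec * secs:.4f}")
```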
### Persistent GPU (Upgraded)
- Always-on for instant access
- Higher cost but best user experience
---
## Security Notes
1. **No API Keys Needed:** Everything runs locally
2. **Private Processing:** Data never leaves your Space
3. **Secrets Management:** Use Spaces Secrets (not Variables) for sensitive data
4. **Model Access:** Phi-3 and most models don't require gated access
---
## Next Steps
1. ✅ Upload code to your Space
2. ✅ Select GPU hardware
3. ✅ Wait for the first model download (~2-5 min)
4. ✅ Test with a sample transcript
5. 🎉 Share your Space URL!
---
## Support
- **HuggingFace Spaces Docs:** https://huggingface.co/docs/hub/spaces
- **Transformers Docs:** https://huggingface.co/docs/transformers
- **GPU Pricing:** https://huggingface.co/pricing
---
**Last Updated:** October 2025