# Fix for HuggingFace Spaces Timeout Issues
## Problem: Spaces Timing Out During Model Loading/Summarization
HuggingFace Spaces has strict limitations:
- **CPU Basic**: 2 vCPU, 16GB RAM, ~60 second timeout
- **CPU Upgraded**: 8 vCPU, 32GB RAM, longer timeout
- **GPU**: Better but limited availability
When loading large models or processing many transcripts, Spaces hits these limits.
---
## ✅ IMMEDIATE FIXES FOR HF SPACES
### Fix 1: Use HuggingFace Inference API (Not Local Models)
The issue is trying to load models ON the Space. Instead, use HF's API endpoints.
**Edit `config.py`:**
```python
# CRITICAL: Use HF API, not local models
LLM_BACKEND = "hf_api" # NOT "local"
# Use serverless inference (no model loading needed)
HF_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"
# Reduce timeouts for Spaces limits
LLM_TIMEOUT = 30 # Spaces will kill longer requests
MAX_TOKENS_PER_REQUEST = 150 # Smaller = faster
```
### Fix 2: Set HF Space Secrets
In your Space settings, add:
1. Go to: `Settings` → `Repository secrets`
2. Add secret:
- Name: `HUGGINGFACE_TOKEN`
- Value: Your HF token from https://huggingface.co/settings/tokens
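Once the secret is saved, the Space exposes it to your app as an environment variable. Below is a minimal sketch of using it for serverless calls via `huggingface_hub`'s `InferenceClient`; the `summarize_chunk` helper is hypothetical, not part of the existing code:

```python
import os
from huggingface_hub import InferenceClient

# The token comes from the Repository secret set above
client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    token=os.getenv("HUGGINGFACE_TOKEN"),
    timeout=25,  # stay under the Spaces request limit
)

def summarize_chunk(text: str) -> str:
    # Hypothetical helper: one short serverless call, no model loaded on the Space
    return client.text_generation(
        f"Summarize this transcript excerpt:\n\n{text}",
        max_new_tokens=150,
    )
```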
### Fix 3: Reduce Memory Usage
**Edit `app.py`** - Process transcripts in small batches instead of all at once:
```python
# Instead of processing all at once, batch them
MAX_TRANSCRIPTS_PER_BATCH = 3 # Process max 3 at a time
# Split files into batches
for batch_start in range(0, len(files), MAX_TRANSCRIPTS_PER_BATCH):
    batch_files = files[batch_start:batch_start + MAX_TRANSCRIPTS_PER_BATCH]
    # Process batch...
```
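The same idea as a self-contained helper; `process_batch` is a hypothetical placeholder for your existing per-batch logic:

```python
MAX_TRANSCRIPTS_PER_BATCH = 3  # keep memory use low on cpu-basic

def iter_batches(files, batch_size=MAX_TRANSCRIPTS_PER_BATCH):
    # Yield small slices so only a few transcripts are in memory at once
    for start in range(0, len(files), batch_size):
        yield files[start:start + batch_size]

def process_batch(batch):
    # Hypothetical placeholder for the real per-batch analysis
    return [f"processed {getattr(f, 'name', f)}" for f in batch]

def analyze_all(files):
    results = []
    for batch in iter_batches(files):
        results.extend(process_batch(batch))
    return results
```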
### Fix 4: Use Gradio's Queue System
**In `app.py`**, at the end:
```python
# Enable queue to handle long-running tasks
demo.queue(
    default_concurrency_limit=1,  # Process one at a time (Gradio 4.x; was concurrency_count in 3.x)
    max_size=10,  # Max 10 in queue
    api_open=False
).launch()
```
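In Gradio 4.x the limit can also be scoped to a single event instead of the whole queue. A sketch, assuming components named `analyze_btn`, `files`, and `output` exist in your Blocks:

```python
# Alternative: throttle only the heavy event, not the whole app
analyze_btn.click(
    fn=analyze,
    inputs=[files],
    outputs=[output],
    concurrency_limit=1,  # one analysis at a time
)
```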
---
## 🚀 OPTIMIZED CONFIG FOR HF SPACES
Create `spaces_config.py`:
```python
import os
# HuggingFace Spaces Optimized Configuration
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"
os.environ["OVERLAP_TOKENS"] = "50"
# Use serverless inference endpoints
os.environ["USE_SERVERLESS"] = "true"
```
Then import at the top of `app.py`:
```python
import spaces_config # Load before other imports
```
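For these variables to take effect, `config.py` has to read them from the environment instead of hard-coding values. A hedged sketch of that pattern (names match the variables above; your actual config layout may differ):

```python
import os

# config.py -- read Spaces-friendly settings, falling back to local defaults
LLM_BACKEND = os.getenv("LLM_BACKEND", "local")
HF_MODEL = os.getenv("HF_MODEL", "mistralai/Mistral-7B-Instruct-v0.2")
MAX_TOKENS_PER_REQUEST = int(os.getenv("MAX_TOKENS_PER_REQUEST", "150"))
LLM_TIMEOUT = int(os.getenv("LLM_TIMEOUT", "30"))
MAX_CHUNK_TOKENS = int(os.getenv("MAX_CHUNK_TOKENS", "2000"))
OVERLAP_TOKENS = int(os.getenv("OVERLAP_TOKENS", "50"))
USE_SERVERLESS = os.getenv("USE_SERVERLESS", "false").lower() == "true"
```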
---
## 📝 MODIFY FOR SPACES CONSTRAINTS
### Change 1: Aggressive Chunking
**In `chunking.py`**, reduce chunk sizes:
```python
# For Spaces, use smaller chunks
MAX_CHUNK_TOKENS = 2000 # Down from 6000
OVERLAP_TOKENS = 50 # Down from 150
```
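A minimal token-based chunker consistent with those limits, assuming `tiktoken` (already in the requirements) with the `cl100k_base` encoding:

```python
import tiktoken

MAX_CHUNK_TOKENS = 2000
OVERLAP_TOKENS = 50

def chunk_text(text: str) -> list[str]:
    # Split on token boundaries, overlapping chunks slightly to preserve context
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = MAX_CHUNK_TOKENS - OVERLAP_TOKENS
    return [
        enc.decode(tokens[start:start + MAX_CHUNK_TOKENS])
        for start in range(0, len(tokens), step)
    ]
```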
### Change 2: Streaming Progress
**In `app.py`**, add progress updates and intermediate yields so long requests don't appear stalled:
```python
def analyze(files, ..., progress=gr.Progress()):
    for i, file in enumerate(files):
        # Update progress frequently
        progress((i / len(files)), desc=f"Processing {i+1}/{len(files)}")
        # Yield intermediate results to keep connection alive
        yield f"Processing {file.name}...", None, None, None
```
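Put together, here is a self-contained sketch of a streaming handler, simplified to a single text output (requires `demo.queue()`; `summarize_file` is a hypothetical stand-in for the real per-file analysis):

```python
import gradio as gr

def summarize_file(path: str) -> str:
    # Hypothetical placeholder for the real analysis of one transcript
    return f"Summary of {path}"

def analyze(files, progress=gr.Progress()):
    summaries = []
    for i, file in enumerate(files):
        progress(i / len(files), desc=f"Processing {i + 1}/{len(files)}")
        # Each yield sends a partial update and keeps the connection alive
        yield f"Processing {file.name}..."
        summaries.append(summarize_file(file.name))
    progress(1.0, desc="Done")
    yield "\n\n".join(summaries)
```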
### Change 3: Use @spaces.GPU Decorator (If Available)
If you have GPU access:
```python
import spaces
@spaces.GPU(duration=60) # Request GPU for 60 seconds
def analyze_with_gpu(files, ...):
    # Your analysis code
    pass
```
---
## 🎯 RECOMMENDED SPACE CONFIGURATION
**In your Space's `README.md` header:**
```yaml
---
title: TranscriptorAI Enhanced
emoji: 📝
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
duplicated_from:
suggested_hardware: cpu-upgrade  # Or cpu-basic if budget constrained
---
```
**Upgrade to CPU Upgrade or GPU** for better performance. Note the running hardware is assigned under the Space's `Settings`, not in the README; `suggested_hardware` only pre-selects hardware when the Space is duplicated:
- `cpu-upgrade` - higher limits than `cpu-basic`
- `t4-small` - GPU access (faster)
---
## ⚡ LIGHTWEIGHT SPACES VERSION
Create `app_spaces.py` (lightweight version):
```python
import gradio as gr
import os
# Force lightweight mode for Spaces
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "20"
# Import after setting env vars
from app import analyze, generate_narrative_report_ui
# Simplified interface for Spaces
with gr.Blocks() as demo:
gr.Markdown("# TranscriptorAI - HF Spaces Edition")
gr.Markdown("β οΈ **Note**: Process 1-3 transcripts at a time to avoid timeouts")
with gr.Tab("Analyze Transcripts"):
with gr.Row():
files = gr.File(
label="Upload Transcripts (Max 3 files)",
file_count="multiple",
file_types=[".txt", ".docx", ".pdf"]
)
with gr.Row():
file_type = gr.Radio(
choices=["Auto-detect", "DOCX", "PDF", "TXT"],
value="Auto-detect",
label="File Type"
)
interviewee_type = gr.Radio(
choices=["HCP", "Patient", "Other"],
value="Patient",
label="Interviewee Type"
)
analyze_btn = gr.Button("Analyze (30-60 seconds)", variant="primary")
output = gr.Textbox(label="Analysis Results", lines=20)
csv_output = gr.File(label="CSV Report")
pdf_output = gr.File(label="PDF Report")
analyze_btn.click(
fn=analyze,
inputs=[files, file_type, gr.Textbox(value="", visible=False),
gr.Textbox(value="", visible=False), gr.Checkbox(value=False, visible=False),
interviewee_type],
outputs=[output, csv_output, pdf_output, gr.Plot(visible=False)]
)
# Critical for Spaces
demo.queue(concurrency_count=1).launch(
server_name="0.0.0.0", # Required for Spaces
server_port=7860, # Required for Spaces
share=False
)
```
---
## 🔧 SPACES-SPECIFIC REQUIREMENTS.TXT
Create minimal dependencies:
```txt
# Lightweight for HF Spaces
gradio>=4.0.0
huggingface_hub>=0.19.0
python-docx>=1.0.0
pdfplumber>=0.10.0
pandas>=2.0.0
reportlab>=4.0.0
tiktoken>=0.5.0
# Don't install heavy models locally
# transformers # REMOVE - use API instead
# torch # REMOVE - use API instead
```
---
## 🐛 DEBUGGING SPACES TIMEOUTS
### Check Spaces Logs
In your Space, click `Logs` to see:
```
Building Space...
Loading model... → If stuck here = model too large
Timeout after 60s → Spaces limit hit
```
### Add Logging
```python
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def analyze(...):
logger.info("Starting analysis...")
logger.info(f"Processing {len(files)} files")
# ... rest of code
```
---
## ✅ CHECKLIST FOR SPACES
- [ ] Set `LLM_BACKEND=hf_api` (not `local`)
- [ ] Add `HUGGINGFACE_TOKEN` secret in Space settings
- [ ] Use lightweight model (Mistral-7B, not Mixtral-8x7B)
- [ ] Enable `demo.queue()` for long tasks
- [ ] Process max 3 transcripts at a time
- [ ] Set `LLM_TIMEOUT=25` (under Spaces limit)
- [ ] Reduce `MAX_TOKENS_PER_REQUEST=100`
- [ ] Add progress updates so long requests don't appear stalled
- [ ] Consider upgrading to `cpu-upgrade` or `t4-small` hardware
---
## 🎯 ULTIMATE SPACES FIX
The real issue is that **Spaces is timing out while waiting for a response**.
**Quick Fix - Add this to the very top of `app.py`:**
```python
import os
import sys
# HuggingFace Spaces Configuration
# MUST be set before any other imports
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HUGGINGFACE_TOKEN"] = os.getenv("HUGGINGFACE_TOKEN", "")
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"
print("π Running on HuggingFace Spaces")
print(f"π Backend: {os.environ['LLM_BACKEND']}")
print(f"π€ Model: {os.environ['HF_MODEL']}")
print(f"β±οΈ Timeout: {os.environ['LLM_TIMEOUT']}s")
```
**And at the bottom of `app.py`, change `.launch()` to:**
```python
if __name__ == "__main__":
    demo.queue(
        default_concurrency_limit=1,  # Gradio 4.x (was concurrency_count in 3.x)
        max_size=10,
        api_open=False
    ).launch(
        server_name="0.0.0.0",
        server_port=7860,
        show_error=True
    )
```
---
## 🆘 If Still Timing Out
### Option 1: Use Spaces Persistent Storage
```python
# Store intermediate results across restarts.
# Persistent storage (enabled in Space settings) mounts at /data;
# /tmp works within one run but is wiped on restart.
import pickle

cache_file = "/data/transcriptor_cache.pkl"
```
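A hedged sketch of that caching pattern, so a rerun after a timeout can skip work that already finished:

```python
import os
import pickle

CACHE_FILE = "/data/transcriptor_cache.pkl"  # /data requires persistent storage enabled

def load_cache() -> dict:
    # Return previously computed results, or an empty cache on first run
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, "rb") as f:
            return pickle.load(f)
    return {}

def save_cache(cache: dict) -> None:
    with open(CACHE_FILE, "wb") as f:
        pickle.dump(cache, f)
```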
### Option 2: Split Processing
Process in stages, as sketched below:
1. Stage 1: Upload & extract text → Save to temp
2. Stage 2: Analyze saved text → Return results
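A sketch of that two-stage flow as separate handlers, each short enough to finish within the request limit (`extract_text` and `summarize_text` are hypothetical placeholders for the real parser and LLM call):

```python
import json

STAGE_FILE = "/tmp/extracted.json"

def extract_text(path: str) -> str:
    # Hypothetical placeholder for the real DOCX/PDF/TXT parser
    with open(path, errors="ignore") as fh:
        return fh.read()

def summarize_text(text: str) -> str:
    # Hypothetical placeholder for the real LLM call
    return text[:200]

def stage1_extract(files):
    # Stage 1: extraction only -- fast, no LLM calls
    texts = {f.name: extract_text(f.name) for f in files}
    with open(STAGE_FILE, "w") as fh:
        json.dump(texts, fh)
    return f"Extracted {len(texts)} transcripts. Now run Stage 2."

def stage2_analyze():
    # Stage 2: analyze the saved text in a separate, shorter request
    with open(STAGE_FILE) as fh:
        texts = json.load(fh)
    return "\n\n".join(summarize_text(t) for t in texts.values())
```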
### Option 3: Upgrade Hardware for a Larger Timeout
Upgrade to `cpu-upgrade` hardware in the Space settings.
---
**The key insight**: you're not running locally, so there is no local process (node.js or otherwise) to crash.
The "timeout" is HuggingFace Spaces killing your app for taking too long to respond.
**Solution**: use the HF Inference API (serverless) instead of loading models in the Space.