# Fix for HuggingFace Spaces Timeout Issues
## Problem: Spaces Timing Out During Model Loading/Summarization
HuggingFace Spaces has strict limitations:
- **CPU Basic**: 2 vCPU, 16GB RAM, ~60 second timeout
- **CPU Upgraded**: 8 vCPU, 32GB RAM, longer timeout
- **GPU**: Better but limited availability
When loading large models or processing many transcripts, Spaces hits these limits.
---
## βœ… IMMEDIATE FIXES FOR HF SPACES
### Fix 1: Use HuggingFace Inference API (Not Local Models)
The issue is trying to load models ON the Space. Instead, use HF's API endpoints.
**Edit `config.py`:**
```python
# CRITICAL: Use HF API, not local models
LLM_BACKEND = "hf_api" # NOT "local"
# Use serverless inference (no model loading needed)
HF_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"
# Reduce timeouts for Spaces limits
LLM_TIMEOUT = 30 # Spaces will kill longer requests
MAX_TOKENS_PER_REQUEST = 150 # Smaller = faster
```
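For reference, here is a minimal sketch of what an `hf_api` call can look like using `huggingface_hub`'s `InferenceClient`; the prompt and parameter values are illustrative, and the project's actual client code may differ:

```python
import os

from huggingface_hub import InferenceClient

# Serverless inference: the model runs on HF's infrastructure, not in the Space
client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    token=os.getenv("HUGGINGFACE_TOKEN"),
    timeout=30,  # stay under the Spaces request limit
)

summary = client.text_generation(
    "Summarize the following transcript: ...",
    max_new_tokens=150,  # smaller responses come back faster
)
print(summary)
```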
### Fix 2: Set HF Space Secrets
In your Space settings, add:
1. Go to: `Settings` β†’ `Repository secrets`
2. Add secret:
- Name: `HUGGINGFACE_TOKEN`
- Value: Your HF token from https://huggingface.co/settings/tokens
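Spaces exposes repository secrets to the running app as environment variables, so the token can be read at startup. A minimal sketch (the error message is illustrative):

```python
import os

# Repository secrets appear as environment variables inside the Space
hf_token = os.getenv("HUGGINGFACE_TOKEN")
if not hf_token:
    raise RuntimeError(
        "HUGGINGFACE_TOKEN is not set - add it under Settings -> Repository secrets"
    )
```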
### Fix 3: Reduce Memory Usage
**Edit `app.py`** - Process transcripts one at a time:
```python
# Instead of processing all at once, batch them
MAX_TRANSCRIPTS_PER_BATCH = 3 # Process max 3 at a time
# Split files into batches
for batch_start in range(0, len(files), MAX_TRANSCRIPTS_PER_BATCH):
    batch_files = files[batch_start:batch_start + MAX_TRANSCRIPTS_PER_BATCH]
    # Process batch...
```
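A slightly fuller sketch of the same loop, with an explicit garbage-collection pass between batches; `process_transcript` is a hypothetical stand-in for the app's real per-file processing:

```python
import gc

MAX_TRANSCRIPTS_PER_BATCH = 3  # Process max 3 at a time

def process_in_batches(files):
    results = []
    for start in range(0, len(files), MAX_TRANSCRIPTS_PER_BATCH):
        for f in files[start:start + MAX_TRANSCRIPTS_PER_BATCH]:
            results.append(process_transcript(f))  # hypothetical per-file helper
        gc.collect()  # release per-batch memory before continuing
    return results
```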
### Fix 4: Use Gradio's Queue System
**In `app.py`**, at the end. The queue keeps each HTTP request short-lived (the client polls for results), so long-running jobs don't trip the request timeout:
```python
# Enable the queue so long-running jobs aren't killed mid-request
# (Gradio 4 renamed concurrency_count to default_concurrency_limit)
demo.queue(
    default_concurrency_limit=1,  # Process one job at a time
    max_size=10,                  # At most 10 requests waiting in the queue
    api_open=False
).launch()
```
---
## πŸš€ OPTIMIZED CONFIG FOR HF SPACES
Create `spaces_config.py`:
```python
import os
# HuggingFace Spaces Optimized Configuration
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"
os.environ["OVERLAP_TOKENS"] = "50"
# Use serverless inference endpoints
os.environ["USE_SERVERLESS"] = "true"
```
Then import at the top of `app.py`:
```python
import spaces_config # Load before other imports
```
---
## πŸ“ MODIFY FOR SPACES CONSTRAINTS
### Change 1: Aggressive Chunking
**In `chunking.py`**, reduce chunk sizes:
```python
# For Spaces, use smaller chunks
MAX_CHUNK_TOKENS = 2000 # Down from 6000
OVERLAP_TOKENS = 50 # Down from 150
```
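If the chunker needs rebuilding rather than just retuning, here is a minimal sketch of token-based chunking with overlap using `tiktoken` (already in the requirements); the encoding name is an assumption and may differ from what `chunking.py` actually uses:

```python
import tiktoken

MAX_CHUNK_TOKENS = 2000
OVERLAP_TOKENS = 50

def chunk_text(text: str, encoding_name: str = "cl100k_base") -> list[str]:
    """Split text into overlapping chunks of at most MAX_CHUNK_TOKENS tokens."""
    enc = tiktoken.get_encoding(encoding_name)
    tokens = enc.encode(text)
    step = MAX_CHUNK_TOKENS - OVERLAP_TOKENS
    return [
        enc.decode(tokens[start:start + MAX_CHUNK_TOKENS])
        for start in range(0, len(tokens), step)
    ]
```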
### Change 2: Streaming Progress
**In `app.py`**, add progress updates so the client keeps receiving data; Gradio streams each `yield` from a generator function to the UI, which keeps the connection alive:
```python
def analyze(files, ..., progress=gr.Progress()):
    for i, file in enumerate(files):
        # Update progress frequently
        progress(i / len(files), desc=f"Processing {i+1}/{len(files)}")
        # Yield intermediate results to keep the connection alive
        yield f"Processing {file.name}...", None, None, None
```
### Change 3: Use @spaces.GPU Decorator (If Available)
If your Space runs on ZeroGPU hardware (the decorator comes from the `spaces` package, which must be listed in `requirements.txt`):
```python
import spaces

@spaces.GPU(duration=60)  # Request the GPU for up to 60 seconds
def analyze_with_gpu(files, ...):
    # Your analysis code
    pass
```
---
## 🎯 RECOMMENDED SPACE CONFIGURATION
**In your Space's `README.md` header:**
```yaml
---
title: TranscriptorAI Enhanced
emoji: πŸ“
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
duplicated_from:
hardware: cpu-upgrade # Or cpu-basic if budget constrained
---
```
**Upgrade to CPU Upgrade or GPU** for better performance:
- `hardware: cpu-upgrade` - Better timeout limits
- `hardware: t4-small` - GPU access (faster)
---
## ⚑ LIGHTWEIGHT SPACES VERSION
Create `app_spaces.py` (lightweight version):
```python
import gradio as gr
import os
# Force lightweight mode for Spaces
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "20"
# Import after setting env vars
from app import analyze, generate_narrative_report_ui
# Simplified interface for Spaces
with gr.Blocks() as demo:
    gr.Markdown("# TranscriptorAI - HF Spaces Edition")
    gr.Markdown("⚠️ **Note**: Process 1-3 transcripts at a time to avoid timeouts")

    with gr.Tab("Analyze Transcripts"):
        with gr.Row():
            files = gr.File(
                label="Upload Transcripts (Max 3 files)",
                file_count="multiple",
                file_types=[".txt", ".docx", ".pdf"]
            )
        with gr.Row():
            file_type = gr.Radio(
                choices=["Auto-detect", "DOCX", "PDF", "TXT"],
                value="Auto-detect",
                label="File Type"
            )
            interviewee_type = gr.Radio(
                choices=["HCP", "Patient", "Other"],
                value="Patient",
                label="Interviewee Type"
            )

        analyze_btn = gr.Button("Analyze (30-60 seconds)", variant="primary")
        output = gr.Textbox(label="Analysis Results", lines=20)
        csv_output = gr.File(label="CSV Report")
        pdf_output = gr.File(label="PDF Report")

        analyze_btn.click(
            fn=analyze,
            inputs=[files, file_type, gr.Textbox(value="", visible=False),
                    gr.Textbox(value="", visible=False), gr.Checkbox(value=False, visible=False),
                    interviewee_type],
            outputs=[output, csv_output, pdf_output, gr.Plot(visible=False)]
        )

# Critical for Spaces (Gradio 4: default_concurrency_limit, not concurrency_count)
demo.queue(default_concurrency_limit=1).launch(
    server_name="0.0.0.0",  # Required for Spaces
    server_port=7860,       # Required for Spaces
    share=False
)
```
---
## πŸ”§ SPACES-SPECIFIC REQUIREMENTS.TXT
Create minimal dependencies:
```txt
# Lightweight for HF Spaces
gradio>=4.0.0
huggingface_hub>=0.19.0
python-docx>=1.0.0
pdfplumber>=0.10.0
pandas>=2.0.0
reportlab>=4.0.0
tiktoken>=0.5.0
# Don't install heavy models locally
# transformers # REMOVE - use API instead
# torch # REMOVE - use API instead
```
---
## πŸ“Š DEBUGGING SPACES TIMEOUTS
### Check Spaces Logs
In your Space, click `Logs` to see:
```
Building Space...
Loading model... ← If stuck here = model too large
Timeout after 60s ← Spaces limit hit
```
### Add Logging
```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def analyze(...):
    logger.info("Starting analysis...")
    logger.info(f"Processing {len(files)} files")
    # ... rest of code
```
---
## βœ… CHECKLIST FOR SPACES
- [ ] Set `LLM_BACKEND=hf_api` (not `local`)
- [ ] Add `HUGGINGFACE_TOKEN` secret in Space settings
- [ ] Use a lightweight model (Mistral-7B, not Mixtral-8x7B)
- [ ] Enable `demo.queue()` for long tasks
- [ ] Process max 3 transcripts at a time
- [ ] Set `LLM_TIMEOUT=25` (under Spaces limit)
- [ ] Reduce `MAX_TOKENS_PER_REQUEST=100`
- [ ] Add progress updates so the connection stays alive
- [ ] Consider upgrading to `cpu-upgrade` or `t4-small` hardware
---
## 🎯 ULTIMATE SPACES FIX
The real issue is that **Spaces times out while waiting for a response**.
**Quick Fix - Add this to the very top of `app.py`:**
```python
import os
# HuggingFace Spaces Configuration
# MUST be set before any other imports
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HUGGINGFACE_TOKEN"] = os.getenv("HUGGINGFACE_TOKEN", "")
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"
print("πŸš€ Running on HuggingFace Spaces")
print(f"πŸ“Š Backend: {os.environ['LLM_BACKEND']}")
print(f"πŸ€– Model: {os.environ['HF_MODEL']}")
print(f"⏱️ Timeout: {os.environ['LLM_TIMEOUT']}s")
```
**And at the bottom of `app.py`, change `.launch()` to:**
```python
if __name__ == "__main__":
    # Gradio 4 renamed concurrency_count to default_concurrency_limit
    demo.queue(
        default_concurrency_limit=1,
        max_size=10,
        api_open=False
    ).launch(
        server_name="0.0.0.0",
        server_port=7860,
        show_error=True
    )
```
---
## πŸ“ž If Still Timing Out
### Option 1: Cache Intermediate Results
```python
# Store intermediate results so a retried request can reuse them.
# Note: /tmp only survives while the Space is running; HF's paid
# persistent storage mounts at /data if results must outlive restarts.
import pickle

cache_file = "/tmp/transcriptor_cache.pkl"
```
```
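A minimal sketch of save/load helpers around that cache file (function names are illustrative):

```python
import os
import pickle

cache_file = "/tmp/transcriptor_cache.pkl"

def save_cache(results: dict) -> None:
    with open(cache_file, "wb") as f:
        pickle.dump(results, f)

def load_cache() -> dict:
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    return {}
```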
### Option 2: Split Processing
Process in stages so no single request exceeds the timeout (a sketch follows):
1. Stage 1: Upload & extract text β†’ save to a temp file
2. Stage 2: Analyze the saved text β†’ return results
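A hedged sketch of the two-stage flow; `extract_text` and `analyze_text` are hypothetical stand-ins for the app's real extraction and analysis code:

```python
import json

STAGE_FILE = "/tmp/extracted_texts.json"

def stage1_extract(files) -> str:
    """Stage 1: extract text from uploads and save it to a temp file."""
    texts = {f.name: extract_text(f) for f in files}  # hypothetical helper
    with open(STAGE_FILE, "w") as fh:
        json.dump(texts, fh)
    return f"Extracted {len(texts)} transcripts - now run Stage 2"

def stage2_analyze() -> dict:
    """Stage 2: analyze the previously saved text in a separate request."""
    with open(STAGE_FILE) as fh:
        texts = json.load(fh)
    return {name: analyze_text(text) for name, text in texts.items()}  # hypothetical helper
```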
### Option 3: Upgrade Hardware for a Longer Timeout
Upgrade to `cpu-upgrade` hardware in the Space settings.
---
**The key insight**: the app isn't running locally, so there is no local Node.js process to crash.
The "timeout" is HuggingFace Spaces killing your app for taking too long.
**Solution**: Use HF API (serverless) instead of loading models in the Space.