# Fix for HuggingFace Spaces Timeout Issues

## Problem: Spaces Timing Out During Model Loading/Summarization

HuggingFace Spaces has strict resource limits:

- **CPU Basic**: 2 vCPU, 16GB RAM, ~60 second request timeout
- **CPU Upgrade**: 8 vCPU, 32GB RAM, longer timeout
- **GPU**: better, but limited availability

Loading large models or processing many transcripts on the Space hits these limits.

---

## ✅ IMMEDIATE FIXES FOR HF SPACES

### Fix 1: Use HuggingFace Inference API (Not Local Models)

The root problem is trying to load models **on** the Space. Use HF's serverless inference endpoints instead.

**Edit `config.py`:**

```python
# CRITICAL: Use HF API, not local models
LLM_BACKEND = "hf_api"  # NOT "local"

# Use serverless inference (no model loading needed)
HF_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

# Reduce timeouts to stay under the Spaces limits
LLM_TIMEOUT = 30              # Spaces will kill longer requests
MAX_TOKENS_PER_REQUEST = 150  # smaller = faster
```
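The project's `hf_api` backend isn't shown here, but a serverless call is roughly this simple. A minimal sketch, assuming the model and token from the config above; `summarize_chunk` is a hypothetical helper, not a function from this codebase:

```python
import os

from huggingface_hub import InferenceClient

# The request runs on HF's servers; nothing is loaded on the Space.
client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    token=os.getenv("HUGGINGFACE_TOKEN"),
    timeout=30,  # fail fast, well under the Spaces request limit
)

def summarize_chunk(text: str) -> str:
    """Hypothetical helper: summarize one transcript chunk via the serverless API."""
    prompt = f"[INST] Summarize this transcript excerpt:\n\n{text} [/INST]"
    return client.text_generation(prompt, max_new_tokens=150)
```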
### Fix 2: Set HF Space Secrets

Add your token in the Space settings:

1. Go to `Settings` → `Repository secrets`
2. Add a secret:
   - Name: `HUGGINGFACE_TOKEN`
   - Value: your HF token from https://huggingface.co/settings/tokens

### Fix 3: Reduce Memory Usage

**Edit `app.py`** to process transcripts a few at a time instead of all at once:

```python
# Instead of processing everything at once, batch the uploads
MAX_TRANSCRIPTS_PER_BATCH = 3  # process at most 3 at a time

# Split files into batches
for batch_start in range(0, len(files), MAX_TRANSCRIPTS_PER_BATCH):
    batch_files = files[batch_start:batch_start + MAX_TRANSCRIPTS_PER_BATCH]
    # Process batch...
```

### Fix 4: Use Gradio's Queue System

**In `app.py`**, at the end. Note that Gradio 4 (pinned below) renamed `concurrency_count` to `default_concurrency_limit`:

```python
# Enable the queue to handle long-running tasks
demo.queue(
    default_concurrency_limit=1,  # process one request at a time
    max_size=10,                  # max 10 requests in the queue
    api_open=False
).launch()
```

---

## 🚀 OPTIMIZED CONFIG FOR HF SPACES

Create `spaces_config.py`:

```python
import os

# HuggingFace Spaces optimized configuration
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"
os.environ["OVERLAP_TOKENS"] = "50"

# Use serverless inference endpoints
os.environ["USE_SERVERLESS"] = "true"
```

Then import it at the top of `app.py`:

```python
import spaces_config  # load before other imports
```

---

## 📝 MODIFY FOR SPACES CONSTRAINTS

### Change 1: Aggressive Chunking

**In `chunking.py`**, reduce the chunk sizes:

```python
# For Spaces, use smaller chunks
MAX_CHUNK_TOKENS = 2000  # down from 6000
OVERLAP_TOKENS = 50      # down from 150
```
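For reference, a minimal sketch of what token-window chunking with these constants can look like, assuming `tiktoken` (already in the requirements below) with the `cl100k_base` encoding, which only approximates Mistral's tokenizer; the project's actual `chunking.py` may differ:

```python
import tiktoken

MAX_CHUNK_TOKENS = 2000
OVERLAP_TOKENS = 50

def chunk_text(text: str) -> list[str]:
    """Split text into overlapping windows of at most MAX_CHUNK_TOKENS tokens."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    # Advance by the window size minus the overlap, so consecutive
    # chunks share OVERLAP_TOKENS tokens of context.
    step = MAX_CHUNK_TOKENS - OVERLAP_TOKENS
    for start in range(0, len(tokens), step):
        window = tokens[start:start + MAX_CHUNK_TOKENS]
        chunks.append(enc.decode(window))
    return chunks
```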
### Change 2: Streaming Progress

**In `app.py`**, add frequent progress updates so long runs don't look like hung requests:

```python
def analyze(files, ..., progress=gr.Progress()):
    for i, file in enumerate(files):
        # Update progress frequently
        progress(i / len(files), desc=f"Processing {i+1}/{len(files)}")

        # Yield intermediate results to keep the connection alive
        yield f"Processing {file.name}...", None, None, None
```

### Change 3: Use the @spaces.GPU Decorator (If Available)

If you have GPU access:

```python
import spaces

@spaces.GPU(duration=60)  # request the GPU for 60 seconds
def analyze_with_gpu(files, ...):
    # Your analysis code
    pass
```

---

## 🎯 RECOMMENDED SPACE CONFIGURATION

**In your Space's `README.md` header.** Note that hardware is assigned in the Space settings, not the README; the header can only suggest a tier via `suggested_hardware`:

```yaml
---
title: TranscriptorAI Enhanced
emoji: 📝
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
suggested_hardware: cpu-upgrade  # or cpu-basic if budget constrained
---
```

**Upgrade to CPU Upgrade or GPU** for better performance:

- `cpu-upgrade` - better timeout limits
- `t4-small` - GPU access (faster)

---

## ⚡ LIGHTWEIGHT SPACES VERSION

Create `app_spaces.py` (a lightweight version):

```python
import gradio as gr
import os

# Force lightweight mode for Spaces
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "20"

# Import after setting env vars
from app import analyze, generate_narrative_report_ui

# Simplified interface for Spaces
with gr.Blocks() as demo:
    gr.Markdown("# TranscriptorAI - HF Spaces Edition")
    gr.Markdown("⚠️ **Note**: Process 1-3 transcripts at a time to avoid timeouts")

    with gr.Tab("Analyze Transcripts"):
        with gr.Row():
            files = gr.File(
                label="Upload Transcripts (Max 3 files)",
                file_count="multiple",
                file_types=[".txt", ".docx", ".pdf"]
            )

        with gr.Row():
            file_type = gr.Radio(
                choices=["Auto-detect", "DOCX", "PDF", "TXT"],
                value="Auto-detect",
                label="File Type"
            )
            interviewee_type = gr.Radio(
                choices=["HCP", "Patient", "Other"],
                value="Patient",
                label="Interviewee Type"
            )

        analyze_btn = gr.Button("Analyze (30-60 seconds)", variant="primary")

        output = gr.Textbox(label="Analysis Results", lines=20)
        csv_output = gr.File(label="CSV Report")
        pdf_output = gr.File(label="PDF Report")

        analyze_btn.click(
            fn=analyze,
            inputs=[
                files,
                file_type,
                # Hidden placeholders for parameters this simplified UI doesn't expose
                gr.Textbox(value="", visible=False),
                gr.Textbox(value="", visible=False),
                gr.Checkbox(value=False, visible=False),
                interviewee_type
            ],
            outputs=[output, csv_output, pdf_output, gr.Plot(visible=False)]
        )

# Critical for Spaces (Gradio 4: default_concurrency_limit, not concurrency_count)
demo.queue(default_concurrency_limit=1).launch(
    server_name="0.0.0.0",  # required for Spaces
    server_port=7860,       # required for Spaces
    share=False
)
```

---

## 🔧 SPACES-SPECIFIC REQUIREMENTS.TXT

Create minimal dependencies:

```txt
# Lightweight for HF Spaces
gradio>=4.0.0
huggingface_hub>=0.19.0
python-docx>=1.0.0
pdfplumber>=0.10.0
pandas>=2.0.0
reportlab>=4.0.0
tiktoken>=0.5.0

# Don't install heavy models locally
# transformers  # REMOVE - use the API instead
# torch         # REMOVE - use the API instead
```

---

## 📊 DEBUGGING SPACES TIMEOUTS

### Check the Spaces Logs

In your Space, click `Logs`. Output like this pinpoints the failure:

```
Building Space...
Loading model...    ← if stuck here, the model is too large
Timeout after 60s   ← the Spaces limit was hit
```

### Add Logging

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def analyze(...):
    logger.info("Starting analysis...")
    logger.info(f"Processing {len(files)} files")
    # ... rest of code
```

---

## ✅ CHECKLIST FOR SPACES

- [ ] Set `LLM_BACKEND=hf_api` (not `local`)
- [ ] Add the `HUGGINGFACE_TOKEN` secret in the Space settings
- [ ] Use a lightweight model (Mistral-7B, not Mixtral-8x7B)
- [ ] Enable `demo.queue()` for long tasks
- [ ] Process at most 3 transcripts at a time
- [ ] Set `LLM_TIMEOUT=25` (under the Spaces limit)
- [ ] Reduce `MAX_TOKENS_PER_REQUEST` to 100
- [ ] Add progress updates so long runs don't look hung
- [ ] Consider upgrading to `cpu-upgrade` or `t4-small` hardware

---

## 🎯 ULTIMATE SPACES FIX

The real issue is that **Spaces times out waiting for a response**.

**Quick fix - add this to the very top of `app.py`:**

```python
import os

# HuggingFace Spaces configuration
# MUST be set before any other imports
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HUGGINGFACE_TOKEN"] = os.getenv("HUGGINGFACE_TOKEN", "")
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"

print("🚀 Running on HuggingFace Spaces")
print(f"📊 Backend: {os.environ['LLM_BACKEND']}")
print(f"🤖 Model: {os.environ['HF_MODEL']}")
print(f"⏱️ Timeout: {os.environ['LLM_TIMEOUT']}s")
```

**And at the bottom of `app.py`, change `.launch()` to:**

```python
if __name__ == "__main__":
    demo.queue(
        default_concurrency_limit=1,  # Gradio 4 name for concurrency_count
        max_size=10,
        api_open=False
    ).launch(
        server_name="0.0.0.0",
        server_port=7860,
        show_error=True
    )
```

---

## 📞 If Still Timing Out

### Option 1: Use Spaces Persistent Storage

```python
# Store intermediate results
import pickle
cache_file = "/tmp/transcriptor_cache.pkl"
```

### Option 2: Split Processing

Process in stages (see the combined sketch at the end of this document):

1. Stage 1: upload & extract text → save to temp storage
2. Stage 2: analyze the saved text → return results

### Option 3: Upgrade Hardware for a Longer Timeout

Upgrade to `cpu-upgrade` hardware in the Space settings.

---

**The key insight**: you are not running locally, so there is no local Node.js process to crash. The "timeout" is HuggingFace Spaces killing your app for taking too long.

**Solution**: use the HF API (serverless) instead of loading models in the Space.
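To make Options 1 and 2 concrete, here is a minimal sketch of staged processing with a `/tmp` cache. `extract_text` and `summarize_text` are hypothetical stand-ins for the project's own extraction and summarization functions:

```python
import hashlib
import pickle
from pathlib import Path

CACHE_DIR = Path("/tmp/transcriptor_cache")
CACHE_DIR.mkdir(exist_ok=True)

def stage1_extract(file_path: str) -> Path:
    """Stage 1: extract text once and cache it, so a retry skips this work."""
    # Key the cache on the file contents, not the name
    key = hashlib.md5(Path(file_path).read_bytes()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.pkl"
    if not cache_file.exists():
        text = extract_text(file_path)  # hypothetical: .docx/.pdf/.txt -> str
        cache_file.write_bytes(pickle.dumps(text))
    return cache_file

def stage2_analyze(cache_file: Path) -> str:
    """Stage 2: analyze previously extracted text; each stage stays under the timeout."""
    text = pickle.loads(cache_file.read_bytes())
    return summarize_text(text)  # hypothetical: calls the HF API per chunk
```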