# Fix for HuggingFace Spaces Timeout Issues
## Problem: Spaces Timing Out During Model Loading/Summarization
HuggingFace Spaces has strict limitations:
- **CPU Basic**: 2 vCPU, 16GB RAM, ~60 second timeout
- **CPU Upgraded**: 8 vCPU, 32GB RAM, longer timeout
- **GPU**: Better but limited availability
When loading large models or processing many transcripts, Spaces hits these limits.
---
## ✅ IMMEDIATE FIXES FOR HF SPACES
### Fix 1: Use HuggingFace Inference API (Not Local Models)
The issue is trying to load models ON the Space. Instead, use HF's API endpoints.
**Edit `config.py`:**
```python
# CRITICAL: Use HF API, not local models
LLM_BACKEND = "hf_api" # NOT "local"
# Use serverless inference (no model loading needed)
HF_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"
# Reduce timeouts for Spaces limits
LLM_TIMEOUT = 30 # Spaces will kill longer requests
MAX_TOKENS_PER_REQUEST = 150 # Smaller = faster
```
### Fix 2: Set HF Space Secrets
In your Space settings, add:
1. Go to: `Settings` → `Repository secrets`
2. Add secret:
- Name: `HUGGINGFACE_TOKEN`
- Value: Your HF token from https://huggingface.co/settings/tokens
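Once the secret is saved, the Space exposes it to your app as an environment variable. Below is a minimal sketch of using it for serverless calls via `huggingface_hub`'s `InferenceClient`; the `summarize_chunk` helper is hypothetical, not part of the existing code:

```python
import os
from huggingface_hub import InferenceClient

# The token comes from the Repository secret set above
client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    token=os.getenv("HUGGINGFACE_TOKEN"),
    timeout=25,  # stay under the Spaces request limit
)

def summarize_chunk(text: str) -> str:
    # Hypothetical helper: one short serverless call, no model loaded on the Space
    return client.text_generation(
        f"Summarize this transcript excerpt:\n\n{text}",
        max_new_tokens=150,
    )
```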
### Fix 3: Reduce Memory Usage
**Edit `app.py`** - Process transcripts in small batches instead of all at once:
```python
# Instead of processing all at once, batch them
MAX_TRANSCRIPTS_PER_BATCH = 3 # Process max 3 at a time
# Split files into batches
for batch_start in range(0, len(files), MAX_TRANSCRIPTS_PER_BATCH):
    batch_files = files[batch_start:batch_start + MAX_TRANSCRIPTS_PER_BATCH]
    # Process batch...
```
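The same idea as a self-contained helper; `process_batch` is a hypothetical placeholder for your existing per-batch logic:

```python
MAX_TRANSCRIPTS_PER_BATCH = 3  # keep memory use low on cpu-basic

def iter_batches(files, batch_size=MAX_TRANSCRIPTS_PER_BATCH):
    # Yield small slices so only a few transcripts are in memory at once
    for start in range(0, len(files), batch_size):
        yield files[start:start + batch_size]

def process_batch(batch):
    # Hypothetical placeholder for the real per-batch analysis
    return [f"processed {getattr(f, 'name', f)}" for f in batch]

def analyze_all(files):
    results = []
    for batch in iter_batches(files):
        results.extend(process_batch(batch))
    return results
```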
### Fix 4: Use Gradio's Queue System
**In `app.py`**, at the end:
```python
# Enable queue to handle long-running tasks
demo.queue(
    default_concurrency_limit=1,  # Process one at a time (Gradio 4.x; was concurrency_count in 3.x)
    max_size=10,  # Max 10 in queue
    api_open=False
).launch()
```
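In Gradio 4.x the limit can also be scoped to a single event instead of the whole queue. A sketch, assuming components named `analyze_btn`, `files`, and `output` exist in your Blocks:

```python
# Alternative: throttle only the heavy event, not the whole app
analyze_btn.click(
    fn=analyze,
    inputs=[files],
    outputs=[output],
    concurrency_limit=1,  # one analysis at a time
)
```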
---
## 🚀 OPTIMIZED CONFIG FOR HF SPACES
Create `spaces_config.py`:
```python
import os
# HuggingFace Spaces Optimized Configuration
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"
os.environ["OVERLAP_TOKENS"] = "50"
# Use serverless inference endpoints
os.environ["USE_SERVERLESS"] = "true"
```
Then import at the top of `app.py`:
```python
import spaces_config # Load before other imports
```
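For these variables to take effect, `config.py` has to read them from the environment instead of hard-coding values. A hedged sketch of that pattern (names match the variables above; your actual config layout may differ):

```python
import os

# config.py -- read Spaces-friendly settings, falling back to local defaults
LLM_BACKEND = os.getenv("LLM_BACKEND", "local")
HF_MODEL = os.getenv("HF_MODEL", "mistralai/Mistral-7B-Instruct-v0.2")
MAX_TOKENS_PER_REQUEST = int(os.getenv("MAX_TOKENS_PER_REQUEST", "150"))
LLM_TIMEOUT = int(os.getenv("LLM_TIMEOUT", "30"))
MAX_CHUNK_TOKENS = int(os.getenv("MAX_CHUNK_TOKENS", "2000"))
OVERLAP_TOKENS = int(os.getenv("OVERLAP_TOKENS", "50"))
USE_SERVERLESS = os.getenv("USE_SERVERLESS", "false").lower() == "true"
```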
---
## 📝 MODIFY FOR SPACES CONSTRAINTS
### Change 1: Aggressive Chunking
**In `chunking.py`**, reduce chunk sizes:
```python
# For Spaces, use smaller chunks
MAX_CHUNK_TOKENS = 2000 # Down from 6000
OVERLAP_TOKENS = 50 # Down from 150
```
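A minimal token-based chunker consistent with those limits, assuming `tiktoken` (already in the requirements) with the `cl100k_base` encoding:

```python
import tiktoken

MAX_CHUNK_TOKENS = 2000
OVERLAP_TOKENS = 50

def chunk_text(text: str) -> list[str]:
    # Split on token boundaries, overlapping chunks slightly to preserve context
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = MAX_CHUNK_TOKENS - OVERLAP_TOKENS
    return [
        enc.decode(tokens[start:start + MAX_CHUNK_TOKENS])
        for start in range(0, len(tokens), step)
    ]
```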
### Change 2: Streaming Progress
**In `app.py`**, add progress updates and intermediate yields so long requests don't appear stalled:
```python
def analyze(files, ..., progress=gr.Progress()):
    for i, file in enumerate(files):
        # Update progress frequently
        progress((i / len(files)), desc=f"Processing {i+1}/{len(files)}")
        # Yield intermediate results to keep connection alive
        yield f"Processing {file.name}...", None, None, None
```
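Put together, here is a self-contained sketch of a streaming handler, simplified to a single text output (requires `demo.queue()`; `summarize_file` is a hypothetical stand-in for the real per-file analysis):

```python
import gradio as gr

def summarize_file(path: str) -> str:
    # Hypothetical placeholder for the real analysis of one transcript
    return f"Summary of {path}"

def analyze(files, progress=gr.Progress()):
    summaries = []
    for i, file in enumerate(files):
        progress(i / len(files), desc=f"Processing {i + 1}/{len(files)}")
        # Each yield sends a partial update and keeps the connection alive
        yield f"Processing {file.name}..."
        summaries.append(summarize_file(file.name))
    progress(1.0, desc="Done")
    yield "\n\n".join(summaries)
```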
### Change 3: Use @spaces.GPU Decorator (If Available)
If you have GPU access:
```python
import spaces
@spaces.GPU(duration=60) # Request GPU for 60 seconds
def analyze_with_gpu(files, ...):
    # Your analysis code
    pass
```
---
## 🎯 RECOMMENDED SPACE CONFIGURATION
**In your Space's `README.md` header:**
```yaml
---
title: TranscriptorAI Enhanced
emoji: 📝
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
duplicated_from:
suggested_hardware: cpu-upgrade  # Or cpu-basic if budget constrained
---
```
**Upgrade to CPU Upgrade or GPU** for better performance. Note the running hardware is assigned under the Space's `Settings`, not in the README; `suggested_hardware` only pre-selects hardware when the Space is duplicated:
- `cpu-upgrade` - higher limits than `cpu-basic`
- `t4-small` - GPU access (faster)
---
## ⚡ LIGHTWEIGHT SPACES VERSION
Create `app_spaces.py` (lightweight version):
```python
import gradio as gr
import os
# Force lightweight mode for Spaces
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "20"
# Import after setting env vars
from app import analyze, generate_narrative_report_ui
# Simplified interface for Spaces
with gr.Blocks() as demo:
gr.Markdown("# TranscriptorAI - HF Spaces Edition")
gr.Markdown("β οΈ **Note**: Process 1-3 transcripts at a time to avoid timeouts")
with gr.Tab("Analyze Transcripts"):
with gr.Row():
files = gr.File(
label="Upload Transcripts (Max 3 files)",
file_count="multiple",
file_types=[".txt", ".docx", ".pdf"]
)
with gr.Row():
file_type = gr.Radio(
choices=["Auto-detect", "DOCX", "PDF", "TXT"],
value="Auto-detect",
label="File Type"
)
interviewee_type = gr.Radio(
choices=["HCP", "Patient", "Other"],
value="Patient",
label="Interviewee Type"
)
analyze_btn = gr.Button("Analyze (30-60 seconds)", variant="primary")
output = gr.Textbox(label="Analysis Results", lines=20)
csv_output = gr.File(label="CSV Report")
pdf_output = gr.File(label="PDF Report")
analyze_btn.click(
fn=analyze,
inputs=[files, file_type, gr.Textbox(value="", visible=False),
gr.Textbox(value="", visible=False), gr.Checkbox(value=False, visible=False),
interviewee_type],
outputs=[output, csv_output, pdf_output, gr.Plot(visible=False)]
)
# Critical for Spaces
demo.queue(concurrency_count=1).launch(
server_name="0.0.0.0", # Required for Spaces
server_port=7860, # Required for Spaces
share=False
)
```
---
## 🔧 SPACES-SPECIFIC REQUIREMENTS.TXT
Create minimal dependencies:
```txt
# Lightweight for HF Spaces
gradio>=4.0.0
huggingface_hub>=0.19.0
python-docx>=1.0.0
pdfplumber>=0.10.0
pandas>=2.0.0
reportlab>=4.0.0
tiktoken>=0.5.0
# Don't install heavy models locally
# transformers # REMOVE - use API instead
# torch # REMOVE - use API instead
```
---
## 🐛 DEBUGGING SPACES TIMEOUTS
### Check Spaces Logs
In your Space, click `Logs` to see:
```
Building Space...
Loading model... → If stuck here = model too large
Timeout after 60s → Spaces limit hit
```
### Add Logging
```python
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def analyze(...):
logger.info("Starting analysis...")
logger.info(f"Processing {len(files)} files")
# ... rest of code
```
---
## ✅ CHECKLIST FOR SPACES
- [ ] Set `LLM_BACKEND=hf_api` (not `local`)
- [ ] Add `HUGGINGFACE_TOKEN` secret in Space settings
- [ ] Use lightweight model (Mistral-7B, not Mixtral-8x7B)
- [ ] Enable `demo.queue()` for long tasks
- [ ] Process max 3 transcripts at a time
- [ ] Set `LLM_TIMEOUT=25` (under Spaces limit)
- [ ] Reduce `MAX_TOKENS_PER_REQUEST=100`
- [ ] Add progress updates so long requests don't appear stalled
- [ ] Consider upgrading to `cpu-upgrade` or `t4-small` hardware
---
## 🎯 ULTIMATE SPACES FIX
The real issue is that **Spaces is timing out while waiting for a response**.
**Quick Fix - Add this to the very top of `app.py`:**
```python
import os
import sys
# HuggingFace Spaces Configuration
# MUST be set before any other imports
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HUGGINGFACE_TOKEN"] = os.getenv("HUGGINGFACE_TOKEN", "")
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"
print("π Running on HuggingFace Spaces")
print(f"π Backend: {os.environ['LLM_BACKEND']}")
print(f"π€ Model: {os.environ['HF_MODEL']}")
print(f"β±οΈ Timeout: {os.environ['LLM_TIMEOUT']}s")
```
**And at the bottom of `app.py`, change `.launch()` to:**
```python
if __name__ == "__main__":
    demo.queue(
        default_concurrency_limit=1,  # Gradio 4.x (was concurrency_count in 3.x)
        max_size=10,
        api_open=False
    ).launch(
        server_name="0.0.0.0",
        server_port=7860,
        show_error=True
    )
```
---
## 🆘 If Still Timing Out
### Option 1: Use Spaces Persistent Storage
```python
# Store intermediate results across restarts.
# Persistent storage (enabled in Space settings) mounts at /data;
# /tmp works within one run but is wiped on restart.
import pickle

cache_file = "/data/transcriptor_cache.pkl"
```
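A hedged sketch of that caching pattern, so a rerun after a timeout can skip work that already finished:

```python
import os
import pickle

CACHE_FILE = "/data/transcriptor_cache.pkl"  # /data requires persistent storage enabled

def load_cache() -> dict:
    # Return previously computed results, or an empty cache on first run
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, "rb") as f:
            return pickle.load(f)
    return {}

def save_cache(cache: dict) -> None:
    with open(CACHE_FILE, "wb") as f:
        pickle.dump(cache, f)
```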
### Option 2: Split Processing
Process in stages, as sketched below:
1. Stage 1: Upload & extract text → Save to temp
2. Stage 2: Analyze saved text → Return results
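A sketch of that two-stage flow as separate handlers, each short enough to finish within the request limit (`extract_text` and `summarize_text` are hypothetical placeholders for the real parser and LLM call):

```python
import json

STAGE_FILE = "/tmp/extracted.json"

def extract_text(path: str) -> str:
    # Hypothetical placeholder for the real DOCX/PDF/TXT parser
    with open(path, errors="ignore") as fh:
        return fh.read()

def summarize_text(text: str) -> str:
    # Hypothetical placeholder for the real LLM call
    return text[:200]

def stage1_extract(files):
    # Stage 1: extraction only -- fast, no LLM calls
    texts = {f.name: extract_text(f.name) for f in files}
    with open(STAGE_FILE, "w") as fh:
        json.dump(texts, fh)
    return f"Extracted {len(texts)} transcripts. Now run Stage 2."

def stage2_analyze():
    # Stage 2: analyze the saved text in a separate, shorter request
    with open(STAGE_FILE) as fh:
        texts = json.load(fh)
    return "\n\n".join(summarize_text(t) for t in texts.values())
```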
### Option 3: Upgrade Hardware for a Larger Timeout
Upgrade to `cpu-upgrade` hardware in the Space settings.
---
**The key insight**: you're not running locally, so there is no local process (node.js or otherwise) to crash.
The "timeout" is HuggingFace Spaces killing your app for taking too long to respond.
**Solution**: use the HF Inference API (serverless) instead of loading models in the Space.