Fix for HuggingFace Spaces Timeout Issues

Problem: Spaces Timing Out During Model Loading/Summarization

HuggingFace Spaces has strict limitations:

  • CPU Basic: 2 vCPU, 16GB RAM, ~60 second timeout
  • CPU Upgrade: 8 vCPU, 32GB RAM, longer timeouts
  • GPU: Better but limited availability

Loading large models or processing many transcripts in a single request will hit these limits.


✅ IMMEDIATE FIXES FOR HF SPACES

Fix 1: Use HuggingFace Inference API (Not Local Models)

The issue is trying to load models ON the Space. Instead, use HF's API endpoints.

Edit config.py:

# CRITICAL: Use HF API, not local models
LLM_BACKEND = "hf_api"  # NOT "local"

# Use serverless inference (no model loading needed)
HF_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

# Reduce timeouts for Spaces limits
LLM_TIMEOUT = 30  # Spaces will kill longer requests
MAX_TOKENS_PER_REQUEST = 150  # Smaller = faster
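
For reference, a serverless call under this config might look like the sketch below, using huggingface_hub's InferenceClient; the helper name hf_api_generate is illustrative, not the app's actual backend function:

import os
from huggingface_hub import InferenceClient

# Illustrative serverless generation call: no model weights load on the Space itself
def hf_api_generate(prompt: str) -> str:
    client = InferenceClient(
        model=os.getenv("HF_MODEL", "mistralai/Mistral-7B-Instruct-v0.2"),
        token=os.getenv("HUGGINGFACE_TOKEN"),
        timeout=float(os.getenv("LLM_TIMEOUT", "30")),  # stay under the Spaces limit
    )
    return client.text_generation(
        prompt,
        max_new_tokens=int(os.getenv("MAX_TOKENS_PER_REQUEST", "150")),
    )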

Fix 2: Set HF Space Secrets

In your Space settings, add:

  1. Go to: Settings → Repository secrets
  2. Add secret: name HUGGINGFACE_TOKEN, value your HF access token
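
Spaces exposes repository secrets to the app as environment variables at runtime, so a quick sanity check near the top of app.py can catch a missing secret early:

import os

# Fail fast if the secret was not configured under Settings → Repository secrets
if not os.getenv("HUGGINGFACE_TOKEN"):
    raise RuntimeError("HUGGINGFACE_TOKEN secret is not set")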

Fix 3: Reduce Memory Usage

Edit app.py to process transcripts in small batches rather than all at once:

# Instead of processing all at once, batch them
MAX_TRANSCRIPTS_PER_BATCH = 3  # Process max 3 at a time

# Split files into batches
for batch_start in range(0, len(files), MAX_TRANSCRIPTS_PER_BATCH):
    batch_files = files[batch_start:batch_start + MAX_TRANSCRIPTS_PER_BATCH]
    # Process batch...
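
If memory is still tight, buffers can be released between batches. A minimal sketch, where process_transcript is a hypothetical stand-in for the app's real per-file analysis:

import gc

MAX_TRANSCRIPTS_PER_BATCH = 3

def process_transcript(f):
    # Hypothetical stand-in for the app's actual per-file analysis
    return {"file": getattr(f, "name", str(f))}

def analyze_in_batches(files):
    results = []
    for batch_start in range(0, len(files), MAX_TRANSCRIPTS_PER_BATCH):
        batch = files[batch_start:batch_start + MAX_TRANSCRIPTS_PER_BATCH]
        results.extend(process_transcript(f) for f in batch)
        gc.collect()  # release per-batch buffers before the next batch
    return results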

Fix 4: Use Gradio's Queue System

In app.py, at the end:

# Enable queue to handle long-running tasks
demo.queue(
    default_concurrency_limit=1,  # Process one at a time (Gradio 4 name for concurrency_count)
    max_size=10,          # Max 10 in queue
    api_open=False
).launch()

🚀 OPTIMIZED CONFIG FOR HF SPACES

Create spaces_config.py:

import os

# HuggingFace Spaces Optimized Configuration
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"
os.environ["OVERLAP_TOKENS"] = "50"

# Use serverless inference endpoints
os.environ["USE_SERVERLESS"] = "true"

Then import at the top of app.py:

import spaces_config  # Load before other imports

πŸ“ MODIFY FOR SPACES CONSTRAINTS

Change 1: Aggressive Chunking

In chunking.py, reduce chunk sizes:

# For Spaces, use smaller chunks
MAX_CHUNK_TOKENS = 2000  # Down from 6000
OVERLAP_TOKENS = 50       # Down from 150
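
For reference, a token-window splitter that honors these limits can be sketched with tiktoken (already in requirements.txt); this is illustrative, not the actual chunking.py implementation:

import tiktoken

MAX_CHUNK_TOKENS = 2000
OVERLAP_TOKENS = 50

def chunk_text(text: str) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = MAX_CHUNK_TOKENS - OVERLAP_TOKENS
    # Slide a fixed-size window, overlapping OVERLAP_TOKENS between chunks
    return [
        enc.decode(tokens[start:start + MAX_CHUNK_TOKENS])
        for start in range(0, len(tokens), step)
    ]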

Change 2: Streaming Progress

In app.py, emit frequent progress updates so long requests don't look stalled and the connection stays alive:

def analyze(files, *args, progress=gr.Progress()):
    for i, file in enumerate(files):
        # Update progress frequently so the frontend sees activity
        progress(i / len(files), desc=f"Processing {i+1}/{len(files)}")

        # Yield intermediate results to keep the connection alive
        yield f"Processing {file.name}...", None, None, None

Change 3: Use @spaces.GPU Decorator (If Available)

If you have GPU access:

import spaces

@spaces.GPU(duration=60)  # Request a GPU slot for up to 60 seconds
def analyze_with_gpu(files, *args):
    # Your analysis code
    pass

🎯 RECOMMENDED SPACE CONFIGURATION

In your Space's README.md header:

---
title: TranscriptorAI Enhanced
emoji: 📝
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
duplicated_from:
suggested_hardware: cpu-upgrade  # Or cpu-basic if budget constrained; actual hardware is set in Space settings
---

Upgrade to CPU Upgrade or GPU hardware in your Space settings for better performance:

  • cpu-upgrade - better timeout limits
  • t4-small - GPU access (faster)

⚡ LIGHTWEIGHT SPACES VERSION

Create app_spaces.py (lightweight version):

import gradio as gr
import os

# Force lightweight mode for Spaces
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "20"

# Import after setting env vars
from app import analyze, generate_narrative_report_ui

# Simplified interface for Spaces
with gr.Blocks() as demo:
    gr.Markdown("# TranscriptorAI - HF Spaces Edition")
    gr.Markdown("⚠️ **Note**: Process 1-3 transcripts at a time to avoid timeouts")

    with gr.Tab("Analyze Transcripts"):
        with gr.Row():
            files = gr.File(
                label="Upload Transcripts (Max 3 files)",
                file_count="multiple",
                file_types=[".txt", ".docx", ".pdf"]
            )

        with gr.Row():
            file_type = gr.Radio(
                choices=["Auto-detect", "DOCX", "PDF", "TXT"],
                value="Auto-detect",
                label="File Type"
            )
            interviewee_type = gr.Radio(
                choices=["HCP", "Patient", "Other"],
                value="Patient",
                label="Interviewee Type"
            )

        analyze_btn = gr.Button("Analyze (30-60 seconds)", variant="primary")

        output = gr.Textbox(label="Analysis Results", lines=20)
        csv_output = gr.File(label="CSV Report")
        pdf_output = gr.File(label="PDF Report")

    analyze_btn.click(
        fn=analyze,
        inputs=[files, file_type, gr.Textbox(value="", visible=False),
                gr.Textbox(value="", visible=False), gr.Checkbox(value=False, visible=False),
                interviewee_type],
        outputs=[output, csv_output, pdf_output, gr.Plot(visible=False)]
    )

# Critical for Spaces
demo.queue(default_concurrency_limit=1).launch(
    server_name="0.0.0.0",  # Required for Spaces
    server_port=7860,        # Required for Spaces
    share=False
)
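
If you adopt this entry script, also set app_file: app_spaces.py in the README header (see the configuration section above) so Spaces launches the lightweight version.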

🔧 SPACES-SPECIFIC REQUIREMENTS.TXT

Create minimal dependencies:

# Lightweight for HF Spaces
gradio>=4.0.0
huggingface_hub>=0.19.0
python-docx>=1.0.0
pdfplumber>=0.10.0
pandas>=2.0.0
reportlab>=4.0.0
tiktoken>=0.5.0

# Don't install heavy models locally
# transformers  # REMOVE - use API instead
# torch         # REMOVE - use API instead

📊 DEBUGGING SPACES TIMEOUTS

Check Spaces Logs

In your Space, click Logs to see:

Building Space...
Loading model...  ← If stuck here = model too large
Timeout after 60s ← Spaces limit hit

Add Logging

import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def analyze(files, *args):
    logger.info("Starting analysis...")
    logger.info(f"Processing {len(files)} files")
    # ... rest of code

✅ CHECKLIST FOR SPACES

  • Set LLM_BACKEND=hf_api (not local)
  • Add HUGGINGFACE_TOKEN secret in Space settings
  • Use lightweight model (Mistral-7B, not Mixtral-8x7B)
  • Enable demo.queue() for long tasks
  • Process max 3 transcripts at a time
  • Set LLM_TIMEOUT=25 (under Spaces limit)
  • Reduce MAX_TOKENS_PER_REQUEST=100
  • Add progress updates so long requests don't look stalled
  • Consider upgrading to cpu-upgrade or t4-small hardware

🎯 ULTIMATE SPACES FIX

The real issue is that Spaces kills requests that take too long to return a response.

Quick Fix - Add this to the very top of app.py:

import os
import sys

# HuggingFace Spaces Configuration
# MUST be set before any other imports
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HUGGINGFACE_TOKEN"] = os.getenv("HUGGINGFACE_TOKEN", "")
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"

print("πŸš€ Running on HuggingFace Spaces")
print(f"πŸ“Š Backend: {os.environ['LLM_BACKEND']}")
print(f"πŸ€– Model: {os.environ['HF_MODEL']}")
print(f"⏱️  Timeout: {os.environ['LLM_TIMEOUT']}s")

And at the bottom of app.py, change .launch() to:

if __name__ == "__main__":
    demo.queue(
        default_concurrency_limit=1,  # Gradio 4 name for concurrency_count
        max_size=10,
        api_open=False
    ).launch(
        server_name="0.0.0.0",
        server_port=7860,
        show_error=True
    )

📞 If Still Timing Out

Option 1: Cache Intermediate Results

Note: /tmp survives only while the Space is running; true persistent storage is a paid add-on mounted at /data.

# Store intermediate results
import pickle
cache_file = "/tmp/transcriptor_cache.pkl"
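
A minimal load/save pattern around that cache file might look like this sketch (keys and values are up to the app):

import os
import pickle

cache_file = "/tmp/transcriptor_cache.pkl"

def load_cache() -> dict:
    # Return results cached by earlier requests, if any
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    return {}

def save_cache(cache: dict) -> None:
    with open(cache_file, "wb") as f:
        pickle.dump(cache, f)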

Option 2: Split Processing

Process in stages, as sketched below:

  1. Stage 1: Upload & extract text → Save to temp
  2. Stage 2: Analyze saved text → Return results
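
A minimal two-stage wiring in Gradio, so each button click stays well under the timeout; extract_stage and analyze_stage are hypothetical stand-ins for the app's real logic:

import gradio as gr

def extract_stage(files):
    # Stage 1: cheap text extraction only; the result is held in gr.State
    return [f.name for f in (files or [])]

def analyze_stage(texts):
    # Stage 2: the expensive analysis runs on already-extracted text
    return f"Analyzed {len(texts)} transcript(s)"

with gr.Blocks() as demo:
    texts = gr.State([])  # carries Stage 1 output into Stage 2
    files = gr.File(label="Transcripts", file_count="multiple")
    out = gr.Textbox(label="Results")
    gr.Button("Stage 1: Extract").click(extract_stage, files, texts)
    gr.Button("Stage 2: Analyze").click(analyze_stage, texts, out)

demo.queue(default_concurrency_limit=1).launch()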

Option 3: Upgrade Hardware for a Longer Timeout

Upgrade to cpu-upgrade hardware in your Space settings.


The key insight: you're not running locally, so there is no Node.js process to crash; the "timeout" is HuggingFace Spaces killing your app for taking too long.

Solution: Use the HF Inference API (serverless) instead of loading models in the Space.