# Fix for HuggingFace Spaces Timeout Issues

## Problem: Spaces Timing Out During Model Loading/Summarization

HuggingFace Spaces has strict limitations:
- CPU Basic: 2 vCPU, 16GB RAM, ~60 second timeout
- CPU Upgraded: 8 vCPU, 32GB RAM, longer timeout
- GPU: Better but limited availability
When loading large models or processing many transcripts, Spaces hits these limits.
## ✅ IMMEDIATE FIXES FOR HF SPACES
### Fix 1: Use the HuggingFace Inference API (Not Local Models)

The core issue is trying to load models ON the Space itself. Instead, call HF's hosted API endpoints.

Edit `config.py`:
```python
# CRITICAL: Use the HF API, not local models
LLM_BACKEND = "hf_api"  # NOT "local"

# Use serverless inference (no model loading needed)
HF_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

# Keep requests under the Spaces timeout
LLM_TIMEOUT = 30              # Spaces will kill longer requests
MAX_TOKENS_PER_REQUEST = 150  # Smaller = faster
```
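For reference, here is a minimal sketch of what an `hf_api` backend call can look like using `huggingface_hub`'s `InferenceClient` (the `summarize_chunk` helper and prompt wording are illustrative assumptions, not the project's actual code):

```python
import os
from huggingface_hub import InferenceClient

def summarize_chunk(text: str) -> str:
    """Hypothetical helper: summarize via the serverless Inference API."""
    client = InferenceClient(
        model=os.environ.get("HF_MODEL", "mistralai/Mistral-7B-Instruct-v0.2"),
        token=os.environ.get("HUGGINGFACE_TOKEN"),
        timeout=int(os.environ.get("LLM_TIMEOUT", "30")),
    )
    # text_generation calls the hosted endpoint; nothing is loaded on the Space
    return client.text_generation(
        f"Summarize the following transcript excerpt:\n\n{text}",
        max_new_tokens=int(os.environ.get("MAX_TOKENS_PER_REQUEST", "150")),
    )
```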
### Fix 2: Set HF Space Secrets

In your Space settings:

1. Go to Settings → Repository secrets
2. Add a secret:
   - Name: `HUGGINGFACE_TOKEN`
   - Value: your HF token from https://huggingface.co/settings/tokens
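To fail fast when the secret is missing, you can add a quick startup check (a small sketch; the warning text is illustrative):

```python
import os

# Spaces injects repository secrets as environment variables
if not os.getenv("HUGGINGFACE_TOKEN"):
    # Without a token, serverless inference calls will be throttled or rejected
    print("WARNING: HUGGINGFACE_TOKEN is not set; add it under Settings -> Repository secrets")
```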
### Fix 3: Reduce Memory Usage

Edit `app.py` to process transcripts in small batches:
```python
# Instead of processing everything at once, batch the files
MAX_TRANSCRIPTS_PER_BATCH = 3  # Process at most 3 at a time

# Split files into batches
for batch_start in range(0, len(files), MAX_TRANSCRIPTS_PER_BATCH):
    batch_files = files[batch_start:batch_start + MAX_TRANSCRIPTS_PER_BATCH]
    # Process batch...
```
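A slightly fuller sketch of the same idea, assuming a hypothetical per-file `summarize_file` helper; forcing garbage collection between batches keeps peak memory down:

```python
import gc

def process_in_batches(files, batch_size=3):
    """Process files in small batches to keep peak memory low on Spaces."""
    results = []
    for batch_start in range(0, len(files), batch_size):
        for f in files[batch_start:batch_start + batch_size]:
            # summarize_file is a stand-in for the app's per-file analysis
            results.append(summarize_file(f))
        gc.collect()  # Release intermediate objects before the next batch
    return results
```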
### Fix 4: Use Gradio's Queue System

At the end of `app.py`:
```python
# Enable the queue to handle long-running tasks
# Note: Gradio 4.x renamed concurrency_count to default_concurrency_limit
demo.queue(
    default_concurrency_limit=1,  # Process one request at a time
    max_size=10,                  # Max 10 requests waiting in the queue
    api_open=False
).launch()
```
## OPTIMIZED CONFIG FOR HF SPACES

Create `spaces_config.py`:
```python
import os

# HuggingFace Spaces optimized configuration
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"
os.environ["OVERLAP_TOKENS"] = "50"

# Use serverless inference endpoints
os.environ["USE_SERVERLESS"] = "true"
```
Then import it at the top of `app.py`:

```python
import spaces_config  # Load before other imports
```
## MODIFY FOR SPACES CONSTRAINTS
### Change 1: Aggressive Chunking

In `chunking.py`, reduce the chunk sizes:
```python
# For Spaces, use smaller chunks
MAX_CHUNK_TOKENS = 2000  # Down from 6000
OVERLAP_TOKENS = 50      # Down from 150
```
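If it helps to see how these constants drive the splitting, here is a minimal token-based chunker using `tiktoken` (already in the requirements); this is an illustrative sketch, not the project's actual `chunking.py`:

```python
import tiktoken

def chunk_text(text, max_tokens=2000, overlap=50):
    """Split text into overlapping chunks measured in tokens, not characters."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks, start = [], 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        chunks.append(enc.decode(tokens[start:end]))
        if end == len(tokens):
            break
        start = end - overlap  # Overlap preserves context across chunk boundaries
    return chunks
```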
### Change 2: Streaming Progress

In `app.py`, add frequent progress updates so the connection stays alive:
```python
def analyze(files, ..., progress=gr.Progress()):
    for i, file in enumerate(files):
        # Update progress frequently
        progress(i / len(files), desc=f"Processing {i+1}/{len(files)}")
        # Yield intermediate results to keep the connection alive
        yield f"Processing {file.name}...", None, None, None
```
### Change 3: Use the @spaces.GPU Decorator (If Available)

If your Space has GPU hardware:
```python
import spaces

@spaces.GPU(duration=60)  # Request the GPU for up to 60 seconds
def analyze_with_gpu(files, ...):
    # Your analysis code
    pass
```
## 🎯 RECOMMENDED SPACE CONFIGURATION

In your Space's README.md header:
```yaml
---
title: TranscriptorAI Enhanced
emoji: 📝
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
suggested_hardware: cpu-upgrade  # Or cpu-basic if budget constrained
---
```
Upgrade to upgraded CPU or GPU hardware for better performance:

- `cpu-upgrade`: better timeout limits
- `t4-small`: GPU access (faster)
## ⚡ LIGHTWEIGHT SPACES VERSION

Create `app_spaces.py` (a lightweight version):
```python
import gradio as gr
import os

# Force lightweight mode for Spaces
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "20"

# Import after setting env vars
from app import analyze, generate_narrative_report_ui

# Simplified interface for Spaces
with gr.Blocks() as demo:
    gr.Markdown("# TranscriptorAI - HF Spaces Edition")
    gr.Markdown("⚠️ **Note**: Process 1-3 transcripts at a time to avoid timeouts")

    with gr.Tab("Analyze Transcripts"):
        with gr.Row():
            files = gr.File(
                label="Upload Transcripts (Max 3 files)",
                file_count="multiple",
                file_types=[".txt", ".docx", ".pdf"]
            )
        with gr.Row():
            file_type = gr.Radio(
                choices=["Auto-detect", "DOCX", "PDF", "TXT"],
                value="Auto-detect",
                label="File Type"
            )
            interviewee_type = gr.Radio(
                choices=["HCP", "Patient", "Other"],
                value="Patient",
                label="Interviewee Type"
            )

        analyze_btn = gr.Button("Analyze (30-60 seconds)", variant="primary")
        output = gr.Textbox(label="Analysis Results", lines=20)
        csv_output = gr.File(label="CSV Report")
        pdf_output = gr.File(label="PDF Report")

        analyze_btn.click(
            fn=analyze,
            inputs=[files, file_type, gr.Textbox(value="", visible=False),
                    gr.Textbox(value="", visible=False), gr.Checkbox(value=False, visible=False),
                    interviewee_type],
            outputs=[output, csv_output, pdf_output, gr.Plot(visible=False)]
        )

# Critical for Spaces (Gradio 4.x: default_concurrency_limit replaces concurrency_count)
demo.queue(default_concurrency_limit=1).launch(
    server_name="0.0.0.0",  # Required for Spaces
    server_port=7860,       # Required for Spaces
    share=False
)
```
## 🔧 SPACES-SPECIFIC REQUIREMENTS.TXT

Keep the dependencies minimal:
```text
# Lightweight for HF Spaces
gradio>=4.0.0
huggingface_hub>=0.19.0
python-docx>=1.0.0
pdfplumber>=0.10.0
pandas>=2.0.0
reportlab>=4.0.0
tiktoken>=0.5.0

# Don't install heavy models locally
# transformers  # REMOVE - use the API instead
# torch         # REMOVE - use the API instead
```
## DEBUGGING SPACES TIMEOUTS

### Check the Spaces Logs

In your Space, click **Logs** to see where it stalls:

```text
Building Space...
Loading model...     ← If stuck here, the model is too large
Timeout after 60s    ← The Spaces limit was hit
```
### Add Logging

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def analyze(files, *args):
    logger.info("Starting analysis...")
    logger.info(f"Processing {len(files)} files")
    # ... rest of code
```
## ✅ CHECKLIST FOR SPACES

- Set `LLM_BACKEND=hf_api` (not `local`)
- Add the `HUGGINGFACE_TOKEN` secret in the Space settings
- Use a lightweight model (Mistral-7B, not Mixtral-8x7B)
- Enable `demo.queue()` for long tasks
- Process at most 3 transcripts at a time
- Set `LLM_TIMEOUT=25` (under the Spaces limit)
- Reduce `MAX_TOKENS_PER_REQUEST` to 100
- Add progress updates so the connection stays alive
- Consider upgrading to `cpu-upgrade` or `t4-small` hardware
## 🎯 ULTIMATE SPACES FIX

The real issue is that Spaces times out while waiting for a response.

**Quick fix**: add this to the very top of `app.py`:
```python
import os
import sys

# HuggingFace Spaces configuration
# MUST be set before any other imports
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HUGGINGFACE_TOKEN"] = os.getenv("HUGGINGFACE_TOKEN", "")
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"

print("Running on HuggingFace Spaces")
print(f"Backend: {os.environ['LLM_BACKEND']}")
print(f"Model: {os.environ['HF_MODEL']}")
print(f"Timeout: {os.environ['LLM_TIMEOUT']}s")
```
And at the bottom of `app.py`, change `.launch()` to:

```python
if __name__ == "__main__":
    demo.queue(
        default_concurrency_limit=1,  # Gradio 4.x name for concurrency_count
        max_size=10,
        api_open=False
    ).launch(
        server_name="0.0.0.0",
        server_port=7860,
        show_error=True
    )
```
## If It's Still Timing Out

### Option 1: Cache Intermediate Results to Disk
```python
# Store intermediate results (note: /tmp is cleared when the Space restarts)
import pickle

cache_file = "/tmp/transcriptor_cache.pkl"
```
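A minimal sketch of how such a cache could work (the save/load helpers are illustrative):

```python
import os
import pickle

cache_file = "/tmp/transcriptor_cache.pkl"

def load_cache():
    """Return previously computed results, or an empty dict on a fresh start."""
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    return {}

def save_cache(cache):
    """Persist results so a retried request can skip already-finished files."""
    with open(cache_file, "wb") as f:
        pickle.dump(cache, f)
```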
### Option 2: Split Processing

Process in stages (see the sketch below):

1. Stage 1: Upload & extract text, then save it to a temp file
2. Stage 2: Analyze the saved text and return results
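A rough sketch of the two-stage flow, assuming hypothetical `extract_text` and `summarize_text` helpers and an illustrative temp path:

```python
import json

STAGE_FILE = "/tmp/extracted_texts.json"  # Illustrative temp path

def stage1_extract(files):
    """Fast stage: only extract text, which rarely hits the timeout."""
    texts = {f.name: extract_text(f) for f in files}
    with open(STAGE_FILE, "w") as fh:
        json.dump(texts, fh)
    return f"Extracted {len(texts)} transcripts; now run Stage 2."

def stage2_analyze():
    """Slow stage: summarize the already-extracted text."""
    with open(STAGE_FILE) as fh:
        texts = json.load(fh)
    return {name: summarize_text(text) for name, text in texts.items()}
```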
### Option 3: Upgrade Hardware for a Larger Timeout

Upgrade to `cpu-upgrade` hardware in the Space settings.
The key insight: you're not running locally, so there is no local node.js process to crash. The "timeout" is HuggingFace Spaces killing your app for taking too long.

**Solution**: Use the HF Inference API (serverless) instead of loading models in the Space.