# Fix for HuggingFace Spaces Timeout Issues

## Problem: Spaces Timing Out During Model Loading/Summarization

HuggingFace Spaces has strict limitations:

- **CPU Basic**: 2 vCPU, 16GB RAM, ~60 second request timeout
- **CPU Upgraded**: 8 vCPU, 32GB RAM, longer timeout
- **GPU**: faster, but limited availability

When loading large models or processing many transcripts, a Space hits these limits.
---

## IMMEDIATE FIXES FOR HF SPACES

### Fix 1: Use the HuggingFace Inference API (Not Local Models)

The core problem is trying to load models ON the Space itself. Instead, call HF's hosted inference endpoints.

**Edit `config.py`:**

```python
# CRITICAL: Use the HF API, not local models
LLM_BACKEND = "hf_api"  # NOT "local"

# Use serverless inference (no model loading needed)
HF_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

# Reduce timeouts to stay within Spaces limits
LLM_TIMEOUT = 30  # Spaces will kill longer requests
MAX_TOKENS_PER_REQUEST = 150  # Smaller = faster
```
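For orientation, a serverless call needs no model weights on the Space at all. A minimal sketch using `huggingface_hub.InferenceClient` (the `summarize_chunk` helper and its prompt are illustrative, not taken from the project):

```python
import os

from huggingface_hub import InferenceClient

# The model runs on HF infrastructure; the Space never loads weights into RAM.
client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    token=os.getenv("HUGGINGFACE_TOKEN"),
    timeout=30,  # keep in line with LLM_TIMEOUT
)

def summarize_chunk(text: str) -> str:
    # Illustrative helper: one short generation per transcript chunk.
    return client.text_generation(
        f"Summarize this transcript excerpt:\n\n{text}",
        max_new_tokens=150,  # keep in line with MAX_TOKENS_PER_REQUEST
    )
```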
### Fix 2: Set HF Space Secrets

In your Space settings:

1. Go to `Settings` → `Repository secrets`
2. Add a secret:
   - Name: `HUGGINGFACE_TOKEN`
   - Value: your HF token from https://huggingface.co/settings/tokens
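Spaces expose secrets to the app as environment variables, so a quick startup check catches a missing token before the first API call fails. A small sketch (the warning text is illustrative):

```python
import os

# Secrets set in the Space settings appear as environment variables.
token = os.getenv("HUGGINGFACE_TOKEN")
if not token:
    # Fail loudly at startup instead of timing out on the first request.
    print("WARNING: HUGGINGFACE_TOKEN secret is not set; API calls will fail.")
```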
### Fix 3: Reduce Memory Usage

**Edit `app.py`** to process transcripts in small batches rather than all at once:

```python
# Instead of processing everything in one pass, batch it
MAX_TRANSCRIPTS_PER_BATCH = 3  # Process at most 3 at a time

# Split files into batches
for batch_start in range(0, len(files), MAX_TRANSCRIPTS_PER_BATCH):
    batch_files = files[batch_start:batch_start + MAX_TRANSCRIPTS_PER_BATCH]
    # Process batch...
```
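To actually reclaim memory between batches, flush each batch's results to disk and drop the references before continuing. A hedged sketch (`process_batch` and `append_results` are hypothetical stand-ins for the project's own functions):

```python
import gc

for batch_start in range(0, len(files), MAX_TRANSCRIPTS_PER_BATCH):
    batch_files = files[batch_start:batch_start + MAX_TRANSCRIPTS_PER_BATCH]
    results = process_batch(batch_files)         # hypothetical helper
    append_results("/tmp/results.csv", results)  # hypothetical helper: write out, don't accumulate
    del results
    gc.collect()  # release memory before the next batch
```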
### Fix 4: Use Gradio's Queue System

**In `app.py`**, at the end:

```python
# Enable the queue to handle long-running tasks
demo.queue(
    default_concurrency_limit=1,  # Gradio 4 renamed concurrency_count; process one at a time
    max_size=10,                  # Max 10 requests waiting in the queue
    api_open=False
).launch()
```
---

## OPTIMIZED CONFIG FOR HF SPACES

Create `spaces_config.py`:

```python
import os

# HuggingFace Spaces optimized configuration
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"
os.environ["OVERLAP_TOKENS"] = "50"

# Use serverless inference endpoints
os.environ["USE_SERVERLESS"] = "true"
```

Then import it at the top of `app.py`:

```python
import spaces_config  # Load before other imports
```
---

## MODIFY FOR SPACES CONSTRAINTS

### Change 1: Aggressive Chunking

**In `chunking.py`**, reduce the chunk sizes:

```python
# For Spaces, use smaller chunks
MAX_CHUNK_TOKENS = 2000  # Down from 6000
OVERLAP_TOKENS = 50      # Down from 150
```
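The internals of `chunking.py` aren't shown here, but token-based chunking with overlap at these settings looks roughly like the sketch below (it uses `tiktoken`, which is already in the requirements; the encoding name is an assumption):

```python
import tiktoken

MAX_CHUNK_TOKENS = 2000
OVERLAP_TOKENS = 50

def chunk_text(text: str) -> list[str]:
    # Encode once, then slide a window so each chunk repeats the last
    # OVERLAP_TOKENS tokens of the previous one for context continuity.
    enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding
    tokens = enc.encode(text)
    step = MAX_CHUNK_TOKENS - OVERLAP_TOKENS
    return [
        enc.decode(tokens[i:i + MAX_CHUNK_TOKENS])
        for i in range(0, len(tokens), step)
    ]
```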
### Change 2: Streaming Progress

**In `app.py`**, add frequent progress updates so the connection stays alive and the app doesn't look like it has timed out:

```python
def analyze(files, ..., progress=gr.Progress()):
    for i, file in enumerate(files):
        # Update progress frequently
        progress(i / len(files), desc=f"Processing {i+1}/{len(files)}")

        # Yield intermediate results to keep the connection alive
        yield f"Processing {file.name}...", None, None, None
```

### Change 3: Use the @spaces.GPU Decorator (If Available)

If you have GPU access:

```python
import spaces

@spaces.GPU(duration=60)  # Request the GPU for up to 60 seconds
def analyze_with_gpu(files, ...):
    # Your analysis code
    pass
```
---

## RECOMMENDED SPACE CONFIGURATION

**In your Space's `README.md` header** (note the documented metadata key is `suggested_hardware`; the actual hardware tier is assigned in the Space settings):

```yaml
---
title: TranscriptorAI Enhanced
emoji: 📝
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
duplicated_from:
suggested_hardware: cpu-upgrade  # Or cpu-basic if budget constrained
---
```

**Upgrade to CPU Upgrade or GPU** in the Space settings for better performance:

- `cpu-upgrade` - better timeout limits
- `t4-small` - GPU access (faster)
---

## LIGHTWEIGHT SPACES VERSION

Create `app_spaces.py` (lightweight version):

```python
import gradio as gr
import os

# Force lightweight mode for Spaces
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "20"

# Import after setting env vars
from app import analyze, generate_narrative_report_ui

# Simplified interface for Spaces
with gr.Blocks() as demo:
    gr.Markdown("# TranscriptorAI - HF Spaces Edition")
    gr.Markdown("**Note**: process 1-3 transcripts at a time to avoid timeouts")

    with gr.Tab("Analyze Transcripts"):
        with gr.Row():
            files = gr.File(
                label="Upload Transcripts (Max 3 files)",
                file_count="multiple",
                file_types=[".txt", ".docx", ".pdf"]
            )

        with gr.Row():
            file_type = gr.Radio(
                choices=["Auto-detect", "DOCX", "PDF", "TXT"],
                value="Auto-detect",
                label="File Type"
            )
            interviewee_type = gr.Radio(
                choices=["HCP", "Patient", "Other"],
                value="Patient",
                label="Interviewee Type"
            )

        analyze_btn = gr.Button("Analyze (30-60 seconds)", variant="primary")

        output = gr.Textbox(label="Analysis Results", lines=20)
        csv_output = gr.File(label="CSV Report")
        pdf_output = gr.File(label="PDF Report")

        analyze_btn.click(
            fn=analyze,
            inputs=[files, file_type, gr.Textbox(value="", visible=False),
                    gr.Textbox(value="", visible=False), gr.Checkbox(value=False, visible=False),
                    interviewee_type],
            outputs=[output, csv_output, pdf_output, gr.Plot(visible=False)]
        )

# Critical for Spaces (Gradio 4: default_concurrency_limit replaces concurrency_count)
demo.queue(default_concurrency_limit=1).launch(
    server_name="0.0.0.0",  # Required for Spaces
    server_port=7860,       # Required for Spaces
    share=False
)
```
---

## SPACES-SPECIFIC REQUIREMENTS.TXT

Create minimal dependencies:

```txt
# Lightweight for HF Spaces
gradio>=4.0.0
huggingface_hub>=0.19.0
python-docx>=1.0.0
pdfplumber>=0.10.0
pandas>=2.0.0
reportlab>=4.0.0
tiktoken>=0.5.0

# Don't install heavy model stacks locally
# transformers  # REMOVE - use the API instead
# torch         # REMOVE - use the API instead
```
---

## DEBUGGING SPACES TIMEOUTS

### Check the Spaces Logs

In your Space, click `Logs` to see:

```
Building Space...
Loading model...     ← If stuck here, the model is too large
Timeout after 60s    ← Spaces limit hit
```

### Add Logging

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def analyze(...):
    logger.info("Starting analysis...")
    logger.info(f"Processing {len(files)} files")
    # ... rest of code
```
---

## CHECKLIST FOR SPACES

- [ ] Set `LLM_BACKEND=hf_api` (not `local`)
- [ ] Add the `HUGGINGFACE_TOKEN` secret in the Space settings
- [ ] Use a lightweight model (Mistral-7B, not Mixtral-8x7B)
- [ ] Enable `demo.queue()` for long tasks
- [ ] Process at most 3 transcripts at a time
- [ ] Set `LLM_TIMEOUT=25` (under the Spaces limit)
- [ ] Reduce `MAX_TOKENS_PER_REQUEST` to 100
- [ ] Stream progress updates so the app doesn't look hung
- [ ] Consider upgrading to `cpu-upgrade` or `t4-small` hardware
---

## ULTIMATE SPACES FIX

The real issue is that **Spaces is timing out while waiting for a response**.

**Quick fix - add this to the very top of `app.py`:**

```python
import os

# HuggingFace Spaces configuration
# MUST be set before any other imports
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HUGGINGFACE_TOKEN"] = os.getenv("HUGGINGFACE_TOKEN", "")
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"

print("Running on HuggingFace Spaces")
print(f"Backend: {os.environ['LLM_BACKEND']}")
print(f"Model: {os.environ['HF_MODEL']}")
print(f"Timeout: {os.environ['LLM_TIMEOUT']}s")
```

**And at the bottom of `app.py`, change `.launch()` to:**

```python
if __name__ == "__main__":
    demo.queue(
        default_concurrency_limit=1,  # Gradio 4 name for concurrency_count
        max_size=10,
        api_open=False
    ).launch(
        server_name="0.0.0.0",
        server_port=7860,
        show_error=True
    )
```
---

## If It's Still Timing Out

### Option 1: Use Spaces Persistent Storage

Persistent storage (a paid add-on) is mounted at `/data`; `/tmp` is ephemeral and only survives the current run:

```python
# Store intermediate results
import pickle

cache_file = "/data/transcriptor_cache.pkl"  # fall back to /tmp without persistent storage
```
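A minimal load/compute/save pattern around that cache file (a sketch; `expensive_analysis` is a hypothetical stand-in for the project's processing step):

```python
import os
import pickle

def cached_analysis(key: str, text: str):
    # Reuse previously computed results if the cache file exists.
    cache = {}
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            cache = pickle.load(f)
    if key not in cache:
        cache[key] = expensive_analysis(text)  # hypothetical processing step
        with open(cache_file, "wb") as f:
            pickle.dump(cache, f)
    return cache[key]
```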
### Option 2: Split Processing

Process in stages (a sketch follows the list):

1. Stage 1: Upload & extract text → save to a temp file
2. Stage 2: Analyze the saved text → return results
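A hedged sketch of the two-stage idea, persisting extracted text between requests (`extract_text` and `summarize_text` are hypothetical placeholders for the project's own functions):

```python
import pathlib

STAGE_DIR = pathlib.Path("/tmp/stage1")
STAGE_DIR.mkdir(parents=True, exist_ok=True)

def stage1_extract(files):
    # Stage 1: extract raw text, save it, and return quickly.
    for f in files:
        text = extract_text(f)  # hypothetical extractor
        out = STAGE_DIR / (pathlib.Path(f.name).stem + ".txt")
        out.write_text(text)
    return f"Extracted {len(files)} file(s); now run Stage 2."

def stage2_analyze():
    # Stage 2: analyze the saved text in a separate, shorter request.
    results = [summarize_text(p.read_text()) for p in STAGE_DIR.glob("*.txt")]  # hypothetical analyzer
    return "\n\n".join(results)
```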
### Option 3: Upgrade the Hardware

Upgrade to `cpu-upgrade` hardware in the Space settings for more generous limits.
---

**The key insight**: you're not running locally, so there is no Node.js process to crash. The "timeout" is HuggingFace Spaces killing your app for taking too long.

**Solution**: use the HF Inference API (serverless) instead of loading models in the Space.