# Fix for HuggingFace Spaces Timeout Issues
## Problem: Spaces Timing Out During Model Loading/Summarization
HuggingFace Spaces has strict limitations:
- **CPU Basic**: 2 vCPU, 16GB RAM, ~60 second timeout
- **CPU Upgraded**: 8 vCPU, 32GB RAM, longer timeout
- **GPU**: Better but limited availability
When loading large models or processing many transcripts, Spaces hits these limits.
---
## βœ… IMMEDIATE FIXES FOR HF SPACES
### Fix 1: Use HuggingFace Inference API (Not Local Models)
The issue is trying to load models ON the Space. Instead, use HF's API endpoints.
**Edit `config.py`:**
```python
# CRITICAL: Use HF API, not local models
LLM_BACKEND = "hf_api" # NOT "local"
# Use serverless inference (no model loading needed)
HF_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"
# Reduce timeouts for Spaces limits
LLM_TIMEOUT = 30 # Spaces will kill longer requests
MAX_TOKENS_PER_REQUEST = 150 # Smaller = faster
```
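For reference, here is a minimal sketch of what an `hf_api` call can look like using `huggingface_hub`'s `InferenceClient`; the prompt and parameter values are illustrative, and the project's actual client code may differ:

```python
import os

from huggingface_hub import InferenceClient

# Serverless inference: the model runs on HF's infrastructure, not in the Space
client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    token=os.getenv("HUGGINGFACE_TOKEN"),
    timeout=30,  # stay under the Spaces request limit
)

summary = client.text_generation(
    "Summarize the following transcript: ...",
    max_new_tokens=150,  # smaller responses come back faster
)
print(summary)
```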
### Fix 2: Set HF Space Secrets
In your Space settings, add:
1. Go to: `Settings` β†’ `Repository secrets`
2. Add secret:
- Name: `HUGGINGFACE_TOKEN`
- Value: Your HF token from https://huggingface.co/settings/tokens
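Spaces exposes repository secrets to the running app as environment variables, so the token can be read at startup. A minimal sketch (the error message is illustrative):

```python
import os

# Repository secrets appear as environment variables inside the Space
hf_token = os.getenv("HUGGINGFACE_TOKEN")
if not hf_token:
    raise RuntimeError(
        "HUGGINGFACE_TOKEN is not set - add it under Settings -> Repository secrets"
    )
```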
### Fix 3: Reduce Memory Usage
**Edit `app.py`** - Process transcripts one at a time:
```python
# Instead of processing all at once, batch them
MAX_TRANSCRIPTS_PER_BATCH = 3 # Process max 3 at a time
# Split files into batches
for batch_start in range(0, len(files), MAX_TRANSCRIPTS_PER_BATCH):
    batch_files = files[batch_start:batch_start + MAX_TRANSCRIPTS_PER_BATCH]
    # Process batch...
```
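A slightly fuller sketch of the same loop, with an explicit garbage-collection pass between batches; `process_transcript` is a hypothetical stand-in for the app's real per-file processing:

```python
import gc

MAX_TRANSCRIPTS_PER_BATCH = 3  # Process max 3 at a time

def process_in_batches(files):
    results = []
    for start in range(0, len(files), MAX_TRANSCRIPTS_PER_BATCH):
        for f in files[start:start + MAX_TRANSCRIPTS_PER_BATCH]:
            results.append(process_transcript(f))  # hypothetical per-file helper
        gc.collect()  # release per-batch memory before continuing
    return results
```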
### Fix 4: Use Gradio's Queue System
**In `app.py`**, at the end. The queue keeps each HTTP request short-lived (the client polls for results), so long-running jobs don't trip the request timeout:
```python
# Enable the queue so long-running jobs aren't killed mid-request
# (Gradio 4 renamed concurrency_count to default_concurrency_limit)
demo.queue(
    default_concurrency_limit=1,  # Process one job at a time
    max_size=10,                  # At most 10 requests waiting in the queue
    api_open=False
).launch()
```
---
## πŸš€ OPTIMIZED CONFIG FOR HF SPACES
Create `spaces_config.py`:
```python
import os
# HuggingFace Spaces Optimized Configuration
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"
os.environ["OVERLAP_TOKENS"] = "50"
# Use serverless inference endpoints
os.environ["USE_SERVERLESS"] = "true"
```
Then import at the top of `app.py`:
```python
import spaces_config # Load before other imports
```
---
## πŸ“ MODIFY FOR SPACES CONSTRAINTS
### Change 1: Aggressive Chunking
**In `chunking.py`**, reduce chunk sizes:
```python
# For Spaces, use smaller chunks
MAX_CHUNK_TOKENS = 2000 # Down from 6000
OVERLAP_TOKENS = 50 # Down from 150
```
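If the chunker needs rebuilding rather than just retuning, here is a minimal sketch of token-based chunking with overlap using `tiktoken` (already in the requirements); the encoding name is an assumption and may differ from what `chunking.py` actually uses:

```python
import tiktoken

MAX_CHUNK_TOKENS = 2000
OVERLAP_TOKENS = 50

def chunk_text(text: str, encoding_name: str = "cl100k_base") -> list[str]:
    """Split text into overlapping chunks of at most MAX_CHUNK_TOKENS tokens."""
    enc = tiktoken.get_encoding(encoding_name)
    tokens = enc.encode(text)
    step = MAX_CHUNK_TOKENS - OVERLAP_TOKENS
    return [
        enc.decode(tokens[start:start + MAX_CHUNK_TOKENS])
        for start in range(0, len(tokens), step)
    ]
```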
### Change 2: Streaming Progress
**In `app.py`**, add progress updates so the client keeps receiving data; Gradio streams each `yield` from a generator function to the UI, which keeps the connection alive:
```python
def analyze(files, ..., progress=gr.Progress()):
    for i, file in enumerate(files):
        # Update progress frequently
        progress(i / len(files), desc=f"Processing {i+1}/{len(files)}")
        # Yield intermediate results to keep the connection alive
        yield f"Processing {file.name}...", None, None, None
```
### Change 3: Use @spaces.GPU Decorator (If Available)
If your Space runs on ZeroGPU hardware (the decorator comes from the `spaces` package, which must be listed in `requirements.txt`):
```python
import spaces

@spaces.GPU(duration=60)  # Request the GPU for up to 60 seconds
def analyze_with_gpu(files, ...):
    # Your analysis code
    pass
```
---
## 🎯 RECOMMENDED SPACE CONFIGURATION
**In your Space's `README.md` header:**
```yaml
---
title: TranscriptorAI Enhanced
emoji: πŸ“
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
duplicated_from:
hardware: cpu-upgrade # Or cpu-basic if budget constrained
---
```
**Upgrade to CPU Upgrade or GPU** for better performance:
- `hardware: cpu-upgrade` - Better timeout limits
- `hardware: t4-small` - GPU access (faster)
---
## ⚑ LIGHTWEIGHT SPACES VERSION
Create `app_spaces.py` (lightweight version):
```python
import gradio as gr
import os
# Force lightweight mode for Spaces
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "20"
# Import after setting env vars
from app import analyze, generate_narrative_report_ui
# Simplified interface for Spaces
with gr.Blocks() as demo:
    gr.Markdown("# TranscriptorAI - HF Spaces Edition")
    gr.Markdown("⚠️ **Note**: Process 1-3 transcripts at a time to avoid timeouts")

    with gr.Tab("Analyze Transcripts"):
        with gr.Row():
            files = gr.File(
                label="Upload Transcripts (Max 3 files)",
                file_count="multiple",
                file_types=[".txt", ".docx", ".pdf"]
            )
        with gr.Row():
            file_type = gr.Radio(
                choices=["Auto-detect", "DOCX", "PDF", "TXT"],
                value="Auto-detect",
                label="File Type"
            )
            interviewee_type = gr.Radio(
                choices=["HCP", "Patient", "Other"],
                value="Patient",
                label="Interviewee Type"
            )

        analyze_btn = gr.Button("Analyze (30-60 seconds)", variant="primary")
        output = gr.Textbox(label="Analysis Results", lines=20)
        csv_output = gr.File(label="CSV Report")
        pdf_output = gr.File(label="PDF Report")

        analyze_btn.click(
            fn=analyze,
            inputs=[files, file_type, gr.Textbox(value="", visible=False),
                    gr.Textbox(value="", visible=False), gr.Checkbox(value=False, visible=False),
                    interviewee_type],
            outputs=[output, csv_output, pdf_output, gr.Plot(visible=False)]
        )

# Critical for Spaces (Gradio 4: default_concurrency_limit, not concurrency_count)
demo.queue(default_concurrency_limit=1).launch(
    server_name="0.0.0.0",  # Required for Spaces
    server_port=7860,       # Required for Spaces
    share=False
)
```
---
## πŸ”§ SPACES-SPECIFIC REQUIREMENTS.TXT
Create minimal dependencies:
```txt
# Lightweight for HF Spaces
gradio>=4.0.0
huggingface_hub>=0.19.0
python-docx>=1.0.0
pdfplumber>=0.10.0
pandas>=2.0.0
reportlab>=4.0.0
tiktoken>=0.5.0
# Don't install heavy models locally
# transformers # REMOVE - use API instead
# torch # REMOVE - use API instead
```
---
## πŸ“Š DEBUGGING SPACES TIMEOUTS
### Check Spaces Logs
In your Space, click `Logs` to see:
```
Building Space...
Loading model... ← If stuck here = model too large
Timeout after 60s ← Spaces limit hit
```
### Add Logging
```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def analyze(...):
    logger.info("Starting analysis...")
    logger.info(f"Processing {len(files)} files")
    # ... rest of code
```
---
## βœ… CHECKLIST FOR SPACES
- [ ] Set `LLM_BACKEND=hf_api` (not `local`)
- [ ] Add `HUGGINGFACE_TOKEN` secret in Space settings
- [ ] Use a lightweight model (Mistral-7B, not Mixtral-8x7B)
- [ ] Enable `demo.queue()` for long tasks
- [ ] Process max 3 transcripts at a time
- [ ] Set `LLM_TIMEOUT=25` (under Spaces limit)
- [ ] Reduce `MAX_TOKENS_PER_REQUEST=100`
- [ ] Add progress updates so the connection stays alive
- [ ] Consider upgrading to `cpu-upgrade` or `t4-small` hardware
---
## 🎯 ULTIMATE SPACES FIX
The real issue is that **Spaces times out while waiting for a response**.
**Quick Fix - Add this to the very top of `app.py`:**
```python
import os
# HuggingFace Spaces Configuration
# MUST be set before any other imports
os.environ["LLM_BACKEND"] = "hf_api"
os.environ["HUGGINGFACE_TOKEN"] = os.getenv("HUGGINGFACE_TOKEN", "")
os.environ["HF_MODEL"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["MAX_TOKENS_PER_REQUEST"] = "100"
os.environ["LLM_TIMEOUT"] = "25"
os.environ["MAX_CHUNK_TOKENS"] = "2000"
print("πŸš€ Running on HuggingFace Spaces")
print(f"πŸ“Š Backend: {os.environ['LLM_BACKEND']}")
print(f"πŸ€– Model: {os.environ['HF_MODEL']}")
print(f"⏱️ Timeout: {os.environ['LLM_TIMEOUT']}s")
```
**And at the bottom of `app.py`, change `.launch()` to:**
```python
if __name__ == "__main__":
    # Gradio 4 renamed concurrency_count to default_concurrency_limit
    demo.queue(
        default_concurrency_limit=1,
        max_size=10,
        api_open=False
    ).launch(
        server_name="0.0.0.0",
        server_port=7860,
        show_error=True
    )
```
---
## πŸ“ž If Still Timing Out
### Option 1: Cache Intermediate Results
```python
# Store intermediate results so a retried request can reuse them.
# Note: /tmp only survives while the Space is running; HF's paid
# persistent storage mounts at /data if results must outlive restarts.
import pickle

cache_file = "/tmp/transcriptor_cache.pkl"
```
```
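A minimal sketch of save/load helpers around that cache file (function names are illustrative):

```python
import os
import pickle

cache_file = "/tmp/transcriptor_cache.pkl"

def save_cache(results: dict) -> None:
    with open(cache_file, "wb") as f:
        pickle.dump(results, f)

def load_cache() -> dict:
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    return {}
```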
### Option 2: Split Processing
Process in stages so no single request exceeds the timeout (a sketch follows):
1. Stage 1: Upload & extract text β†’ save to a temp file
2. Stage 2: Analyze the saved text β†’ return results
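A hedged sketch of the two-stage flow; `extract_text` and `analyze_text` are hypothetical stand-ins for the app's real extraction and analysis code:

```python
import json

STAGE_FILE = "/tmp/extracted_texts.json"

def stage1_extract(files) -> str:
    """Stage 1: extract text from uploads and save it to a temp file."""
    texts = {f.name: extract_text(f) for f in files}  # hypothetical helper
    with open(STAGE_FILE, "w") as fh:
        json.dump(texts, fh)
    return f"Extracted {len(texts)} transcripts - now run Stage 2"

def stage2_analyze() -> dict:
    """Stage 2: analyze the previously saved text in a separate request."""
    with open(STAGE_FILE) as fh:
        texts = json.load(fh)
    return {name: analyze_text(text) for name, text in texts.items()}  # hypothetical helper
```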
### Option 3: Upgrade Hardware for a Longer Timeout
Upgrade to `cpu-upgrade` hardware in the Space settings.
---
**The key insight**: the app isn't running locally, so there is no local Node.js process to crash.
The "timeout" is HuggingFace Spaces killing your app for taking too long.
**Solution**: Use HF API (serverless) instead of loading models in the Space.