# Troubleshooting: LLM Timeout & Node.js Server Crashes
## Problem: App Hangs During Summarization / Node.js Server Stops
### Symptoms
- βœ— Application stops responding during "summarizing" phase
- βœ— Node.js server process terminates
- βœ— No error message, just hangs indefinitely
- βœ— Model loading takes forever or never completes
---
## βœ… IMMEDIATE FIX (Already Applied)
The enhanced version now includes:
1. **Aggressive Timeout Protection** (`llm_robust.py`)
   - Hard 60-second timeout (down from 120s)
   - Automatic fallback to lightweight processing
   - Emergency text-based analysis if the LLM fails
2. **Optimized Configuration** (`.env` file created)
   - Lighter model recommendation (Mistral-7B instead of Mixtral-8x7B)
   - Reduced token requirements (200 vs 300)
   - Faster failure detection
3. **Startup Health Check** (`start.sh` script)
   - Tests LLM connectivity before processing (see the sketch after this list)
   - Warns about configuration issues
   - Prevents hanging before it starts
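For reference, a minimal sketch of what such a connectivity test can look like (illustrative only; the endpoint URLs and defaults are assumptions, and the real check lives in `fix_llm_timeout.py`):
```python
import os
import requests

def llm_reachable(timeout_s: int = 10) -> bool:
    """Return True if the configured LLM backend answers a trivial request quickly."""
    backend = os.getenv("LLM_BACKEND", "hf_api")
    try:
        if backend == "lmstudio":
            # LMStudio exposes an OpenAI-compatible API; /v1/models is a cheap probe
            url = os.getenv("LM_STUDIO_URL", "http://localhost:1234") + "/v1/models"
            return requests.get(url, timeout=timeout_s).ok
        # HuggingFace Inference API: probe the configured model endpoint
        model = os.getenv("HF_MODEL", "mistralai/Mistral-7B-Instruct-v0.2")
        headers = {"Authorization": f"Bearer {os.getenv('HUGGINGFACE_TOKEN', '')}"}
        url = f"https://api-inference.huggingface.co/models/{model}"
        return requests.get(url, headers=headers, timeout=timeout_s).ok
    except requests.RequestException:
        return False

if __name__ == "__main__":
    print("LLM reachable" if llm_reachable() else "LLM unreachable - check .env")
```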
---
## πŸš€ Quick Start (Using Fixed Version)
### Option 1: Use Startup Script (Recommended)
```bash
cd /home/john/TranscriptorEnhanced
# Edit .env and add your HuggingFace token
nano .env
# Start with health check
./start.sh
```
### Option 2: Manual Start with Health Check
```bash
cd /home/john/TranscriptorEnhanced
# Test connectivity first
python3 fix_llm_timeout.py --test
# If test passes, start app
set -a; source .env; set +a   # -a exports the vars so the Python process sees them
python3 app.py
```
---
## πŸ”§ Configuration Options
### .env File (Already Created)
```bash
# Option A: Use HuggingFace API (Most Stable - RECOMMENDED)
LLM_BACKEND=hf_api
HUGGINGFACE_TOKEN=your_token_here # ← ADD YOUR TOKEN HERE
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2 # Lighter model
# Option B: Use LMStudio (Local - if you have it running)
LLM_BACKEND=lmstudio
LM_STUDIO_URL=http://localhost:1234
# Timeout Settings (Prevents Hanging)
LLM_TIMEOUT=60 # Hard timeout at 60 seconds
MAX_TOKENS_PER_REQUEST=200 # Reduced for speed
```
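These are ordinary environment variables, so the application side only needs standard-library reads. A minimal sketch of how they might be consumed (the defaults here are assumptions):
```python
import os

LLM_BACKEND = os.getenv("LLM_BACKEND", "hf_api")
LM_STUDIO_URL = os.getenv("LM_STUDIO_URL", "http://localhost:1234")
LLM_TIMEOUT = int(os.getenv("LLM_TIMEOUT", "60"))             # hard timeout, seconds
MAX_TOKENS = int(os.getenv("MAX_TOKENS_PER_REQUEST", "200"))  # response size cap
```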
---
## πŸ“‹ Diagnostics
### Run Full Diagnostic
```bash
cd /home/john/TranscriptorEnhanced
python3 fix_llm_timeout.py --diagnose
```
### Test LLM Connectivity
```bash
python3 fix_llm_timeout.py --test
```
### Check Current Configuration
```bash
python3 fix_llm_timeout.py --config
```
---
## πŸ” Root Cause Analysis
### Why It Hangs
**1. Large Model + Limited Memory**
- Mixtral-8x7B requires ~30GB RAM
- Loading model exhausts memory
- Node.js/Python process killed by OS
**2. Network Timeouts**
- HuggingFace API unreachable
- Slow network connection
- Rate limiting
**3. Server Overload**
- Multiple concurrent requests
- LMStudio running out of resources
- GPU memory exhaustion
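If memory exhaustion fits your symptoms, a quick guard before any local model load can confirm it. A sketch using `psutil` (an assumption; it is not part of the shipped scripts):
```python
import psutil  # third-party: pip install psutil

REQUIRED_GB = 30  # rough Mixtral-8x7B footprint, per the note above

available_gb = psutil.virtual_memory().available / 1e9
if available_gb < REQUIRED_GB:
    print(f"Only {available_gb:.1f} GB free; a ~{REQUIRED_GB} GB model will "
          "likely be OOM-killed. Use a lighter model (e.g. Mistral-7B).")
```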
---
## βœ… Solutions Applied
### 1. Timeout Protection (`llm_robust.py`)
**Before:**
```python
# Waits indefinitely if model hangs
summary = query_llm(prompt, ...)
```
**After:**
```python
# Times out after 60s, uses fallback
with timeout(60):
    summary = query_llm(prompt, ...)
# Falls back to lightweight text extraction on timeout
```
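For reference, one way a `timeout` context manager like this can be built on Unix is with `signal.alarm` - a sketch only; `llm_robust.py` may implement it differently (e.g. thread-based for portability):
```python
import signal
from contextlib import contextmanager

@contextmanager
def timeout(seconds: int):
    """Raise TimeoutError if the body runs longer than `seconds` (Unix, main thread only)."""
    def _handler(signum, frame):
        raise TimeoutError(f"LLM call exceeded {seconds}s")
    previous = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)               # schedule SIGALRM
    try:
        yield
    finally:
        signal.alarm(0)                 # cancel the pending alarm
        signal.signal(signal.SIGALRM, previous)
```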
### 2. Lightweight Fallbacks
If the LLM times out, the system now:
1. Extracts data from the prompt text itself
2. Generates a lightweight summary with preserved data
3. Continues processing instead of crashing
4. Creates a report noting the limitation
**Example Fallback Output:**
```
LIGHTWEIGHT SUMMARY REPORT
(Generated due to LLM timeout - data extracted from available information)
SAMPLE OVERVIEW:
Total Patient interviews analyzed: 12
KEY OBSERVATIONS:
This analysis is based on structured data extraction rather than full LLM synthesis.
DATA EXTRACTED:
- Structured data preserved in CSV
- Individual transcript analyses completed
- Quantitative data available
RECOMMENDATIONS:
1. Reduce batch size (process fewer transcripts at once)
2. Verify LLM server connectivity
3. Consider lighter model (Mistral-7B vs Mixtral-8x7B)
```
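Under the hood, a fallback like this is plain pattern matching over text the pipeline already holds. A minimal sketch (the real extraction in `llm_robust.py` is more thorough; the regex is an assumption):
```python
import re

def lightweight_summary(prompt_text: str) -> str:
    """Build a minimal report from the prompt itself when the LLM is unavailable."""
    # Treat distinct transcript/interview numbers as a rough sample count
    ids = set(re.findall(r"(?:transcript|interview)\s*#?\s*(\d+)", prompt_text, re.I))
    return "\n".join([
        "LIGHTWEIGHT SUMMARY REPORT",
        "(Generated due to LLM timeout - data extracted from available information)",
        f"Total patient interviews analyzed: {len(ids)}",
    ])
```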
### 3. Progressive Timeout Strategy
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Attempt 1: Full LLM (60s timeout)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚
           β”œβ”€ Success β†’ Continue normally
           β”‚
           └─ Timeout β†’ Fallback
                        ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Attempt 2: Lightweight extraction    β”‚
β”‚ (Pattern-based, no LLM)              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚
           β”œβ”€ Success β†’ Continue with warning
           β”‚
           └─ Failure β†’ Emergency fallback
                        ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Emergency: Preserve data only        β”‚
β”‚ (CSV export, minimal summary)        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
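In code, the cascade boils down to nested try/except blocks. A sketch using the helpers above (function names are assumptions; see `llm_robust.py` for the actual implementation):
```python
def summarize_with_fallbacks(prompt: str) -> str:
    try:
        with timeout(60):               # Attempt 1: full LLM, hard 60s limit
            return query_llm(prompt)
    except TimeoutError:
        print("[LLM] βœ— Timeout after 60s, falling back...")
    try:
        return lightweight_summary(prompt)   # Attempt 2: pattern-based, no LLM
    except Exception:
        # Emergency: preserve the data and note the limitation
        return "EMERGENCY REPORT: LLM unavailable - structured data preserved in CSV."
```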
---
## 🎯 Recommended Settings by Use Case
### Small Datasets (1-5 transcripts)
```bash
LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=90
MAX_TOKENS_PER_REQUEST=300
```
### Medium Datasets (6-15 transcripts)
```bash
LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=60
MAX_TOKENS_PER_REQUEST=200
```
### Large Datasets (15+ transcripts) - Process in Batches
```bash
LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=45
MAX_TOKENS_PER_REQUEST=150
# Process in batches of 10 transcripts max
```
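Batching can be as simple as slicing the transcript list before each run - a generic sketch, not part of the shipped code:
```python
def batches(items, size=10):
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Usage: for batch in batches(transcripts, size=10): process(batch)
```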
---
## πŸ› οΈ Manual Fixes
### If HuggingFace API is slow/timing out
**1. Get a HuggingFace Token**
```bash
# Visit: https://huggingface.co/settings/tokens
# Create a token
# Add to .env:
HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
```
**2. Use Lighter Model**
```bash
# Edit .env:
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2 # Instead of Mixtral-8x7B
```
**3. Reduce Request Size**
```bash
# Edit .env:
MAX_TOKENS_PER_REQUEST=150
MAX_CHUNK_TOKENS=3000
```
### If Using LMStudio
**1. Start LMStudio Server**
```bash
# Open LMStudio
# Go to Server tab
# Start server on http://localhost:1234
```
**2. Load a Lightweight Model**
```bash
# In LMStudio, load one of these lightweight models:
#   βœ“ Mistral 7B Instruct
#   βœ“ Llama 2 7B Chat
#   βœ“ Phi-2
# Avoid heavy models:
#   βœ— Mixtral 8x7B (too large)
#   βœ— Llama 70B (too large)
```
**3. Configure .env**
```bash
LLM_BACKEND=lmstudio
LM_STUDIO_URL=http://localhost:1234
```
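LMStudio's local server exposes an OpenAI-compatible API, so a request against it looks roughly like this (a sketch; the model name, prompt, and parameters are placeholders):
```python
import requests

url = "http://localhost:1234/v1/chat/completions"    # matches LM_STUDIO_URL above
payload = {
    "model": "local-model",           # LMStudio serves whichever model is loaded
    "messages": [{"role": "user", "content": "Summarize: ..."}],
    "max_tokens": 200,
}
resp = requests.post(url, json=payload, timeout=60)  # hard timeout, matching LLM_TIMEOUT
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```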
---
## πŸ“Š Monitoring During Execution
The enhanced version now prints progress:
```
[Summary] Generating cross-transcript summary...
[Summary] Note: This may take 30-60 seconds for large datasets
[LLM] Starting summary generation...
[LLM] Timeout limit: 60s
[LLM] βœ“ Completed successfully
[Summary] βœ“ Validation passed (score: 0.85)
```
Watch for these messages:
- βœ“ `Completed successfully` - All good
- ⚠ `Timeout after 60s` - Fallback activated
- βœ— `Using emergency fallback` - LLM completely unavailable
---
## πŸ”„ What Happens Now vs Before
### BEFORE (Hanging Behavior)
```
Processing transcripts... βœ“
Extracting data... βœ“
Generating summary...
[Waits indefinitely]
[Node.js crashes]
[No output]
```
### AFTER (Graceful Degradation)
```
Processing transcripts... βœ“
Extracting data... βœ“
Generating summary...
[LLM] Starting summary generation...
[LLM] Timeout limit: 60s
[LLM] βœ— Timeout after 60s
[LLM] Generating lightweight fallback...
[Summary] Using fallback summary
βœ“ Report generated with preserved data
```
---
## πŸ“ Testing the Fix
### Test 1: Verify Timeout Works
```bash
cd /home/john/TranscriptorEnhanced
# This should complete in <60s or fall back gracefully
python3 -c "
from llm_robust import query_llm_with_timeout
result = query_llm_with_timeout('Test', '', 'Other', max_timeout=10)
print('Success!' if result else 'Failed')
"
```
### Test 2: Full End-to-End
```bash
# Process a small transcript to verify
./start.sh
# Upload 1 transcript through UI
# Should complete in <2 minutes total
```
---
## 🚨 If You're Still Having Issues
### 1. Completely Bypass LLM (Emergency Mode)
Edit `/home/john/TranscriptorEnhanced/.env`:
```bash
# Force all LLM calls to use lightweight fallback
LLM_TIMEOUT=1 # 1 second timeout forces immediate fallback
```
This will:
- Skip LLM processing entirely
- Use pattern-based extraction only
- Generate reports from structured data
- Complete in seconds instead of minutes
### 2. Process One Transcript at a Time
Instead of batch processing, process individually through the UI.
### 3. Check System Resources
```bash
# Check available memory
free -h
# Check running processes
ps aux | grep -i "python\|node\|lmstudio"
# Kill stuck processes
pkill -f "python app.py"
pkill -f lmstudio
```
---
## βœ… Summary of Fixes
| Issue | Fix Applied | File |
|-------|-------------|------|
| Indefinite hangs | 60s hard timeout | `llm_robust.py` |
| No fallback | Lightweight text extraction | `llm_robust.py` |
| Server crashes | Graceful degradation | `app.py` |
| Heavy models | Lighter model recommendation | `.env` |
| No health check | Startup connectivity test | `fix_llm_timeout.py`, `start.sh` |
---
## πŸ“ž Support
If issues persist:
1. **Check logs**: Console output shows exactly where it's failing
2. **Run diagnostic**: `python3 fix_llm_timeout.py --diagnose`
3. **Try emergency mode**: Set `LLM_TIMEOUT=1` in `.env`
4. **Process smaller batches**: 1-5 transcripts at a time
**The system will now always complete**, even if it has to fall back to lightweight processing. You'll get a report with preserved data regardless of LLM availability.
---
**Status:** βœ… Fixes Applied and Ready to Test
**Next Step:** Run `./start.sh` to start with health check