# Troubleshooting: LLM Timeout & Node.js Server Crashes

## Problem: App Hangs During Summarization / Node.js Server Stops

### Symptoms

- ✗ Application stops responding during "summarizing" phase
- ✗ Node.js server process terminates
- ✗ No error message, just hangs indefinitely
- ✗ Model loading takes forever or never completes

---

## ✅ IMMEDIATE FIX (Already Applied)

The enhanced version now includes:

1. **Aggressive Timeout Protection** (`llm_robust.py`)
   - Hard 60-second timeout (down from 120s)
   - Automatic fallback to lightweight processing
   - Emergency text-based analysis if LLM fails

2. **Optimized Configuration** (`.env` file created)
   - Lighter model recommendation (Mistral-7B vs Mixtral-8x7B)
   - Reduced token requirements (200 vs 300)
   - Faster failure detection

3. **Startup Health Check** (`start.sh` script)
   - Tests LLM connectivity before processing
   - Warns about configuration issues
   - Prevents hanging before it starts

---

## 🚀 Quick Start (Using Fixed Version)

### Option 1: Use Startup Script (Recommended)

```bash
cd /home/john/TranscriptorEnhanced

# Edit .env and add your HuggingFace token
nano .env

# Start with health check
./start.sh
```

### Option 2: Manual Start with Health Check

```bash
cd /home/john/TranscriptorEnhanced

# Test connectivity first
python3 fix_llm_timeout.py --test

# If test passes, start app
source .env
python3 app.py
```

---

## 🔧 Configuration Options

### .env File (Already Created)

```bash
# Option A: Use HuggingFace API (Most Stable - RECOMMENDED)
LLM_BACKEND=hf_api
HUGGINGFACE_TOKEN=your_token_here              # ← ADD YOUR TOKEN HERE
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2    # Lighter model

# Option B: Use LMStudio (Local - if you have it running)
LLM_BACKEND=lmstudio
LM_STUDIO_URL=http://localhost:1234

# Timeout Settings (Prevents Hanging)
LLM_TIMEOUT=60                # Hard timeout at 60 seconds
MAX_TOKENS_PER_REQUEST=200    # Reduced for speed
```

---

## 📋 Diagnostics

### Run Full Diagnostic

```bash
cd /home/john/TranscriptorEnhanced
python3 fix_llm_timeout.py --diagnose
```

### Test LLM Connectivity

```bash
python3 fix_llm_timeout.py --test
```

### Check Current Configuration

```bash
python3 fix_llm_timeout.py --config
```

---

## 🔍 Root Cause Analysis

### Why It Hangs

**1. Large Model + Limited Memory**
- Mixtral-8x7B requires ~30GB RAM
- Loading model exhausts memory
- Node.js/Python process killed by OS

**2. Network Timeouts**
- HuggingFace API unreachable
- Slow network connection
- Rate limiting

**3. Server Overload**
- Multiple concurrent requests
- LMStudio running out of resources
- GPU memory exhaustion

---

## ✅ Solutions Applied

### 1. Timeout Protection (`llm_robust.py`)

**Before:**
```python
# Waits indefinitely if model hangs
summary = query_llm(prompt, ...)
```

**After:**
```python
# Times out after 60s, uses fallback
with timeout(60):
    summary = query_llm(prompt, ...)
# Falls back to lightweight text extraction if timeout
```
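To make the pattern concrete, here is a minimal sketch of a thread-based timeout wrapper. It assumes nothing about the actual `llm_robust.py` internals: `call_llm` and `fallback` are hypothetical stand-ins for the real backend call and the lightweight extraction routine.

```python
import concurrent.futures

def query_with_timeout(call_llm, prompt, timeout_s=60, fallback=None):
    """Run call_llm(prompt) under a hard deadline; degrade on timeout.

    call_llm and fallback are hypothetical stand-ins, not the real
    llm_robust.py API.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(call_llm, prompt)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The abandoned worker thread may keep running in the background;
        # a subprocess-based worker is needed if it must be killed outright.
        print(f"[LLM] ✗ Timeout after {timeout_s}s, using fallback")
        return fallback(prompt) if fallback else None
    finally:
        # Never wait for a possibly hung worker on the way out.
        pool.shutdown(wait=False)
```

The key point is that the main thread always regains control after `timeout_s` seconds, which is what prevents the indefinite hang described above.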
### 2. Lightweight Fallbacks

If LLM times out, the system now:
1. Extracts data from the prompt text itself
2. Generates a lightweight summary with preserved data
3. Continues processing instead of crashing
4. Creates a report noting the limitation

**Example Fallback Output:**
```
LIGHTWEIGHT SUMMARY REPORT
(Generated due to LLM timeout - data extracted from available information)

SAMPLE OVERVIEW:
Total Patient interviews analyzed: 12

KEY OBSERVATIONS:
This analysis is based on structured data extraction rather than full LLM synthesis.

DATA EXTRACTED:
- Structured data preserved in CSV
- Individual transcript analyses completed
- Quantitative data available

RECOMMENDATIONS:
1. Reduce batch size (process fewer transcripts at once)
2. Verify LLM server connectivity
3. Consider lighter model (Mistral-7B vs Mixtral-8x7B)
```

### 3. Progressive Timeout Strategy

```
┌──────────────────────────────────────┐
│ Attempt 1: Full LLM (60s timeout)    │
└──────────┬───────────────────────────┘
           │
           ├─ Success → Continue normally
           │
           └─ Timeout → Fallback
                          ↓
┌──────────────────────────────────────┐
│ Attempt 2: Lightweight extraction    │
│ (Pattern-based, no LLM)              │
└──────────┬───────────────────────────┘
           │
           ├─ Success → Continue with warning
           │
           └─ Failure → Emergency fallback
                          ↓
┌──────────────────────────────────────┐
│ Emergency: Preserve data only        │
│ (CSV export, minimal summary)        │
└──────────────────────────────────────┘
```
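The same three-stage strategy can be sketched in plain Python. This is an illustration rather than the shipped `llm_robust.py` code: `query_with_timeout` is the wrapper sketched earlier, and `lightweight_summary` is a toy stand-in for the real pattern-based extractor.

```python
def summarize_with_fallbacks(call_llm, prompt, transcripts):
    # Attempt 1: full LLM with a hard 60s deadline.
    summary = query_with_timeout(call_llm, prompt, timeout_s=60)
    if summary:
        return summary

    # Attempt 2: pattern-based extraction, no LLM involved.
    try:
        return lightweight_summary(transcripts)
    except Exception as exc:
        print(f"[Summary] Lightweight extraction failed: {exc}")

    # Emergency: preserve a bare minimum so the run still completes.
    return f"LIGHTWEIGHT SUMMARY REPORT\nTotal transcripts: {len(transcripts)}"

def lightweight_summary(transcripts):
    # Toy stand-in: report the count and the first line of each transcript.
    first_lines = [t.splitlines()[0] if t.strip() else "(empty)" for t in transcripts]
    return "Analyzed %d transcripts:\n%s" % (len(first_lines), "\n".join(first_lines))
```

Each stage runs only if the previous one produced nothing, which is what guarantees that processing always completes with at least the preserved data.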
---

## 🎯 Recommended Settings by Use Case

### Small Datasets (1-5 transcripts)

```bash
LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=90
MAX_TOKENS_PER_REQUEST=300
```

### Medium Datasets (6-15 transcripts)

```bash
LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=60
MAX_TOKENS_PER_REQUEST=200
```

### Large Datasets (15+ transcripts) - Process in Batches

```bash
LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=45
MAX_TOKENS_PER_REQUEST=150
# Process in batches of 10 transcripts max
```

---

## 🛠️ Manual Fixes

### If HuggingFace API is slow/timing out

**1. Get a HuggingFace Token**
```bash
# Visit: https://huggingface.co/settings/tokens
# Create a token
# Add to .env:
HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
```

**2. Use a Lighter Model**
```bash
# Edit .env:
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
# Instead of Mixtral-8x7B
```

**3. Reduce Request Size**
```bash
# Edit .env:
MAX_TOKENS_PER_REQUEST=150
MAX_CHUNK_TOKENS=3000
```

### If Using LMStudio

**1. Start LMStudio Server**
```bash
# Open LMStudio
# Go to Server tab
# Start server on http://localhost:1234
```

**2. Load a Lightweight Model**
```bash
# In LMStudio, load one of:
- Mistral 7B Instruct
- Llama 2 7B Chat
- Phi-2

# Avoid heavy models:
- ✗ Mixtral 8x7B (too large)
- ✗ Llama 70B (too large)
```

**3. Configure .env**
```bash
LLM_BACKEND=lmstudio
LM_STUDIO_URL=http://localhost:1234
```

---

## 📊 Monitoring During Execution

The enhanced version now prints progress:

```
[Summary] Generating cross-transcript summary...
[Summary] Note: This may take 30-60 seconds for large datasets
[LLM] Starting summary generation...
[LLM] Timeout limit: 60s
[LLM] ✓ Completed successfully
[Summary] ✓ Validation passed (score: 0.85)
```

Watch for these messages:
- ✓ `Completed successfully` - All good
- ⚠ `Timeout after 60s` - Fallback activated
- ✗ `Using emergency fallback` - LLM completely unavailable

---

## 🔄 What Happens Now vs Before

### BEFORE (Hanging Behavior)
```
Processing transcripts...
✓ Extracting data...
✓ Generating summary...
[Waits indefinitely]
[Node.js crashes]
[No output]
```

### AFTER (Graceful Degradation)
```
Processing transcripts...
✓ Extracting data...
✓ Generating summary...
[LLM] Starting summary generation...
[LLM] Timeout limit: 60s
[LLM] ✗ Timeout after 60s
[LLM] Generating lightweight fallback...
[Summary] Using fallback summary
✓ Report generated with preserved data
```

---

## 📝 Testing the Fix

### Test 1: Verify Timeout Works

```bash
cd /home/john/TranscriptorEnhanced

# This should complete in <60s or fall back gracefully
python3 -c "
from llm_robust import query_llm_with_timeout
result = query_llm_with_timeout('Test', '', 'Other', max_timeout=10)
print('Success!' if result else 'Failed')
"
```

### Test 2: Full End-to-End

```bash
# Process a small transcript to verify
./start.sh
# Upload 1 transcript through UI
# Should complete in <2 minutes total
```

---

## 🚨 If Still Having Issues

### 1. Completely Bypass LLM (Emergency Mode)

Edit `/home/john/TranscriptorEnhanced/.env`:

```bash
# Force all LLM calls to use lightweight fallback
LLM_TIMEOUT=1    # 1-second timeout forces immediate fallback
```

This will:
- Skip LLM processing entirely
- Use pattern-based extraction only
- Generate reports from structured data
- Complete in seconds instead of minutes

### 2. Process One Transcript at a Time

Instead of batch processing, process transcripts individually through the UI.

### 3. Check System Resources

```bash
# Check available memory
free -h

# Check running processes
ps aux | grep -i "python\|node\|lmstudio"

# Kill stuck processes
pkill -f "python app.py"
pkill -f lmstudio
```

---

## ✅ Summary of Fixes

| Issue | Fix Applied | File |
|-------|-------------|------|
| Indefinite hangs | 60s hard timeout | `llm_robust.py` |
| No fallback | Lightweight text extraction | `llm_robust.py` |
| Server crashes | Graceful degradation | `app.py` |
| Heavy models | Lighter model recommendation | `.env` |
| No health check | Startup connectivity test | `fix_llm_timeout.py`, `start.sh` |

---

## 📞 Support

If issues persist:
1. **Check logs**: Console output shows exactly where it's failing
2. **Run diagnostic**: `python3 fix_llm_timeout.py --diagnose`
3. **Try emergency mode**: Set `LLM_TIMEOUT=1` in `.env`
4. **Process smaller batches**: 1-5 transcripts at a time

**The system will now always complete**, even if it has to fall back to lightweight processing. You'll get a report with preserved data regardless of LLM availability.

---

**Status:** ✅ Fixes Applied and Ready to Test
**Next Step:** Run `./start.sh` to start with health check
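If `./start.sh` is not available in your checkout, the pre-flight check can be approximated by hand. This is a minimal sketch, assuming the HF Inference API backend; the serverless endpoint and payload shown here are illustrative and may differ from what `fix_llm_timeout.py` actually sends.

```python
import os
import requests

def preflight_check(timeout_s=10):
    """Fail fast if the configured HF model is unreachable (illustrative)."""
    model = os.environ.get("HF_MODEL", "mistralai/Mistral-7B-Instruct-v0.2")
    token = os.environ.get("HUGGINGFACE_TOKEN", "")
    url = f"https://api-inference.huggingface.co/models/{model}"
    try:
        resp = requests.post(
            url,
            headers={"Authorization": f"Bearer {token}"},
            json={"inputs": "ping", "parameters": {"max_new_tokens": 1}},
            timeout=timeout_s,
        )
        print("✓ LLM reachable" if resp.ok else f"✗ HTTP {resp.status_code}")
        return resp.ok
    except requests.RequestException as exc:
        print(f"✗ Connectivity check failed: {exc}")
        return False

if __name__ == "__main__":
    raise SystemExit(0 if preflight_check() else 1)
```

The zero/non-zero exit code makes the sketch usable as a gate in a shell script, mirroring how the startup script refuses to launch the app when connectivity fails.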