# Troubleshooting: LLM Timeout & Node.js Server Crashes

## Problem: App Hangs During Summarization / Node.js Server Stops

### Symptoms

- ❌ Application stops responding during the "summarizing" phase
- ❌ Node.js server process terminates
- ❌ No error message, just hangs indefinitely
- ❌ Model loading takes forever or never completes

---

## ✅ IMMEDIATE FIX (Already Applied)

The enhanced version now includes:

1. **Aggressive Timeout Protection** (`llm_robust.py`)
   - Hard 60-second timeout (down from 120s)
   - Automatic fallback to lightweight processing
   - Emergency text-based analysis if the LLM fails
2. **Optimized Configuration** (`.env` file created)
   - Lighter model recommendation (Mistral-7B vs. Mixtral-8x7B)
   - Reduced token requirements (200 vs. 300)
   - Faster failure detection
3. **Startup Health Check** (`start.sh` script)
   - Tests LLM connectivity before processing (see the sketch below)
   - Warns about configuration issues
   - Prevents hanging before it starts
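
For reference, a connectivity check like the one the startup script runs can be sketched in a few lines. This is a minimal illustration assuming the HuggingFace Inference API backend; the function name and probe payload are ours, not the project's actual code:

```python
# Minimal LLM connectivity probe -- a sketch of the kind of check
# `fix_llm_timeout.py --test` performs. Assumes the HuggingFace Inference
# API backend; the endpoint and payload here are illustrative.
import os
import sys

import requests

def check_llm_connectivity(timeout: int = 15) -> bool:
    """Send a tiny prompt and verify the backend answers within `timeout` seconds."""
    token = os.environ.get("HUGGINGFACE_TOKEN", "")
    model = os.environ.get("HF_MODEL", "mistralai/Mistral-7B-Instruct-v0.2")
    url = f"https://api-inference.huggingface.co/models/{model}"
    try:
        resp = requests.post(
            url,
            headers={"Authorization": f"Bearer {token}"},
            json={"inputs": "ping", "parameters": {"max_new_tokens": 5}},
            timeout=timeout,  # network timeout so the probe itself cannot hang
        )
        return resp.status_code == 200
    except requests.RequestException as exc:
        print(f"[Health check] LLM unreachable: {exc}")
        return False

if __name__ == "__main__":
    sys.exit(0 if check_llm_connectivity() else 1)
```

Failing fast here is the point: a probe that cannot reach the backend within 15 seconds is a strong signal the full 60-second summarization call would hang too.
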
---

## 🚀 Quick Start (Using the Fixed Version)

### Option 1: Use the Startup Script (Recommended)

```bash
cd /home/john/TranscriptorEnhanced

# Edit .env and add your HuggingFace token
nano .env

# Start with health check
./start.sh
```

### Option 2: Manual Start with Health Check

```bash
cd /home/john/TranscriptorEnhanced

# Test connectivity first
python3 fix_llm_timeout.py --test

# If the test passes, start the app
source .env
python3 app.py
```

---

## 🔧 Configuration Options

### .env File (Already Created)

```bash
# Option A: Use the HuggingFace API (most stable - RECOMMENDED)
LLM_BACKEND=hf_api
HUGGINGFACE_TOKEN=your_token_here  # ← ADD YOUR TOKEN HERE
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2  # Lighter model

# Option B: Use LMStudio (local - if you have it running)
LLM_BACKEND=lmstudio
LM_STUDIO_URL=http://localhost:1234

# Timeout settings (prevents hanging)
LLM_TIMEOUT=60              # Hard timeout at 60 seconds
MAX_TOKENS_PER_REQUEST=200  # Reduced for speed
```
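
For orientation, here is a minimal sketch of how these variables could be read at startup. Only the variable names and defaults come from the `.env` above; `load_llm_settings` itself is a hypothetical helper, not the app's actual loader:

```python
# Hypothetical settings loader -- reads the .env variables documented above.
# The app's real configuration code may differ; only the variable names and
# defaults are taken from this guide.
import os

def load_llm_settings() -> dict:
    """Collect LLM settings from the environment, using the guide's defaults."""
    return {
        "backend": os.environ.get("LLM_BACKEND", "hf_api"),
        "hf_token": os.environ.get("HUGGINGFACE_TOKEN", ""),
        "hf_model": os.environ.get("HF_MODEL", "mistralai/Mistral-7B-Instruct-v0.2"),
        "lmstudio_url": os.environ.get("LM_STUDIO_URL", "http://localhost:1234"),
        "timeout": int(os.environ.get("LLM_TIMEOUT", "60")),
        "max_tokens": int(os.environ.get("MAX_TOKENS_PER_REQUEST", "200")),
    }

if __name__ == "__main__":
    for key, value in load_llm_settings().items():
        print(f"{key} = {value!r}")
```
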
---

## 🔍 Diagnostics

### Run the Full Diagnostic

```bash
cd /home/john/TranscriptorEnhanced
python3 fix_llm_timeout.py --diagnose
```

### Test LLM Connectivity

```bash
python3 fix_llm_timeout.py --test
```

### Check the Current Configuration

```bash
python3 fix_llm_timeout.py --config
```

---

## 🔍 Root Cause Analysis

### Why It Hangs

**1. Large Model + Limited Memory**

- Mixtral-8x7B requires ~30 GB of RAM
- Loading the model exhausts memory
- The Node.js/Python process is killed by the OS

**2. Network Timeouts**

- HuggingFace API unreachable
- Slow network connection
- Rate limiting

**3. Server Overload**

- Multiple concurrent requests
- LMStudio running out of resources
- GPU memory exhaustion

---

## ✅ Solutions Applied

### 1. Timeout Protection (`llm_robust.py`)

**Before:**

```python
# Waits indefinitely if the model hangs
summary = query_llm(prompt, ...)
```

**After:**

```python
# Times out after 60s, then falls back
with timeout(60):
    summary = query_llm(prompt, ...)
# Falls back to lightweight text extraction on timeout
```
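
The `timeout` context manager above is shown schematically. A minimal signal-based version looks like this (POSIX only, main thread only); the actual helper in `llm_robust.py` may be implemented differently:

```python
# Sketch of a signal-based timeout context manager (POSIX, main thread only).
# llm_robust.py may implement this differently; the idea is the same:
# interrupt a blocking call after N seconds instead of waiting forever.
import signal
from contextlib import contextmanager

class LLMTimeoutError(Exception):
    """Raised when the wrapped block exceeds its time budget."""

@contextmanager
def timeout(seconds: int):
    def _handler(signum, frame):
        raise LLMTimeoutError(f"Timed out after {seconds}s")

    old_handler = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)   # schedule SIGALRM
    try:
        yield
    finally:
        signal.alarm(0)     # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```

With something like this in place, a hung `query_llm` call raises `LLMTimeoutError` after 60 seconds instead of blocking the process forever, and the caller can switch to the fallback path.
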
### 2. Lightweight Fallbacks

If the LLM times out, the system now:

1. Extracts data from the prompt text itself
2. Generates a lightweight summary with the preserved data
3. Continues processing instead of crashing
4. Creates a report noting the limitation

**Example Fallback Output:**

```
LIGHTWEIGHT SUMMARY REPORT
(Generated due to LLM timeout - data extracted from available information)

SAMPLE OVERVIEW:
Total patient interviews analyzed: 12

KEY OBSERVATIONS:
This analysis is based on structured data extraction rather than full LLM synthesis.

DATA EXTRACTED:
- Structured data preserved in CSV
- Individual transcript analyses completed
- Quantitative data available

RECOMMENDATIONS:
1. Reduce batch size (process fewer transcripts at once)
2. Verify LLM server connectivity
3. Consider a lighter model (Mistral-7B vs. Mixtral-8x7B)
```
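
A pattern-based fallback of this kind can be as simple as counting what the prompt already contains. A hypothetical sketch follows; the real extraction in `llm_robust.py` is likely more thorough, and the regexes here are illustrative:

```python
# Hypothetical sketch of a pattern-based fallback summary.
# It pulls simple counts out of the prompt text instead of calling the LLM;
# the real fallback in llm_robust.py is likely more thorough.
import re

def lightweight_summary(prompt_text: str) -> str:
    """Build a minimal report from data already present in the prompt."""
    # Count transcript headers of the form "Transcript 1:", "Transcript 2:", ...
    transcripts = re.findall(r"Transcript\s+\d+\s*:", prompt_text)
    # Keep any numbers/percentages that were already computed upstream.
    figures = re.findall(r"\d+(?:\.\d+)?%?", prompt_text)
    return "\n".join([
        "LIGHTWEIGHT SUMMARY REPORT",
        "(Generated due to LLM timeout - data extracted from available information)",
        "",
        f"Total patient interviews analyzed: {len(transcripts)}",
        f"Quantitative figures preserved: {len(figures)}",
        "",
        "This analysis is based on structured data extraction rather than",
        "full LLM synthesis.",
    ])
```
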
### 3. Progressive Timeout Strategy

```
┌──────────────────────────────────────┐
│ Attempt 1: Full LLM (60s timeout)    │
└─────────┬────────────────────────────┘
          │
          ├─ Success → Continue normally
          │
          └─ Timeout → Fallback
                         │
┌──────────────────────────────────────┐
│ Attempt 2: Lightweight extraction    │
│ (Pattern-based, no LLM)              │
└─────────┬────────────────────────────┘
          │
          ├─ Success → Continue with warning
          │
          └─ Failure → Emergency fallback
                         │
┌──────────────────────────────────────┐
│ Emergency: Preserve data only        │
│ (CSV export, minimal summary)        │
└──────────────────────────────────────┘
```
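
In code, the cascade is three try/fallback tiers. A sketch using the illustrative helpers from the earlier examples (`timeout`, `LLMTimeoutError`, `lightweight_summary`) plus a placeholder `export_csv`; none of these names are guaranteed to match the project's actual API:

```python
# Sketch of the three-tier cascade from the diagram above.
# timeout(), LLMTimeoutError, query_llm(), lightweight_summary() and
# export_csv() are the illustrative names used elsewhere in this guide,
# not guaranteed project APIs.
def summarize_with_fallbacks(prompt: str, rows: list) -> str:
    # Attempt 1: full LLM with a hard 60s budget.
    try:
        with timeout(60):
            return query_llm(prompt)
    except LLMTimeoutError:
        print("[LLM] ⚠️ Timeout after 60s, trying lightweight extraction...")

    # Attempt 2: pattern-based extraction, no LLM involved.
    try:
        return lightweight_summary(prompt)
    except Exception as exc:
        print(f"[LLM] ❌ Lightweight extraction failed: {exc}")

    # Emergency: persist the structured data and return a stub report.
    export_csv(rows, "emergency_export.csv")  # placeholder CSV writer
    return "EMERGENCY REPORT: LLM unavailable; structured data saved to CSV."
```
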
---

## 🎯 Recommended Settings by Use Case

### Small Datasets (1-5 transcripts)

```bash
LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=90
MAX_TOKENS_PER_REQUEST=300
```

### Medium Datasets (6-15 transcripts)

```bash
LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=60
MAX_TOKENS_PER_REQUEST=200
```

### Large Datasets (15+ transcripts) - Process in Batches

```bash
LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=45
MAX_TOKENS_PER_REQUEST=150
# Process in batches of 10 transcripts max
```
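
If batching is not already part of your workflow, the simplest approach is to slice the transcript list before processing. A minimal sketch, where `process_batch` is a placeholder for whatever per-batch entry point you use:

```python
# Minimal batching sketch: process transcripts in groups of at most 10
# so no single summarization request grows unboundedly.
# process_batch() is a placeholder, not a real function in this project.
BATCH_SIZE = 10

def process_in_batches(transcripts: list, batch_size: int = BATCH_SIZE) -> None:
    for start in range(0, len(transcripts), batch_size):
        batch = transcripts[start:start + batch_size]
        print(f"[Batch] Processing transcripts {start + 1}-{start + len(batch)}")
        process_batch(batch)  # placeholder: your per-batch pipeline call
```
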
---

## 🛠️ Manual Fixes

### If the HuggingFace API Is Slow or Timing Out

**1. Get a HuggingFace Token**

```bash
# Visit: https://huggingface.co/settings/tokens
# Create a token, then add it to .env:
HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
```

**2. Use a Lighter Model**

```bash
# Edit .env:
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2  # Instead of Mixtral-8x7B
```

**3. Reduce the Request Size**

```bash
# Edit .env:
MAX_TOKENS_PER_REQUEST=150
MAX_CHUNK_TOKENS=3000
```
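
`MAX_CHUNK_TOKENS` caps how much text goes into a single request. Here is a rough word-based chunker illustrating the idea; real tokenization differs, and the ~0.75 words-per-token ratio is only a common approximation:

```python
# Rough chunker sketch for MAX_CHUNK_TOKENS. Splits on whitespace and uses
# the common ~0.75-words-per-token approximation; a real tokenizer would be
# more precise, but this keeps each request under the configured budget.
def chunk_text(text: str, max_chunk_tokens: int = 3000) -> list:
    max_words = int(max_chunk_tokens * 0.75)  # approximate token -> word budget
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```
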
### If Using LMStudio

**1. Start the LMStudio Server**

```
# Open LMStudio
# Go to the Server tab
# Start the server on http://localhost:1234
```

**2. Load a Lightweight Model**

```
# In LMStudio, load one of:
- Mistral 7B Instruct
- Llama 2 7B Chat
- Phi-2

# Avoid heavy models:
- ❌ Mixtral 8x7B (too large)
- ❌ Llama 70B (too large)
```

**3. Configure .env**

```bash
LLM_BACKEND=lmstudio
LM_STUDIO_URL=http://localhost:1234
```
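
LMStudio's server speaks an OpenAI-compatible API, so you can sanity-check it with a one-off request. A sketch of a direct query against the configured URL (`query_lmstudio` is our name, and the response handling assumes the standard chat-completions schema):

```python
# Sketch of a direct query against LMStudio's OpenAI-compatible endpoint.
# LMStudio serves /v1/chat/completions on the configured port; the payload
# below mirrors this guide's timeout and token settings.
import requests

def query_lmstudio(prompt: str,
                   base_url: str = "http://localhost:1234",
                   timeout_s: int = 60,
                   max_tokens: int = 200) -> str:
    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
        timeout=timeout_s,  # network-level timeout, independent of SIGALRM
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

If this call returns promptly, the server and model are fine and any remaining hangs are on the application side.
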
---

## 📊 Monitoring During Execution

The enhanced version now prints progress:

```
[Summary] Generating cross-transcript summary...
[Summary] Note: This may take 30-60 seconds for large datasets
[LLM] Starting summary generation...
[LLM] Timeout limit: 60s
[LLM] ✅ Completed successfully
[Summary] ✅ Validation passed (score: 0.85)
```

Watch for these messages:

- ✅ `Completed successfully` - All good
- ⚠️ `Timeout after 60s` - Fallback activated
- ❌ `Using emergency fallback` - LLM completely unavailable

---

## 📈 What Happens Now vs. Before

### BEFORE (Hanging Behavior)

```
Processing transcripts... ✓
Extracting data... ✓
Generating summary...
[Waits indefinitely]
[Node.js crashes]
[No output]
```

### AFTER (Graceful Degradation)

```
Processing transcripts... ✓
Extracting data... ✓
Generating summary...
[LLM] Starting summary generation...
[LLM] Timeout limit: 60s
[LLM] ⚠️ Timeout after 60s
[LLM] Generating lightweight fallback...
[Summary] Using fallback summary
✅ Report generated with preserved data
```
---

## 🧪 Testing the Fix

### Test 1: Verify the Timeout Works

```bash
cd /home/john/TranscriptorEnhanced
# This should complete in <60s or fall back gracefully
python3 -c "
from llm_robust import query_llm_with_timeout
result = query_llm_with_timeout('Test', '', 'Other', max_timeout=10)
print('Success!' if result else 'Failed')
"
```

### Test 2: Full End-to-End

```bash
# Process a small transcript to verify
./start.sh
# Upload 1 transcript through the UI
# Should complete in <2 minutes total
```

---

## 🚨 If You Still Have Issues

### 1. Completely Bypass the LLM (Emergency Mode)

Edit `/home/john/TranscriptorEnhanced/.env`:

```bash
# Force all LLM calls to use the lightweight fallback
LLM_TIMEOUT=1  # A 1-second timeout forces an immediate fallback
```

This will:

- Skip LLM processing entirely
- Use pattern-based extraction only
- Generate reports from structured data
- Complete in seconds instead of minutes

### 2. Process One Transcript at a Time

Instead of batch processing, process transcripts individually through the UI.

### 3. Check System Resources

```bash
# Check available memory
free -h

# Check running processes
ps aux | grep -i "python\|node\|lmstudio"

# Kill stuck processes
pkill -f "python app.py"
pkill -f lmstudio
```

---

## ✅ Summary of Fixes

| Issue | Fix Applied | File |
|-------|-------------|------|
| Indefinite hangs | 60s hard timeout | `llm_robust.py` |
| No fallback | Lightweight text extraction | `llm_robust.py` |
| Server crashes | Graceful degradation | `app.py` |
| Heavy models | Lighter model recommendation | `.env` |
| No health check | Startup connectivity test | `fix_llm_timeout.py`, `start.sh` |

---

## 📞 Support

If issues persist:

1. **Check the logs**: Console output shows exactly where it's failing
2. **Run the diagnostic**: `python3 fix_llm_timeout.py --diagnose`
3. **Try emergency mode**: Set `LLM_TIMEOUT=1` in `.env`
4. **Process smaller batches**: 1-5 transcripts at a time

**The system will now always complete**, even if it has to fall back to lightweight processing. You'll get a report with preserved data regardless of LLM availability.

---

**Status:** ✅ Fixes Applied and Ready to Test
**Next Step:** Run `./start.sh` to start with the health check