# Troubleshooting: LLM Timeout & Node.js Server Crashes

## Problem: App Hangs During Summarization / Node.js Server Stops
### Symptoms

- ❌ Application stops responding during the "summarizing" phase
- ❌ Node.js server process terminates
- ❌ No error message; it just hangs indefinitely
- ❌ Model loading takes forever or never completes
## IMMEDIATE FIX (Already Applied)

The enhanced version now includes:

1. **Aggressive timeout protection** (`llm_robust.py`)
   - Hard 60-second timeout (down from 120s)
   - Automatic fallback to lightweight processing
   - Emergency text-based analysis if the LLM fails
2. **Optimized configuration** (`.env` file created)
   - Lighter model recommendation (Mistral-7B vs. Mixtral-8x7B)
   - Reduced token requirements (200 vs. 300)
   - Faster failure detection
3. **Startup health check** (`start.sh` script)
   - Tests LLM connectivity before processing (sketched below)
   - Warns about configuration issues
   - Prevents hanging before it starts
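For reference, the kind of pre-flight probe `start.sh` performs can be approximated in a few lines. This is a minimal sketch, not the shipped `fix_llm_timeout.py` logic; it assumes the standard HuggingFace Inference API URL scheme and LMStudio's OpenAI-compatible `/v1/models` endpoint:

```python
import os
import requests

def check_llm_connectivity(timeout: int = 10) -> bool:
    """Return True if the configured LLM backend is reachable at all."""
    backend = os.getenv("LLM_BACKEND", "hf_api")
    if backend == "hf_api":
        model = os.getenv("HF_MODEL", "mistralai/Mistral-7B-Instruct-v0.2")
        url = f"https://api-inference.huggingface.co/models/{model}"
        headers = {"Authorization": f"Bearer {os.getenv('HUGGINGFACE_TOKEN', '')}"}
    else:  # lmstudio
        url = os.getenv("LM_STUDIO_URL", "http://localhost:1234") + "/v1/models"
        headers = {}
    try:
        resp = requests.get(url, headers=headers, timeout=timeout)
        return resp.status_code < 500  # any answer at all means the server is up
    except requests.RequestException:
        return False
```

Failing fast here is the whole point: a 10-second probe at startup is far cheaper than discovering an unreachable backend 60 seconds into a summarization run.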
## Quick Start (Using the Fixed Version)

### Option 1: Use the Startup Script (Recommended)

```bash
cd /home/john/TranscriptorEnhanced

# Edit .env and add your HuggingFace token
nano .env

# Start with health check
./start.sh
```
### Option 2: Manual Start with Health Check

```bash
cd /home/john/TranscriptorEnhanced

# Test connectivity first
python3 fix_llm_timeout.py --test

# If the test passes, start the app
source .env
python3 app.py
```
## Configuration Options

### .env File (Already Created)

```bash
# Option A: Use the HuggingFace API (most stable - RECOMMENDED)
LLM_BACKEND=hf_api
HUGGINGFACE_TOKEN=your_token_here   # <-- ADD YOUR TOKEN HERE
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2   # Lighter model

# Option B: Use LMStudio (local - if you have it running)
LLM_BACKEND=lmstudio
LM_STUDIO_URL=http://localhost:1234

# Timeout settings (prevents hanging)
LLM_TIMEOUT=60               # Hard timeout at 60 seconds
MAX_TOKENS_PER_REQUEST=200   # Reduced for speed
```
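Since `start.sh` sources `.env` before launching, the app can pick these up as plain environment variables. A sketch of the corresponding reads (assuming the variables are exported; the defaults mirror the file above):

```python
import os

LLM_TIMEOUT = int(os.getenv("LLM_TIMEOUT", "60"))             # hard cap, in seconds
MAX_TOKENS = int(os.getenv("MAX_TOKENS_PER_REQUEST", "200"))  # response length budget
BACKEND = os.getenv("LLM_BACKEND", "hf_api")                  # "hf_api" or "lmstudio"
```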
## Diagnostics

### Run a Full Diagnostic

```bash
cd /home/john/TranscriptorEnhanced
python3 fix_llm_timeout.py --diagnose
```

### Test LLM Connectivity

```bash
python3 fix_llm_timeout.py --test
```

### Check the Current Configuration

```bash
python3 fix_llm_timeout.py --config
```
## Root Cause Analysis

### Why It Hangs

1. **Large model + limited memory**
   - Mixtral-8x7B requires ~30 GB of RAM
   - Loading the model exhausts memory
   - The Node.js/Python process is killed by the OS
2. **Network timeouts**
   - HuggingFace API unreachable
   - Slow network connection
   - Rate limiting
3. **Server overload**
   - Multiple concurrent requests
   - LMStudio running out of resources
   - GPU memory exhaustion
## Solutions Applied

### 1. Timeout Protection (`llm_robust.py`)

Before:

```python
# Waits indefinitely if the model hangs
summary = query_llm(prompt, ...)
```

After:

```python
# Times out after 60s, then uses the fallback
with timeout(60):
    summary = query_llm(prompt, ...)
# Falls back to lightweight text extraction on timeout
```
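The `with timeout(60)` above is shorthand: Python has no built-in timeout context manager. One self-contained way to get the same behavior with only the standard library (a sketch of the technique, not the exact code in `llm_robust.py`):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

def run_with_timeout(fn, *args, max_timeout: int = 60):
    """Run fn(*args) in a worker thread; give up after max_timeout seconds.

    The hung worker is abandoned rather than killed, so fn should be an
    I/O-bound call (like an HTTP request) that will eventually error out.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args)
    try:
        return future.result(timeout=max_timeout)
    except FuturesTimeout:
        return None  # tells the caller to use the lightweight fallback
    finally:
        pool.shutdown(wait=False)  # do not block on the hung worker

# Usage: summary is None if the call did not finish in time
# summary = run_with_timeout(query_llm, prompt, max_timeout=60)
```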
### 2. Lightweight Fallbacks

If the LLM times out, the system now:

- Extracts data from the prompt text itself
- Generates a lightweight summary with the preserved data
- Continues processing instead of crashing
- Creates a report noting the limitation

Example fallback output:

```
LIGHTWEIGHT SUMMARY REPORT
(Generated due to LLM timeout - data extracted from available information)

SAMPLE OVERVIEW:
Total patient interviews analyzed: 12

KEY OBSERVATIONS:
This analysis is based on structured data extraction rather than full LLM synthesis.

DATA EXTRACTED:
- Structured data preserved in CSV
- Individual transcript analyses completed
- Quantitative data available

RECOMMENDATIONS:
1. Reduce batch size (process fewer transcripts at once)
2. Verify LLM server connectivity
3. Consider a lighter model (Mistral-7B vs. Mixtral-8x7B)
```
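A fallback of this kind can be built from simple pattern matching over the prompt that was about to be sent. The sketch below is hypothetical; the regex and function name are illustrative, not the ones in `llm_robust.py`:

```python
import re

def lightweight_summary(prompt_text: str) -> str:
    """Build a minimal report from the prompt itself when the LLM is unavailable."""
    # Count transcript sections that made it into the prompt (illustrative pattern).
    n_transcripts = len(re.findall(r"(?im)^transcript\s*\d+", prompt_text))
    return "\n".join([
        "LIGHTWEIGHT SUMMARY REPORT",
        "(Generated due to LLM timeout - data extracted from available information)",
        f"Total patient interviews analyzed: {n_transcripts}",
        "Structured data preserved in CSV; see individual transcript analyses.",
    ])
```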
### 3. Progressive Timeout Strategy

```
┌─────────────────────────────────────┐
│ Attempt 1: Full LLM (60s timeout)   │
└──────────┬──────────────────────────┘
           │
           ├─ Success → Continue normally
           │
           └─ Timeout → Fallback
           │
┌──────────▼──────────────────────────┐
│ Attempt 2: Lightweight extraction   │
│ (Pattern-based, no LLM)             │
└──────────┬──────────────────────────┘
           │
           ├─ Success → Continue with warning
           │
           └─ Failure → Emergency fallback
           │
┌──────────▼──────────────────────────┐
│ Emergency: Preserve data only       │
│ (CSV export, minimal summary)       │
└─────────────────────────────────────┘
```
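In code, the strategy reduces to a guarded chain of attempts. A sketch reusing the `run_with_timeout` and `lightweight_summary` helpers sketched above (`query_llm` stands in for the project's actual LLM call):

```python
def generate_summary(prompt: str) -> str:
    # Attempt 1: full LLM under a hard 60s cap
    result = run_with_timeout(query_llm, prompt, max_timeout=60)
    if result is not None:
        return result
    # Attempt 2: pattern-based extraction, no LLM involved
    try:
        return lightweight_summary(prompt)
    except Exception:
        pass
    # Emergency: keep the structured data, emit a minimal note
    return "EMERGENCY SUMMARY: LLM unavailable; see the CSV export for extracted data."
```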
## Recommended Settings by Use Case

### Small Datasets (1-5 transcripts)

```bash
LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=90
MAX_TOKENS_PER_REQUEST=300
```

### Medium Datasets (6-15 transcripts)

```bash
LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=60
MAX_TOKENS_PER_REQUEST=200
```

### Large Datasets (15+ transcripts) - Process in Batches

```bash
LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=45
MAX_TOKENS_PER_REQUEST=150
# Process in batches of 10 transcripts max (see the sketch below)
```
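Batching does not require new tooling; a small chunking helper in front of the processing loop is enough. A sketch (`process_batch` is a hypothetical stand-in for whatever per-batch entry point you use):

```python
def chunked(items, size=10):
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

for batch in chunked(transcripts, size=10):
    process_batch(batch)  # hypothetical per-batch entry point
```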
## Manual Fixes

### If the HuggingFace API Is Slow or Timing Out

1. **Get a HuggingFace token**

   ```bash
   # Visit: https://huggingface.co/settings/tokens
   # Create a token, then add it to .env:
   HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
   ```

2. **Use a lighter model**

   ```bash
   # Edit .env:
   HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2   # Instead of Mixtral-8x7B
   ```

3. **Reduce the request size**

   ```bash
   # Edit .env:
   MAX_TOKENS_PER_REQUEST=150
   MAX_CHUNK_TOKENS=3000
   ```
### If Using LMStudio

1. **Start the LMStudio server**

   Open LMStudio, go to the Server tab, and start the server on `http://localhost:1234`.

2. **Load a lightweight model**

   In LMStudio, load one of:

   - Mistral 7B Instruct
   - Llama 2 7B Chat
   - Phi-2

   Avoid heavy models:

   - ❌ Mixtral 8x7B (too large)
   - ❌ Llama 70B (too large)

3. **Configure `.env`** (a direct-request sketch follows this list)

   ```bash
   LLM_BACKEND=lmstudio
   LM_STUDIO_URL=http://localhost:1234
   ```
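LMStudio's local server exposes an OpenAI-compatible HTTP API, so you can sanity-check it directly. A sketch (the model name must match whatever you loaded in LMStudio; the timeout mirrors `LLM_TIMEOUT`):

```python
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "mistral-7b-instruct",  # must match the loaded model
        "messages": [{"role": "user", "content": "Reply with OK."}],
        "max_tokens": 10,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```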
## Monitoring During Execution

The enhanced version now prints progress:

```
[Summary] Generating cross-transcript summary...
[Summary] Note: This may take 30-60 seconds for large datasets
[LLM] Starting summary generation...
[LLM] Timeout limit: 60s
[LLM] ✓ Completed successfully
[Summary] ✓ Validation passed (score: 0.85)
```

Watch for these messages:

- ✅ `Completed successfully` - all good
- ⚠️ `Timeout after 60s` - fallback activated
- ❌ `Using emergency fallback` - LLM completely unavailable
## What Happens Now vs. Before

### BEFORE (Hanging Behavior)

```
Processing transcripts... ✓
Extracting data... ✓
Generating summary...
[Waits indefinitely]
[Node.js crashes]
[No output]
```

### AFTER (Graceful Degradation)

```
Processing transcripts... ✓
Extracting data... ✓
Generating summary...
[LLM] Starting summary generation...
[LLM] Timeout limit: 60s
[LLM] ⚠ Timeout after 60s
[LLM] Generating lightweight fallback...
[Summary] Using fallback summary
✓ Report generated with preserved data
```
## Testing the Fix

### Test 1: Verify the Timeout Works

```bash
cd /home/john/TranscriptorEnhanced

# This should complete in <60s or fall back gracefully
python3 -c "
from llm_robust import query_llm_with_timeout
result = query_llm_with_timeout('Test', '', 'Other', max_timeout=10)
print('Success!' if result else 'Failed')
"
```

### Test 2: Full End-to-End

```bash
# Process a small transcript to verify
./start.sh
# Upload 1 transcript through the UI
# Should complete in <2 minutes total
```
## If You're Still Having Issues

### 1. Completely Bypass the LLM (Emergency Mode)

Edit `/home/john/TranscriptorEnhanced/.env`:

```bash
# Force all LLM calls to use the lightweight fallback
LLM_TIMEOUT=1   # A 1-second timeout forces an immediate fallback
```

This will:

- Skip LLM processing entirely
- Use pattern-based extraction only
- Generate reports from structured data
- Complete in seconds instead of minutes
### 2. Process One Transcript at a Time

Instead of batch processing, process transcripts individually through the UI.

### 3. Check System Resources

```bash
# Check available memory
free -h

# Check running processes
ps aux | grep -i "python\|node\|lmstudio"

# Kill stuck processes
pkill -f "python app.py"
pkill -f lmstudio
```
## Summary of Fixes

| Issue | Fix Applied | File |
|---|---|---|
| Indefinite hangs | 60s hard timeout | `llm_robust.py` |
| No fallback | Lightweight text extraction | `llm_robust.py` |
| Server crashes | Graceful degradation | `app.py` |
| Heavy models | Lighter model recommendation | `.env` |
| No health check | Startup connectivity test | `fix_llm_timeout.py`, `start.sh` |
## Support

If issues persist:

- **Check the logs:** the console output shows exactly where it's failing
- **Run the diagnostic:** `python3 fix_llm_timeout.py --diagnose`
- **Try emergency mode:** set `LLM_TIMEOUT=1` in `.env`
- **Process smaller batches:** 1-5 transcripts at a time

The system will now always complete, even if it has to fall back to lightweight processing. You'll get a report with preserved data regardless of LLM availability.

**Status:** ✅ Fixes applied and ready to test

**Next step:** Run `./start.sh` to start with the health check.