
Troubleshooting: LLM Timeout & Node.js Server Crashes

Problem: App Hangs During Summarization / Node.js Server Stops

Symptoms

  • βœ— Application stops responding during "summarizing" phase
  • βœ— Node.js server process terminates
  • βœ— No error message; the app just hangs indefinitely
  • βœ— Model loading takes forever or never completes

βœ… IMMEDIATE FIX (Already Applied)

The enhanced version now includes:

  1. Aggressive Timeout Protection (llm_robust.py)

    • Hard 60-second timeout (down from 120s)
    • Automatic fallback to lightweight processing
    • Emergency text-based analysis if LLM fails
  2. Optimized Configuration (.env file created)

    • Lighter model recommendation (Mistral-7B vs Mixtral-8x7B)
    • Reduced token requirements (200 vs 300)
    • Faster failure detection
  3. Startup Health Check (start.sh script)

    • Tests LLM connectivity before processing
    • Warns about configuration issues
    • Catches problems before a hang can start

πŸš€ Quick Start (Using Fixed Version)

Option 1: Use Startup Script (Recommended)

cd /home/john/TranscriptorEnhanced

# Edit .env and add your HuggingFace token
nano .env

# Start with health check
./start.sh

Option 2: Manual Start with Health Check

cd /home/john/TranscriptorEnhanced

# Test connectivity first
python3 fix_llm_timeout.py --test

# If test passes, start app
source .env
python3 app.py

πŸ”§ Configuration Options

.env File (Already Created)

# Option A: Use HuggingFace API (Most Stable - RECOMMENDED)
LLM_BACKEND=hf_api
HUGGINGFACE_TOKEN=your_token_here  # ← ADD YOUR TOKEN HERE
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2  # Lighter model

# Option B: Use LMStudio (Local - if you have it running)
LLM_BACKEND=lmstudio
LM_STUDIO_URL=http://localhost:1234

# Timeout Settings (Prevents Hanging)
LLM_TIMEOUT=60  # Hard timeout at 60 seconds
MAX_TOKENS_PER_REQUEST=200  # Reduced for speed
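
For reference, here is a minimal sketch of how these settings might be read in Python with safe defaults. The variable names match the .env file above; the loader itself is illustrative, not the app's actual code:

import os

# Read LLM settings from the environment, falling back to the
# conservative defaults recommended above. (Illustrative loader;
# the app's real configuration code may differ.)
LLM_BACKEND = os.getenv("LLM_BACKEND", "hf_api")
HF_MODEL = os.getenv("HF_MODEL", "mistralai/Mistral-7B-Instruct-v0.2")
LM_STUDIO_URL = os.getenv("LM_STUDIO_URL", "http://localhost:1234")
LLM_TIMEOUT = int(os.getenv("LLM_TIMEOUT", "60"))
MAX_TOKENS_PER_REQUEST = int(os.getenv("MAX_TOKENS_PER_REQUEST", "200"))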

πŸ“‹ Diagnostics

Run Full Diagnostic

cd /home/john/TranscriptorEnhanced
python3 fix_llm_timeout.py --diagnose

Test LLM Connectivity

python3 fix_llm_timeout.py --test

Check Current Configuration

python3 fix_llm_timeout.py --config
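
To see roughly what the --test probe does, here is a minimal sketch of a connectivity check against the HuggingFace Inference API. The endpoint URL pattern is the standard one; the script's actual internals may differ:

import os
import requests

def test_llm_connectivity(timeout=10):
    """Send a tiny prompt to the HF Inference API; return True on HTTP 200."""
    model = os.getenv("HF_MODEL", "mistralai/Mistral-7B-Instruct-v0.2")
    token = os.getenv("HUGGINGFACE_TOKEN", "")
    url = f"https://api-inference.huggingface.co/models/{model}"
    try:
        r = requests.post(
            url,
            headers={"Authorization": f"Bearer {token}"},
            json={"inputs": "ping", "parameters": {"max_new_tokens": 5}},
            timeout=timeout,  # fail fast instead of hanging
        )
        return r.status_code == 200
    except requests.RequestException:
        return False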

πŸ” Root Cause Analysis

Why It Hangs

1. Large Model + Limited Memory

  • Mixtral-8x7B requires ~30GB RAM
  • Loading the model exhausts memory
  • Node.js/Python process killed by OS

2. Network Timeouts

  • HuggingFace API unreachable
  • Slow network connection
  • Rate limiting

3. Server Overload

  • Multiple concurrent requests
  • LMStudio running out of resources
  • GPU memory exhaustion

βœ… Solutions Applied

1. Timeout Protection (llm_robust.py)

Before:

# Waits indefinitely if the model hangs
summary = query_llm(prompt, ...)

After:

# Times out after 60s, uses fallback
with timeout(60):
    summary = query_llm(prompt, ...)
# Falls back to lightweight text extraction if timeout
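
The timeout() context manager above is shorthand. One common way to enforce a hard wall-clock limit on Unix is signal.alarm, as in this sketch; note that signal-based timeouts only work in the main thread, and llm_robust.py may use a different mechanism:

import signal
from contextlib import contextmanager

class LLMTimeoutError(Exception):
    """Raised when an LLM call exceeds its time budget."""

@contextmanager
def timeout(seconds):
    # Install an alarm that interrupts the wrapped block after `seconds`.
    def _handler(signum, frame):
        raise LLMTimeoutError(f"LLM call exceeded {seconds}s")
    old_handler = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)                             # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)  # restore the old handler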

2. Lightweight Fallbacks

If the LLM times out, the system now does the following (sketched in code after the example output below):

  1. Extracts data from the prompt text itself
  2. Generates a lightweight summary with preserved data
  3. Continues processing instead of crashing
  4. Creates a report noting the limitation

Example Fallback Output:

LIGHTWEIGHT SUMMARY REPORT
(Generated due to LLM timeout - data extracted from available information)

SAMPLE OVERVIEW:
Total Patient interviews analyzed: 12

KEY OBSERVATIONS:
This analysis is based on structured data extraction rather than full LLM synthesis.

DATA EXTRACTED:
- Structured data preserved in CSV
- Individual transcript analyses completed
- Quantitative data available

RECOMMENDATIONS:
1. Reduce batch size (process fewer transcripts at once)
2. Verify LLM server connectivity
3. Consider lighter model (Mistral-7B vs Mixtral-8x7B)
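
As a rough illustration, the lightweight path is pattern-based extraction over the prompt text rather than LLM synthesis. The regex and report template below are illustrative, not the exact ones in llm_robust.py:

import re

def lightweight_summary(prompt_text):
    """Build a fallback report from patterns in the prompt text (no LLM)."""
    # Count transcript markers; the real pattern depends on the prompt format.
    count = len(re.findall(r"(?im)^transcript\s*\d+", prompt_text))
    return (
        "LIGHTWEIGHT SUMMARY REPORT\n"
        "(Generated due to LLM timeout - data extracted from available information)\n\n"
        f"Total Patient interviews analyzed: {count}\n"
        "This analysis is based on structured data extraction "
        "rather than full LLM synthesis.\n"
    )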

3. Progressive Timeout Strategy

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Attempt 1: Full LLM (60s timeout)   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚
           β”œβ”€ Success β†’ Continue normally
           β”‚
           └─ Timeout β†’ Fallback
                        ↓
           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
           β”‚ Attempt 2: Lightweight extraction    β”‚
           β”‚ (Pattern-based, no LLM)              β”‚
           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                      β”‚
                      β”œβ”€ Success β†’ Continue with warning
                      β”‚
                      └─ Failure β†’ Emergency fallback
                                  ↓
                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                      β”‚ Emergency: Preserve data only        β”‚
                      β”‚ (CSV export, minimal summary)        β”‚
                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
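
In code, the three tiers chain together roughly like this. It reuses the timeout() helper sketched earlier; query_llm, lightweight_summary, and export_csv_only stand in for the app's real functions:

def summarize_with_fallbacks(prompt):
    """Attempt full LLM synthesis, then pattern extraction, then data-only export."""
    try:
        with timeout(60):                   # Attempt 1: full LLM (60s limit)
            return query_llm(prompt)
    except LLMTimeoutError:
        print("[LLM] βœ— Timeout after 60s")
    try:
        return lightweight_summary(prompt)  # Attempt 2: pattern-based, no LLM
    except Exception:
        return export_csv_only(prompt)      # Emergency: preserve data only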

🎯 Recommended Settings by Use Case

Small Datasets (1-5 transcripts)

LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=90
MAX_TOKENS_PER_REQUEST=300

Medium Datasets (6-15 transcripts)

LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=60
MAX_TOKENS_PER_REQUEST=200

Large Datasets (15+ transcripts) - Process in Batches

LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=45
MAX_TOKENS_PER_REQUEST=150

# Process in batches of 10 transcripts max
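
If you script the batching yourself, a simple slicing loop keeps each run at or below the 10-transcript ceiling. process_batch below is a stand-in for however you invoke the pipeline:

def process_in_batches(transcripts, batch_size=10):
    """Process transcripts in fixed-size batches so no single run is overloaded."""
    results = []
    for i in range(0, len(transcripts), batch_size):
        batch = transcripts[i:i + batch_size]
        results.append(process_batch(batch))  # stand-in for the app's entry point
    return results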

πŸ› οΈ Manual Fixes

If the HuggingFace API is slow or timing out

1. Get a HuggingFace Token

# Visit: https://huggingface.co/settings/tokens
# Create a token
# Add to .env:
HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx

2. Use Lighter Model

# Edit .env:
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2  # Instead of Mixtral-8x7B

3. Reduce Request Size

# Edit .env:
MAX_TOKENS_PER_REQUEST=150
MAX_CHUNK_TOKENS=3000

If Using LMStudio

1. Start LMStudio Server

# Open LMStudio
# Go to Server tab
# Start server on http://localhost:1234

2. Load a Lightweight Model

# In LMStudio, load one of:
- Mistral 7B Instruct
- Llama 2 7B Chat
- Phi-2

# Avoid heavy models:
- βœ— Mixtral 8x7B (too large)
- βœ— Llama 70B (too large)

3. Configure .env

LLM_BACKEND=lmstudio
LM_STUDIO_URL=http://localhost:1234
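
LMStudio exposes an OpenAI-compatible HTTP API, so the lmstudio backend presumably posts to /v1/chat/completions. A minimal sketch of such a call (the payload follows the OpenAI chat format; LMStudio uses whatever model is currently loaded):

import os
import requests

def query_lmstudio(prompt, timeout=60):
    """Send a chat completion request to LMStudio's OpenAI-compatible server."""
    base_url = os.getenv("LM_STUDIO_URL", "http://localhost:1234")
    r = requests.post(
        f"{base_url}/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": int(os.getenv("MAX_TOKENS_PER_REQUEST", "200")),
        },
        timeout=timeout,  # hard limit so a stalled server can't hang the app
    )
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]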

πŸ“Š Monitoring During Execution

The enhanced version now prints progress:

[Summary] Generating cross-transcript summary...
[Summary] Note: This may take 30-60 seconds for large datasets
[LLM] Starting summary generation...
[LLM] Timeout limit: 60s
[LLM] βœ“ Completed successfully
[Summary] βœ“ Validation passed (score: 0.85)

Watch for these messages:

  • βœ“ Completed successfully - All good
  • ⚠ Timeout after 60s - Fallback activated
  • βœ— Using emergency fallback - LLM completely unavailable

πŸ”„ What Happens Now vs Before

BEFORE (Hanging Behavior)

Processing transcripts... βœ“
Extracting data... βœ“
Generating summary...
[Waits indefinitely]
[Node.js crashes]
[No output]

AFTER (Graceful Degradation)

Processing transcripts... βœ“
Extracting data... βœ“
Generating summary...
[LLM] Starting summary generation...
[LLM] Timeout limit: 60s
[LLM] βœ— Timeout after 60s
[LLM] Generating lightweight fallback...
[Summary] Using fallback summary
βœ“ Report generated with preserved data

πŸ“ Testing the Fix

Test 1: Verify Timeout Works

cd /home/john/TranscriptorEnhanced

# This should complete in <60s or fall back gracefully
python3 -c "
from llm_robust import query_llm_with_timeout
result = query_llm_with_timeout('Test', '', 'Other', max_timeout=10)
print('Success!' if result else 'Failed')
"

Test 2: Full End-to-End

# Process a small transcript to verify
./start.sh
# Upload 1 transcript through UI
# Should complete in <2 minutes total

🚨 If Still Having Issues

1. Completely Bypass LLM (Emergency Mode)

Edit /home/john/TranscriptorEnhanced/.env:

# Force all LLM calls to use lightweight fallback
LLM_TIMEOUT=1  # 1 second timeout forces immediate fallback

This will:

  • Skip LLM processing entirely
  • Use pattern-based extraction only
  • Generate reports from structured data
  • Complete in seconds instead of minutes

2. Process One Transcript at a Time

Instead of batch processing, process individually through the UI.

3. Check System Resources

# Check available memory
free -h

# Check running processes
ps aux | grep -i "python\|node\|lmstudio"

# Kill stuck processes
pkill -f "python app.py"
pkill -f lmstudio

βœ… Summary of Fixes

| Issue | Fix Applied | File |
| --- | --- | --- |
| Indefinite hangs | 60s hard timeout | llm_robust.py |
| No fallback | Lightweight text extraction | llm_robust.py |
| Server crashes | Graceful degradation | app.py |
| Heavy models | Lighter model recommendation | .env |
| No health check | Startup connectivity test | fix_llm_timeout.py, start.sh |

πŸ“ž Support

If issues persist:

  1. Check logs: Console output shows exactly where it's failing
  2. Run diagnostic: python3 fix_llm_timeout.py --diagnose
  3. Try emergency mode: Set LLM_TIMEOUT=1 in .env
  4. Process smaller batches: 1-5 transcripts at a time

The system will now always complete, even if it has to fall back to lightweight processing. You'll get a report with preserved data regardless of LLM availability.


Status: βœ… Fixes Applied and Ready to Test
Next Step: Run ./start.sh to start with health check