# Troubleshooting: LLM Timeout & Node.js Server Crashes
## Problem: App Hangs During Summarization / Node.js Server Stops
### Symptoms
- ❌ Application stops responding during the "summarizing" phase
- ❌ Node.js server process terminates
- ❌ No error message, just an indefinite hang
- ❌ Model loading takes forever or never completes
---
## ✅ IMMEDIATE FIX (Already Applied)
The enhanced version now includes:
1. **Aggressive Timeout Protection** (`llm_robust.py`)
- Hard 60-second timeout (down from 120s)
- Automatic fallback to lightweight processing
- Emergency text-based analysis if LLM fails
2. **Optimized Configuration** (`.env` file created)
- Lighter model recommendation (Mistral-7B vs Mixtral-8x7B)
- Reduced token requirements (200 vs 300)
- Faster failure detection
3. **Startup Health Check** (`start.sh` script)
- Tests LLM connectivity before processing
- Warns about configuration issues
   - Catches problems before processing starts, so a hang never begins
---
## 🚀 Quick Start (Using Fixed Version)
### Option 1: Use Startup Script (Recommended)
```bash
cd /home/john/TranscriptorEnhanced
# Edit .env and add your HuggingFace token
nano .env
# Start with health check
./start.sh
```
### Option 2: Manual Start with Health Check
```bash
cd /home/john/TranscriptorEnhanced
# Test connectivity first
python3 fix_llm_timeout.py --test
# If test passes, start app
source .env
python3 app.py
```
---
## 🔧 Configuration Options
### .env File (Already Created)
```bash
# Option A: Use HuggingFace API (Most Stable - RECOMMENDED)
LLM_BACKEND=hf_api
HUGGINGFACE_TOKEN=your_token_here # ← ADD YOUR TOKEN HERE
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2 # Lighter model
# Option B: Use LMStudio (local - uncomment if you have it running)
# LLM_BACKEND=lmstudio
# LM_STUDIO_URL=http://localhost:1234
# Timeout Settings (Prevents Hanging)
LLM_TIMEOUT=60 # Hard timeout at 60 seconds
MAX_TOKENS_PER_REQUEST=200 # Reduced for speed
```
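For reference, here is a minimal sketch of how these settings might be read at startup. The key names match the `.env` file above, but the actual loading logic in `app.py` may differ:

```python
# Sketch: read the .env-provided settings with safe defaults.
# (Assumes the variables are already exported, e.g. via `source .env`.)
import os

LLM_BACKEND = os.environ.get("LLM_BACKEND", "hf_api")
HF_MODEL = os.environ.get("HF_MODEL", "mistralai/Mistral-7B-Instruct-v0.2")
LLM_TIMEOUT = int(os.environ.get("LLM_TIMEOUT", "60"))
MAX_TOKENS_PER_REQUEST = int(os.environ.get("MAX_TOKENS_PER_REQUEST", "200"))
```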
---
## 🔍 Diagnostics
### Run Full Diagnostic
```bash
cd /home/john/TranscriptorEnhanced
python3 fix_llm_timeout.py --diagnose
```
### Test LLM Connectivity
```bash
python3 fix_llm_timeout.py --test
```
### Check Current Configuration
```bash
python3 fix_llm_timeout.py --config
```
---
## 🔍 Root Cause Analysis
### Why It Hangs
**1. Large Model + Limited Memory**
- Mixtral-8x7B requires ~30 GB of RAM
- Loading the model exhausts available memory
- The OS kills the Node.js/Python process (OOM)
**2. Network Timeouts**
- HuggingFace API unreachable
- Slow network connection
- Rate limiting
**3. Server Overload**
- Multiple concurrent requests
- LMStudio running out of resources
- GPU memory exhaustion
---
## ✅ Solutions Applied
### 1. Timeout Protection (`llm_robust.py`)
**Before:**
```python
# Waits indefinitely if model hangs
summary = query_llm(prompt, ...)
```
**After:**
```python
# Times out after 60s, uses fallback
with timeout(60):
    summary = query_llm(prompt, ...)
# Falls back to lightweight text extraction on timeout
```
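A `timeout` guard like the one above can be built on `signal.alarm`. The following is a minimal sketch (Unix-only, main thread; names are illustrative), not necessarily how `llm_robust.py` implements it:

```python
# Minimal timeout context manager built on SIGALRM (Unix, main thread only).
import signal
from contextlib import contextmanager

class LLMTimeoutError(Exception):
    """Raised when an LLM call exceeds its time budget."""

@contextmanager
def timeout(seconds: int):
    def _handler(signum, frame):
        raise LLMTimeoutError(f"LLM call exceeded {seconds}s")
    old_handler = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)  # schedule SIGALRM
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```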
### 2. Lightweight Fallbacks
If the LLM times out, the system now:
1. Extracts data from the prompt text itself
2. Generates a lightweight summary with preserved data
3. Continues processing instead of crashing
4. Creates a report noting the limitation
**Example Fallback Output:**
```
LIGHTWEIGHT SUMMARY REPORT
(Generated due to LLM timeout - data extracted from available information)
SAMPLE OVERVIEW:
Total Patient interviews analyzed: 12
KEY OBSERVATIONS:
This analysis is based on structured data extraction rather than full LLM synthesis.
DATA EXTRACTED:
- Structured data preserved in CSV
- Individual transcript analyses completed
- Quantitative data available
RECOMMENDATIONS:
1. Reduce batch size (process fewer transcripts at once)
2. Verify LLM server connectivity
3. Consider lighter model (Mistral-7B vs Mixtral-8x7B)
```
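As a rough illustration, a pattern-based fallback can regex out whatever structure survives in the prompt itself. This is a hypothetical sketch, not the actual `llm_robust.py` code:

```python
# Hypothetical pattern-based fallback: no LLM, just regex over the prompt text.
import re

def lightweight_summary(prompt_text: str) -> str:
    # Recover a headline number (e.g. interview count) from the prompt, if present.
    match = re.search(r"(\d+)\s+(?:patient\s+)?interviews?", prompt_text, re.IGNORECASE)
    total = match.group(1) if match else "unknown"
    return (
        "LIGHTWEIGHT SUMMARY REPORT\n"
        "(Generated due to LLM timeout - data extracted from available information)\n\n"
        f"Total patient interviews analyzed: {total}\n"
        "This analysis is based on structured data extraction, not full LLM synthesis."
    )
```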
### 3. Progressive Timeout Strategy
```
┌────────────────────────────────────────┐
│ Attempt 1: Full LLM (60s timeout)      │
└──────────┬─────────────────────────────┘
           │
           ├─ Success → Continue normally
           │
           └─ Timeout → Fallback
              │
┌────────────────────────────────────────┐
│ Attempt 2: Lightweight extraction      │
│ (Pattern-based, no LLM)                │
└──────────┬─────────────────────────────┘
           │
           ├─ Success → Continue with warning
           │
           └─ Failure → Emergency fallback
              │
┌────────────────────────────────────────┐
│ Emergency: Preserve data only          │
│ (CSV export, minimal summary)          │
└────────────────────────────────────────┘
```
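In code, the three tiers reduce to a simple try/except chain. Here is a sketch reusing the hypothetical helpers above; `query_llm` stands in for the project's existing LLM call, with its signature simplified:

```python
# Sketch: three-tier degradation. timeout, LLMTimeoutError, and
# lightweight_summary are the illustrative helpers sketched earlier.
def summarize_with_fallbacks(prompt: str) -> str:
    try:
        with timeout(60):                   # Attempt 1: full LLM
            return query_llm(prompt)
    except LLMTimeoutError:
        pass                                # fall through to Attempt 2
    try:
        return lightweight_summary(prompt)  # Attempt 2: pattern-based, no LLM
    except Exception:
        # Emergency: preserve data only, minimal summary
        return "Emergency fallback: structured data preserved in CSV; no summary."
```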
---
## 🎯 Recommended Settings by Use Case
### Small Datasets (1-5 transcripts)
```bash
LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=90
MAX_TOKENS_PER_REQUEST=300
```
### Medium Datasets (6-15 transcripts)
```bash
LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=60
MAX_TOKENS_PER_REQUEST=200
```
### Large Datasets (15+ transcripts) - Process in Batches
```bash
LLM_BACKEND=hf_api
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_TIMEOUT=45
MAX_TOKENS_PER_REQUEST=150
# Process in batches of 10 transcripts max
```
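For the large-dataset case, batching can be as simple as slicing the transcript list. A tiny illustrative helper (names are hypothetical):

```python
# Illustrative helper: yield transcripts in batches of at most `size`.
def batches(items, size=10):
    for i in range(0, len(items), size):
        yield items[i:i + size]

# e.g. for batch in batches(transcripts, size=10): process(batch)
```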
---
## 🛠️ Manual Fixes
### If HuggingFace API is slow/timing out
**1. Get a HuggingFace Token**
```bash
# Visit: https://huggingface.co/settings/tokens
# Create a token
# Add to .env:
HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
```
**2. Use Lighter Model**
```bash
# Edit .env:
HF_MODEL=mistralai/Mistral-7B-Instruct-v0.2 # Instead of Mixtral-8x7B
```
**3. Reduce Request Size**
```bash
# Edit .env:
MAX_TOKENS_PER_REQUEST=150
MAX_CHUNK_TOKENS=3000
```
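`MAX_CHUNK_TOKENS` caps how much text goes into a single request. A crude character-based chunker, using the common ~4 characters per token heuristic (the project's real chunking may differ):

```python
# Crude chunker: ~4 chars per token, so 3000 tokens ≈ 12000 characters.
def chunk_text(text: str, max_tokens: int = 3000):
    step = max_tokens * 4
    return [text[i:i + step] for i in range(0, len(text), step)]
```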
### If Using LMStudio
**1. Start LMStudio Server**
```bash
# Open LMStudio
# Go to Server tab
# Start server on http://localhost:1234
```
**2. Load a Lightweight Model**
In LMStudio, load one of:
- Mistral 7B Instruct
- Llama 2 7B Chat
- Phi-2

Avoid heavy models:
- ❌ Mixtral 8x7B (too large)
- ❌ Llama 70B (too large)
**3. Configure .env**
```bash
LLM_BACKEND=lmstudio
LM_STUDIO_URL=http://localhost:1234
```
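LMStudio's local server exposes an OpenAI-compatible API, so connectivity can be sanity-checked with a quick request like this sketch (assumes the default URL from `.env`):

```python
# Sketch: ping LMStudio's OpenAI-compatible /v1/models endpoint.
import requests

resp = requests.get("http://localhost:1234/v1/models", timeout=5)
resp.raise_for_status()
print([m["id"] for m in resp.json().get("data", [])])  # ids of loaded models
```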
---
## 📊 Monitoring During Execution
The enhanced version now prints progress:
```
[Summary] Generating cross-transcript summary...
[Summary] Note: This may take 30-60 seconds for large datasets
[LLM] Starting summary generation...
[LLM] Timeout limit: 60s
[LLM] ✓ Completed successfully
[Summary] ✓ Validation passed (score: 0.85)
```
Watch for these messages:
- ✅ `Completed successfully` - All good
- ⚠️ `Timeout after 60s` - Fallback activated
- ❌ `Using emergency fallback` - LLM completely unavailable
---
## 🔄 What Happens Now vs Before
### BEFORE (Hanging Behavior)
```
Processing transcripts... ✓
Extracting data... ✓
Generating summary...
[Waits indefinitely]
[Node.js crashes]
[No output]
```
### AFTER (Graceful Degradation)
```
Processing transcripts... ✓
Extracting data... ✓
Generating summary...
[LLM] Starting summary generation...
[LLM] Timeout limit: 60s
[LLM] ⚠️ Timeout after 60s
[LLM] Generating lightweight fallback...
[Summary] Using fallback summary
✅ Report generated with preserved data
```
---
## 🧪 Testing the Fix
### Test 1: Verify Timeout Works
```bash
cd /home/john/TranscriptorEnhanced
# This should complete in <60s or fallback gracefully
python3 -c "
from llm_robust import query_llm_with_timeout
result = query_llm_with_timeout('Test', '', 'Other', max_timeout=10)
print('Success!' if result else 'Failed')
"
```
### Test 2: Full End-to-End
```bash
# Process a small transcript to verify
./start.sh
# Upload 1 transcript through UI
# Should complete in <2 minutes total
```
---
## 🚨 If Still Having Issues
### 1. Completely Bypass LLM (Emergency Mode)
Edit `/home/john/TranscriptorEnhanced/.env`:
```bash
# Force all LLM calls to use lightweight fallback
LLM_TIMEOUT=1 # 1 second timeout forces immediate fallback
```
This will:
- Skip LLM processing entirely
- Use pattern-based extraction only
- Generate reports from structured data
- Complete in seconds instead of minutes
### 2. Process One Transcript at a Time
Instead of batch processing, process individually through the UI.
### 3. Check System Resources
```bash
# Check available memory
free -h
# Check running processes
ps aux | grep -i "python\|node\|lmstudio"
# Kill stuck processes
pkill -f "python app.py"
pkill -f lmstudio
```
---
## ✅ Summary of Fixes
| Issue | Fix Applied | File |
|-------|-------------|------|
| Indefinite hangs | 60s hard timeout | `llm_robust.py` |
| No fallback | Lightweight text extraction | `llm_robust.py` |
| Server crashes | Graceful degradation | `app.py` |
| Heavy models | Lighter model recommendation | `.env` |
| No health check | Startup connectivity test | `fix_llm_timeout.py`, `start.sh` |
---
## 📞 Support
If issues persist:
1. **Check logs**: Console output shows exactly where it's failing
2. **Run diagnostic**: `python3 fix_llm_timeout.py --diagnose`
3. **Try emergency mode**: Set `LLM_TIMEOUT=1` in `.env`
4. **Process smaller batches**: 1-5 transcripts at a time
**The system will now always complete**, even if it has to fall back to lightweight processing. You'll get a report with preserved data regardless of LLM availability.
---
**Status:** ✅ Fixes Applied and Ready to Test
**Next Step:** Run `./start.sh` to start with health check