# Troubleshooting: DynamicCache 'seen_tokens' Error

## Error Message

```
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
```

## What This Means

This error occurs when using local model inference (Phi-3, Llama, Mistral, etc.) with the `transformers` library. It is caused by a version incompatibility in the internal caching mechanism used during text generation.

**Impact**:
- Transcripts are processed but receive a Quality Score of 0.00
- LLM analysis fails for all chunks
- No insights are extracted from transcripts
- The system still generates outputs, but they contain only empty or error messages

---

## Root Cause

The `transformers` library changed its internal `Cache` implementation between versions:

- **Older versions (< 4.36)**: Used a simpler cache without the `seen_tokens` attribute
- **Newer versions (>= 4.36)**: Introduced `DynamicCache` with the `seen_tokens` attribute
- **Version mismatch**: The code expects one format but the library provides another

The error occurs during the `model.generate()` call, when the library tries to manage the key-value cache used for efficient generation.

---

## Quick Fix (Applied)

**File**: `llm.py` (lines 460-480)

The code has been updated with:

```python
# Fix for DynamicCache 'seen_tokens' error
outputs = query_llm_local.model.generate(
    **inputs,
    max_new_tokens=max_tokens,
    temperature=temperature,
    do_sample=temperature > 0,
    pad_token_id=query_llm_local.tokenizer.eos_token_id,
    use_cache=False  # ← Disable caching to avoid DynamicCache errors
)
```

**What this does**: Disables the key-value caching mechanism entirely, forcing the model to recompute attention over the full sequence at each step.

**Trade-off**: Slower generation (~10-20%) but avoids the error completely.

---

## Solutions (In Order of Preference)

### Solution 1: Upgrade Transformers Library ✅ **RECOMMENDED**

```bash
pip install --upgrade transformers
```

**Expected version**: 4.36.0 or higher

**Verify installation**:

```bash
python -c "import transformers; print(transformers.__version__)"
```

**Expected output**: `4.36.0` or higher

**Why this works**: Newer versions have the `seen_tokens` attribute properly implemented.

---

### Solution 2: Use HuggingFace API Instead 🚀 **EASIEST**

Instead of running models locally, use HuggingFace's cloud API.

**Advantages**:
- No local model loading (saves RAM)
- Faster processing
- No compatibility issues
- Access to larger, better models

**Setup**:

1. Get a HuggingFace token: https://huggingface.co/settings/tokens
2. Create the token with "Read" access
3. Set environment variables:

```bash
export HUGGINGFACE_TOKEN='hf_your_token_here'
export USE_HF_API=True
```

Or in a `.env` file:

```
HUGGINGFACE_TOKEN=hf_your_token_here
USE_HF_API=True
```

**Verify**:

```bash
python -c "import os; print('HF Token:', (os.getenv('HUGGINGFACE_TOKEN') or 'NOT SET')[:20])"
```
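If you want to confirm the token works before routing the app through it, a minimal sketch using `huggingface_hub.InferenceClient` is shown below. The model name and prompt are illustrative assumptions, not what `llm.py` uses; the app itself picks up `USE_HF_API` and `HUGGINGFACE_TOKEN` as described above.

```python
# Sketch: call the HuggingFace Inference API directly to confirm the token works.
# Model name and prompt are illustrative; adapt to whatever hosted model you use.
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # hypothetical hosted model
    token=os.environ["HUGGINGFACE_TOKEN"],       # raises KeyError if the token is not set
)

response = client.text_generation(
    "Summarize in one sentence: the interviewee described three onboarding pain points.",
    max_new_tokens=100,
    temperature=0.3,
)
print(response)
```

If this prints generated text rather than an authentication or connection error, the token and API path are working.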
---

### Solution 3: Use LMStudio 🖥️ **BEST FOR OFFLINE**

LMStudio provides a GUI for running local models with better compatibility.

**Advantages**:
- Better compatibility than raw transformers
- Easy model management with a GUI
- Local/offline processing
- No API costs

**Setup**:

1. Download LMStudio: https://lmstudio.ai/
2. Install and open LMStudio
3. Download a model (recommended: Phi-3-mini or Mistral-7B)
4. Start the local server:
   - Open LMStudio
   - Go to the "Server" tab
   - Click "Start Server"
   - Default: http://localhost:1234
5. Set environment variables:

```bash
export USE_LMSTUDIO=True
export LMSTUDIO_URL=http://localhost:1234
```

Or in a `.env` file:

```
USE_LMSTUDIO=True
LMSTUDIO_URL=http://localhost:1234
```

**Verify**:

```bash
curl http://localhost:1234/v1/models
```

This should return JSON listing the available models.

---

### Solution 4: Use Diagnostic Script

Run the diagnostic script to automatically detect and fix issues:

```bash
python fix_local_model.py
```

This script will:
1. Check your transformers version
2. Test local model functionality
3. Provide specific recommendations
4. Guide you through setup alternatives

**Example output**:

```
==================================================================
Local Model DynamicCache Error Fix
==================================================================

[Step 1] Diagnosing current environment...
✓ Transformers version: 4.35.0
⚠️ Transformers 4.35.0 is outdated
   Recommended: >= 4.36.0

[Step 2] Attempting to fix...
Upgrade transformers library? (y/n): y
✓ Transformers upgraded successfully
✓ Please restart your application
```

---

## Verification Steps

After applying any fix, verify that it works:

### Test 1: Check Versions

```bash
python -c "import transformers, torch; print(f'Transformers: {transformers.__version__}'); print(f'PyTorch: {torch.__version__}')"
```

**Expected**:

```
Transformers: 4.36.0 or higher
PyTorch: 2.1.0 or higher
```

### Test 2: Quick LLM Test

```bash
python -c "from llm import query_llm_local; print(query_llm_local('Test', max_tokens=10))"
```

**Expected**: Some text output (not an error message)

### Test 3: Full Integration Test

Process a single transcript through the app and check:
- Quality Score > 0.00 ✓
- Structured data extracted ✓
- No DynamicCache errors in logs ✓

---

## Understanding Quality Score 0.00

If you see `Quality Score: 0.00` for all transcripts, it means:

**Cause**: LLM analysis is failing (likely due to this error)

**How Quality Score is calculated** (simplified, from `validation.py`):

```python
def validate_transcript_quality(full_text, structured_data, interviewee_type):
    score = 0.0

    # Text length check (0.3 points)
    if len(full_text) > 100:
        score += 0.3

    # Structured data check (0.4 points)
    if has_structured_data:
        score += 0.4

    # Specificity check (0.3 points)
    if has_specific_terms:
        score += 0.3

    return score, issues
```

**If the LLM fails**:
- `full_text` = `"[Error] Local model failed: ..."`
- `structured_data` = `{}` (empty)
- **Result**: Score = 0.00

**Fix**: Resolve the DynamicCache error → LLM works → Quality Score improves to 0.7-1.0

---

## Prevention & Best Practices

### 1. Pin Dependency Versions

In `requirements.txt`:

```
transformers>=4.36.0,<5.0.0
torch>=2.1.0,<2.3.0
```

**Why**: Ensures compatible versions are installed together

### 2. Use Virtual Environments

```bash
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate     # Windows
pip install -r requirements.txt
```

**Why**: Isolates dependencies and prevents conflicts with other projects

### 3. Regular Updates

```bash
pip install --upgrade transformers torch accelerate
```

**When**:
- After any error
- Monthly maintenance
- Before deploying to production

A minimal startup check that verifies the pinned minimum at runtime is sketched below.
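This is a sketch, not code currently in `llm.py`; the 4.36.0 threshold comes from this guide, and the function name is illustrative. It catches an outdated install before any transcript is processed.

```python
# Sketch: warn at startup if transformers is older than the minimum this guide
# recommends (4.36.0). Adapt the threshold and the warning to your project.
from packaging import version  # installed as a dependency of transformers

import transformers

MIN_TRANSFORMERS = "4.36.0"  # minimum recommended by this guide


def check_transformers_version(minimum: str = MIN_TRANSFORMERS) -> bool:
    installed = version.parse(transformers.__version__)
    if installed < version.parse(minimum):
        print(
            f"WARNING: transformers {transformers.__version__} is older than {minimum}; "
            "local generation may hit the DynamicCache 'seen_tokens' error."
        )
        return False
    return True


if __name__ == "__main__":
    check_transformers_version()
```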
### 4. Prefer Cloud APIs for Production

For production deployments:
- **Use HuggingFace API** for reliability
- **Use LMStudio** for on-premise/offline requirements
- **Avoid local transformers** unless you control the environment

---

## Environment-Specific Notes

### Docker / HuggingFace Spaces

```dockerfile
# In Dockerfile or requirements
RUN pip install "transformers>=4.36.0" "torch>=2.1.0" accelerate
```

### Windows

```powershell
# Install in PowerShell with admin rights
pip install --upgrade transformers torch accelerate
```

### Linux / WSL

```bash
pip3 install --upgrade transformers torch accelerate
```

### macOS

```bash
pip3 install --upgrade transformers torch accelerate
```

---

## Still Having Issues?

### Debug Mode

Enable detailed logging:

```python
import os
os.environ["DEBUG_MODE"] = "True"
```

Then check the logs for detailed error messages.

### Check Full Error Stack

Look for the full traceback in the console output:

```
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
Traceback (most recent call last):
  File "llm.py", line 459, in query_llm_local
    outputs = query_llm_local.model.generate(...)
  ...
```

### Contact Support

If the issue persists:
1. Run the diagnostic script: `python fix_local_model.py`
2. Capture full logs
3. Note your environment:
   - OS (Windows/Linux/Mac)
   - Python version
   - Transformers version
   - PyTorch version
4. Report the issue with logs

---

## Summary Checklist

- [ ] Updated transformers: `pip install --upgrade transformers`
- [ ] Verified version: `python -c "import transformers; print(transformers.__version__)"`
- [ ] Applied code fix (`use_cache=False`) - already done in `llm.py`
- [ ] Tested with sample transcript
- [ ] Quality Score > 0.00 ✓
- [ ] OR: Switched to HF API / LMStudio instead

**If all checked**: ✓ Problem solved!

**If still failing**: Use HF API or LMStudio (Solutions 2-3 above)

---

## Related Files

- `llm.py` - Contains the fix (lines 460-480)
- `fix_local_model.py` - Diagnostic script
- `requirements.txt` - Dependency versions
- `ENHANCEMENTS.md` - Recent improvements documentation

---

## Technical Details (For Developers)

### Why `use_cache=False` Works

**Normal generation with caching**:

```python
# Step 1: Generate token 1
cache = DynamicCache()   # Create cache
cache.seen_tokens = 1    # Track position

# Step 2: Generate token 2
cache.seen_tokens = 2    # Update position
# ... uses previous key/values from cache
# Faster, but requires the cache.seen_tokens attribute
```

**Generation without caching**:

```python
# Step 1: Generate token 1
# No cache used

# Step 2: Generate token 2
# Recompute everything from scratch
# Slower (~10-20%) but no cache dependencies
```

### Future Improvements

We're monitoring:
- Transformers library updates
- Alternative caching implementations
- Model-specific optimizations

Stay updated: Check `ENHANCEMENTS.md` for the latest improvements.
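On the "alternative caching" point above, one low-risk option worth sketching (an assumption, not current `llm.py` behavior; the function name is illustrative) is to keep `use_cache` enabled and fall back only when the `seen_tokens` failure actually occurs:

```python
# Sketch: retry generation without the KV cache only when the DynamicCache
# incompatibility actually surfaces. Not current llm.py behavior.
def generate_with_cache_fallback(model, inputs, **gen_kwargs):
    try:
        # Fast path: normal cached generation.
        return model.generate(**inputs, **gen_kwargs)
    except AttributeError as err:
        if "seen_tokens" not in str(err):
            raise  # unrelated error; don't mask it
        # Slow but safe path: recompute past key/values at every step.
        return model.generate(**inputs, use_cache=False, **gen_kwargs)
```

This keeps the faster cached path in environments where the library versions already match, at the cost of one retried call where they don't.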