# Troubleshooting: DynamicCache 'seen_tokens' Error



## Error Message

```
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
```



## What This Means



This error occurs when using local model inference (Phi-3, Llama, Mistral, etc.) with the `transformers` library. It's caused by a version incompatibility in the internal caching mechanism used during text generation.



**Impact**:

- Transcripts process but get Quality Score 0.00

- LLM analysis fails for all chunks

- No insights extracted from transcripts

- The system still produces outputs, but they contain only empty fields or error messages



---



## Root Cause



The `transformers` library changed its internal `Cache` implementation between versions:

- **Older versions (< 4.36)**: Used simpler cache without `seen_tokens` attribute

- **Newer versions (>= 4.36)**: Introduced `DynamicCache` with `seen_tokens` attribute

- **Version mismatch**: Code expects one format but library provides another



The error specifically occurs during the `model.generate()` call when the library tries to manage the key-value cache for efficient generation.
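
You can confirm which cache implementation your environment actually has before changing anything. A minimal probe sketch (assumes `packaging` is installed, which it is in any environment that has `transformers`):

```python
# Probe the installed transformers version and its cache implementation
from packaging import version
import transformers

installed = version.parse(transformers.__version__)
print(f"transformers {installed} installed")

if installed < version.parse("4.36.0"):
    print("Pre-DynamicCache release -- see Solution 1 (upgrade) below.")
else:
    from transformers import DynamicCache
    # Report whether this build exposes the seen_tokens attribute
    print("DynamicCache available; seen_tokens present:",
          hasattr(DynamicCache(), "seen_tokens"))
```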



---



## Quick Fix (Applied)



**File**: `llm.py` (lines 460-480)



The code has been updated with:



```python
# Fix for DynamicCache 'seen_tokens' error
outputs = query_llm_local.model.generate(
    **inputs,
    max_new_tokens=max_tokens,
    temperature=temperature,
    do_sample=temperature > 0,
    pad_token_id=query_llm_local.tokenizer.eos_token_id,
    use_cache=False  # ← Disable caching to avoid DynamicCache errors
)
```

**What this does**: Disables the key-value caching mechanism entirely, so the model recomputes attention over the full sequence at every generation step.

**Trade-off**: Slightly slower generation (~10-20%) but avoids the error completely.
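
If you would rather keep caching when it works and only fall back when the incompatibility surfaces, a defensive variant is possible. This is a sketch, not the code shipped in `llm.py`; it assumes the same `inputs`, `max_tokens`, and `temperature` variables as the snippet above:

```python
# Sketch: try cached generation first, retry without the KV cache on the known error
gen_kwargs = dict(
    max_new_tokens=max_tokens,
    temperature=temperature,
    do_sample=temperature > 0,
    pad_token_id=query_llm_local.tokenizer.eos_token_id,
)

try:
    outputs = query_llm_local.model.generate(**inputs, **gen_kwargs, use_cache=True)
except AttributeError as exc:
    if "seen_tokens" not in str(exc):
        raise  # unrelated problem -- don't mask it
    # Known cache incompatibility: fall back to the slower, cache-free path
    outputs = query_llm_local.model.generate(**inputs, **gen_kwargs, use_cache=False)
```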

---

## Solutions (In Order of Preference)

### Solution 1: Upgrade Transformers Library ✅ **RECOMMENDED**

```bash
pip install --upgrade transformers
```

**Expected version**: 4.36.0 or higher

**Verify installation**:
```bash
python -c "import transformers; print(transformers.__version__)"
```

**Expected output**: `4.36.0` or higher

**Why this works**: Upgrading brings the library's cache implementation in line with what the generation code expects, so the missing-attribute mismatch goes away.

---

### Solution 2: Use HuggingFace API Instead 🚀 **EASIEST**

Instead of running models locally, use HuggingFace's cloud API.

**Advantages**:
- No local model loading (saves RAM)
- Faster processing
- No compatibility issues
- Access to larger, better models

**Setup**:

1. Get a HuggingFace token: https://huggingface.co/settings/tokens
2. Create token with "Read" access
3. Set environment variables:

```bash
export HUGGINGFACE_TOKEN='hf_your_token_here'
export USE_HF_API=True
```

Or in `.env` file:
```
HUGGINGFACE_TOKEN=hf_your_token_here
USE_HF_API=True
```

**Verify**:
```bash
python -c "import os; print('HF Token:', (os.getenv('HUGGINGFACE_TOKEN') or 'NOT SET')[:20])"
```
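
The app picks these variables up on its own, but you can also exercise the token end to end with a short standalone script. A sketch using `huggingface_hub` (the model name is only an example; any hosted text-generation model works):

```python
# Sketch: call the HF Inference API directly to confirm the token is valid
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # example model, swap as needed
    token=os.environ["HUGGINGFACE_TOKEN"],
)

print(client.text_generation("Reply with the word OK.", max_new_tokens=10))
```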

---

### Solution 3: Use LMStudio 🖥️ **BEST FOR OFFLINE**

LMStudio provides a GUI for running local models with better compatibility.

**Advantages**:
- Better compatibility than raw transformers
- Easy model management with GUI
- Local/offline processing
- No API costs

**Setup**:

1. Download LMStudio: https://lmstudio.ai/
2. Install and open LMStudio
3. Download a model (recommended: Phi-3-mini or Mistral-7B)
4. Start the local server:
   - Open LMStudio
   - Go to "Server" tab
   - Click "Start Server"
   - Default: http://localhost:1234

5. Set environment variables:

```bash
export USE_LMSTUDIO=True
export LMSTUDIO_URL=http://localhost:1234
```

Or in `.env` file:
```
USE_LMSTUDIO=True
LMSTUDIO_URL=http://localhost:1234
```

**Verify**:
```bash
curl http://localhost:1234/v1/models
```

Should return JSON with available models.
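
Because the LMStudio server is OpenAI-compatible, you can also run an end-to-end check from Python. A sketch (assumes the server is running on the default port with a model loaded):

```python
# Sketch: send one chat completion to the local LMStudio server
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Reply with the word OK."}],
        "max_tokens": 10,
        "temperature": 0,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```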

---

### Solution 4: Use Diagnostic Script

Run the diagnostic script to automatically detect and fix issues:

```bash
python fix_local_model.py
```

This script will:
1. Check your transformers version
2. Test local model functionality
3. Provide specific recommendations
4. Guide you through setup alternatives

**Example output**:
```
==================================================================
Local Model DynamicCache Error Fix
==================================================================

[Step 1] Diagnosing current environment...
✓ Transformers version: 4.35.0
⚠️  Transformers 4.35.0 is outdated
   Recommended: >= 4.36.0

[Step 2] Attempting to fix...
Upgrade transformers library? (y/n): y
✓ Transformers upgraded successfully
✓ Please restart your application
```

---

## Verification Steps

After applying any fix, verify it works:

### Test 1: Check Versions
```bash
python -c "import transformers, torch; print(f'Transformers: {transformers.__version__}'); print(f'PyTorch: {torch.__version__}')"
```

**Expected**:
```
Transformers: 4.36.0 or higher
PyTorch: 2.1.0 or higher
```

### Test 2: Quick LLM Test
```bash
python -c "from llm import query_llm_local; print(query_llm_local('Test', max_tokens=10))"
```

**Expected**: Some text output (not an error message)

### Test 3: Full Integration Test
Process a single transcript through the app and check:
- Quality Score > 0.00 ✓
- Structured data extracted ✓
- No DynamicCache errors in logs ✓

---

## Understanding Quality Score 0.00

If you see `Quality Score: 0.00` for all transcripts, it means:

**Cause**: LLM analysis is failing (likely due to this error)

**How Quality Score is calculated** (simplified from `validation.py`):
```python
def validate_transcript_quality(full_text, structured_data, interviewee_type):
    score = 0.0

    # Text length check (0.3 points)
    if len(full_text) > 100: score += 0.3

    # Structured data check (0.4 points)
    if has_structured_data: score += 0.4

    # Specificity check (0.3 points)
    if has_specific_terms: score += 0.3

    return score, issues
```

**If LLM fails**:
- `full_text` = "[Error] Local model failed: ..."
- `structured_data` = {} (empty)
- **Result**: Score = 0.00

**Fix**: Resolve the DynamicCache error → LLM works → Quality Score improves to 0.7-1.0
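
To make the weighting concrete, here is a self-contained toy version of the same 0.3 / 0.4 / 0.3 scheme. The specificity terms are hypothetical stand-ins, not the real checks in `validation.py`:

```python
# Toy scorer mirroring the weighting above (illustrative only)
def toy_quality_score(full_text: str, structured_data: dict) -> float:
    score = 0.0
    if len(full_text) > 100:
        score += 0.3  # text length check
    if structured_data:
        score += 0.4  # structured data present
    if any(term in full_text.lower() for term in ("budget", "timeline", "stakeholder")):
        score += 0.3  # hypothetical specificity check
    return round(score, 2)

# LLM failed: short error text, empty structured data -> 0.0
print(toy_quality_score("[Error] Local model failed: ...", {}))
# LLM worked: real content with specifics -> 1.0
print(toy_quality_score("The stakeholder flagged the budget and timeline risks. " * 5,
                        {"themes": ["budget"]}))
```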

---

## Prevention & Best Practices

### 1. Pin Dependency Versions
In `requirements.txt`:
```
transformers>=4.36.0,<5.0.0
torch>=2.1.0,<2.3.0
```

**Why**: Ensures compatible versions are installed together

### 2. Use Virtual Environments
```bash
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows
pip install -r requirements.txt
```

**Why**: Isolates dependencies, prevents conflicts with other projects

### 3. Regular Updates
```bash
pip install --upgrade transformers torch accelerate
```

**When**:
- After any error
- Monthly maintenance
- Before deploying to production

### 4. Prefer Cloud APIs for Production

For production deployments:
- **Use HuggingFace API** for reliability
- **Use LMStudio** for on-premise/offline requirements
- **Avoid local transformers** unless you control the environment

---

## Environment-Specific Notes

### Docker / HuggingFace Spaces
```dockerfile
# In Dockerfile (quote version specifiers so the shell doesn't treat '>' as redirection)
RUN pip install "transformers>=4.36.0" "torch>=2.1.0" accelerate
```

### Windows
```powershell
# Install in PowerShell with admin rights
pip install --upgrade transformers torch accelerate
```

### Linux / WSL
```bash
pip3 install --upgrade transformers torch accelerate
```

### macOS
```bash
pip3 install --upgrade transformers torch accelerate
```

---

## Still Having Issues?

### Debug Mode
Enable detailed logging:
```python
import os
os.environ["DEBUG_MODE"] = "True"
```

Then check logs for detailed error messages.
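
If the app's own `DEBUG_MODE` flag doesn't surface enough detail, the standard `logging` module can be turned up for the whole process, including transformers' internal logger. A sketch (independent of the app's flag):

```python
# Sketch: verbose logging for this process, including the transformers logger
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
logging.getLogger("transformers").setLevel(logging.DEBUG)
```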

### Check Full Error Stack
Look for the full traceback in console output:
```
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
Traceback (most recent call last):
  File "llm.py", line 459, in query_llm_local
    outputs = query_llm_local.model.generate(...)
  ...
```

### Contact Support
If the issue persists:
1. Run diagnostic script: `python fix_local_model.py`
2. Capture full logs
3. Note your environment:
   - OS (Windows/Linux/Mac)
   - Python version
   - Transformers version
   - PyTorch version
4. Report issue with logs

---

## Summary Checklist

- [ ] Updated transformers: `pip install --upgrade transformers`
- [ ] Verified version: `python -c "import transformers; print(transformers.__version__)"`
- [ ] Applied code fix (use_cache=False) - already done in llm.py
- [ ] Tested with sample transcript
- [ ] Quality Score > 0.00 ✓
- [ ] OR: Switched to HF API / LMStudio instead



**If all checked**: ✓ Problem solved!



**If still failing**: Use HF API or LMStudio (Solutions 2-3 above)



---



## Related Files



- `llm.py` - Contains the fix (lines 460-480)

- `fix_local_model.py` - Diagnostic script

- `requirements.txt` - Dependency versions

- `ENHANCEMENTS.md` - Recent improvements documentation



---



## Technical Details (For Developers)



### Why `use_cache=False` Works

**Normal generation with caching**:
```python
# Conceptual illustration (not literal library code)

# Step 1: Generate token 1
cache = DynamicCache()  # Create cache
cache.seen_tokens = 1   # Track position

# Step 2: Generate token 2
cache.seen_tokens = 2   # Update position
# ... uses previous key/values from cache

# Faster, but requires the cache.seen_tokens attribute
```

**Generation without caching**:
```python
# Step 1: Generate token 1
# No cache used

# Step 2: Generate token 2
# Recompute everything from scratch

# Slower (~10-20%) but no cache dependencies
```
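
The overhead is easy to measure yourself on a small model. A rough benchmark sketch (uses `gpt2` only because it downloads quickly; the actual slowdown varies with model, output length, and hardware):

```python
# Sketch: time generation with and without the KV cache
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The quick brown fox", return_tensors="pt")

for use_cache in (True, False):
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=64, do_sample=False,
                   use_cache=use_cache, pad_token_id=tokenizer.eos_token_id)
    print(f"use_cache={use_cache}: {time.perf_counter() - start:.2f}s")
```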

### Future Improvements

We're monitoring:
- Transformers library updates
- Alternative caching implementations
- Model-specific optimizations

Stay updated: Check `ENHANCEMENTS.md` for latest improvements.