# Troubleshooting: DynamicCache 'seen_tokens' Error
## Error Message
```
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
```
## What This Means
This error occurs when using local model inference (Phi-3, Llama, Mistral, etc.) with the `transformers` library. It's caused by a version incompatibility in the internal caching mechanism used during text generation.
**Impact**:
- Transcripts are processed but receive a Quality Score of 0.00
- LLM analysis fails for all chunks
- No insights are extracted from transcripts
- The system still produces output files, but they contain only empty results or error messages
---
## Root Cause
The `transformers` library changed its internal `Cache` implementation between versions:
- **Older versions (< 4.36)**: Used simpler cache without `seen_tokens` attribute
- **Newer versions (>= 4.36)**: Introduced `DynamicCache` with `seen_tokens` attribute
- **Version mismatch**: Code expects one format but library provides another
The error specifically occurs during the `model.generate()` call when the library tries to manage the key-value cache for efficient generation.
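A quick way to probe your environment (a minimal sketch; `DynamicCache` is only importable on transformers >= 4.36):
```python
# Probe: does this transformers build expose DynamicCache, and does
# the cache object carry the seen_tokens attribute that generation expects?
import transformers

print("transformers:", transformers.__version__)
try:
    from transformers import DynamicCache
    print("seen_tokens present:", hasattr(DynamicCache(), "seen_tokens"))
except ImportError:
    print("DynamicCache not available in this transformers version")
```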
---
## Quick Fix (Applied)
**File**: `llm.py` (lines 460-480)
The code has been updated with:
```python
# Fix for DynamicCache 'seen_tokens' error
outputs = query_llm_local.model.generate(
    **inputs,
    max_new_tokens=max_tokens,
    temperature=temperature,
    do_sample=temperature > 0,
    pad_token_id=query_llm_local.tokenizer.eos_token_id,
    use_cache=False,  # ← Disable caching to avoid DynamicCache errors
)
```
**What this does**: Disables the key-value caching mechanism entirely, forcing the model to recompute attention over the full sequence at each generation step.
**Trade-off**: Slightly slower generation (~10-20%) but avoids the error completely.
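If you want to keep caching when it works, a defensive variant is possible (a sketch, not the code shipped in `llm.py`) that retries without the cache only when this specific error appears:
```python
# Hypothetical wrapper: try cached generation first, fall back to
# use_cache=False only when the DynamicCache attribute error appears.
def safe_generate(model, tokenizer, inputs, max_tokens, temperature):
    kwargs = dict(
        max_new_tokens=max_tokens,
        temperature=temperature,
        do_sample=temperature > 0,
        pad_token_id=tokenizer.eos_token_id,
    )
    try:
        return model.generate(**inputs, **kwargs)
    except AttributeError as exc:
        if "seen_tokens" not in str(exc):
            raise  # unrelated error: surface it
        # Retry without KV caching (~10-20% slower, but compatible)
        return model.generate(**inputs, **kwargs, use_cache=False)
```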
---
## Solutions (In Order of Preference)
### Solution 1: Upgrade Transformers Library ✅ **RECOMMENDED**
```bash
pip install --upgrade transformers
```
**Expected version**: 4.36.0 or higher
**Verify installation**:
```bash
python -c "import transformers; print(transformers.__version__)"
```
**Expected output**: `4.36.0` or higher
**Why this works**: Newer versions have the `seen_tokens` attribute properly implemented.
---
### Solution 2: Use HuggingFace API Instead 🚀 **EASIEST**
Instead of running models locally, use HuggingFace's cloud API.
**Advantages**:
- No local model loading (saves RAM)
- Faster processing
- No compatibility issues
- Access to larger, better models
**Setup**:
1. Get a HuggingFace token: https://huggingface.co/settings/tokens
2. Create token with "Read" access
3. Set environment variables:
```bash
export HUGGINGFACE_TOKEN='hf_your_token_here'
export USE_HF_API=True
```
Or in `.env` file:
```
HUGGINGFACE_TOKEN=hf_your_token_here
USE_HF_API=True
```
**Verify**:
```bash
python -c "import os; print('HF Token:', os.getenv('HUGGINGFACE_TOKEN')[:20])"
```
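For reference, a minimal sketch of what a direct call to the hosted Inference API looks like (the model name is illustrative; with `USE_HF_API=True` the app makes this call for you):
```python
# Minimal sketch of a HuggingFace Inference API request.
# Assumes HUGGINGFACE_TOKEN is set; the model name is illustrative.
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-4k-instruct"
headers = {"Authorization": f"Bearer {os.environ['HUGGINGFACE_TOKEN']}"}

response = requests.post(API_URL, headers=headers, json={"inputs": "Test prompt"})
response.raise_for_status()
print(response.json())
```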
---
### Solution 3: Use LMStudio 🖥️ **BEST FOR OFFLINE**
LMStudio provides a GUI for running local models with better compatibility.
**Advantages**:
- Better compatibility than raw transformers
- Easy model management with GUI
- Local/offline processing
- No API costs
**Setup**:
1. Download LMStudio: https://lmstudio.ai/
2. Install and open LMStudio
3. Download a model (recommended: Phi-3-mini or Mistral-7B)
4. Start the local server:
- Open LMStudio
- Go to "Server" tab
- Click "Start Server"
- Default: http://localhost:1234
5. Set environment variables:
```bash
export USE_LMSTUDIO=True
export LMSTUDIO_URL=http://localhost:1234
```
Or in `.env` file:
```
USE_LMSTUDIO=True
LMSTUDIO_URL=http://localhost:1234
```
**Verify**:
```bash
curl http://localhost:1234/v1/models
```
Should return JSON with available models.
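You can also exercise the server from Python. LMStudio exposes an OpenAI-compatible endpoint, so a request like this sketch should work (the `model` value is a placeholder; LMStudio serves whichever model you loaded):
```python
# Sketch: query LMStudio's OpenAI-compatible chat endpoint directly.
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "local-model",  # placeholder; LMStudio uses the loaded model
        "messages": [{"role": "user", "content": "Test"}],
        "max_tokens": 10,
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```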
---
### Solution 4: Use Diagnostic Script
Run the diagnostic script to automatically detect and fix issues:
```bash
python fix_local_model.py
```
This script will:
1. Check your transformers version
2. Test local model functionality
3. Provide specific recommendations
4. Guide you through setup alternatives
**Example output**:
```
==================================================================
Local Model DynamicCache Error Fix
==================================================================
[Step 1] Diagnosing current environment...
✓ Transformers version: 4.35.0
⚠️ Transformers 4.35.0 is outdated
   Recommended: >= 4.36.0
[Step 2] Attempting to fix...
Upgrade transformers library? (y/n): y
✓ Transformers upgraded successfully
✓ Please restart your application
```
---
## Verification Steps
After applying any fix, verify it works:
### Test 1: Check Versions
```bash
python -c "import transformers, torch; print(f'Transformers: {transformers.__version__}'); print(f'PyTorch: {torch.__version__}')"
```
**Expected**:
```
Transformers: 4.36.0 or higher
PyTorch: 2.1.0 or higher
```
### Test 2: Quick LLM Test
```bash
python -c "from llm import query_llm_local; print(query_llm_local('Test', max_tokens=10))"
```
**Expected**: Some text output (not an error message)
### Test 3: Full Integration Test
Process a single transcript through the app and check:
- Quality Score > 0.00 ✓
- Structured data extracted ✓
- No DynamicCache errors in logs ✓
---
## Understanding Quality Score 0.00
If you see `Quality Score: 0.00` for all transcripts, it means:
**Cause**: LLM analysis is failing (likely due to this error)
**How Quality Score is calculated** (validation.py):
```python
def validate_transcript_quality(full_text, structured_data, interviewee_type):
    score, issues = 0.0, []
    # Text length check (0.3 points)
    if len(full_text) > 100:
        score += 0.3
    # Structured data check (0.4 points): non-empty dict
    if structured_data:
        score += 0.4
    # Specificity check (0.3 points): domain-specific terms in the text
    if has_specific_terms(full_text, interviewee_type):
        score += 0.3
    return score, issues
```
**If LLM fails**:
- `full_text` = "[Error] Local model failed: ..."
- `structured_data` = {} (empty)
- **Result**: Score = 0.00
**If LLM succeeds**: long text (0.3) + populated structured data (0.4) + specific terms (0.3) = 1.0
**Fix**: Resolve the DynamicCache error → LLM works → Quality Score improves to 0.7-1.0
---
## Prevention & Best Practices
### 1. Pin Dependency Versions
In `requirements.txt`:
```
transformers>=4.36.0,<5.0.0
torch>=2.1.0,<2.3.0
```
**Why**: Ensures compatible versions are installed together
### 2. Use Virtual Environments
```bash
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
pip install -r requirements.txt
```
**Why**: Isolates dependencies, prevents conflicts with other projects
### 3. Regular Updates
```bash
pip install --upgrade transformers torch accelerate
```
**When**:
- After any error
- Monthly maintenance
- Before deploying to production
### 4. Prefer Cloud APIs for Production
For production deployments:
- **Use HuggingFace API** for reliability
- **Use LMStudio** for on-premise/offline requirements
- **Avoid local transformers** unless you control the environment
---
## Environment-Specific Notes
### Docker / HuggingFace Spaces
```dockerfile
# In Dockerfile or requirements
RUN pip install "transformers>=4.36.0" "torch>=2.1.0" accelerate
```
### Windows
```powershell
# Install in PowerShell with admin rights
pip install --upgrade transformers torch accelerate
```
### Linux / WSL
```bash
pip3 install --upgrade transformers torch accelerate
```
### macOS
```bash
pip3 install --upgrade transformers torch accelerate
```
---
## Still Having Issues?
### Debug Mode
Enable detailed logging:
```python
import os
os.environ["DEBUG_MODE"] = "True"
```
Then check logs for detailed error messages.
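Independently of the app's flag, you can raise the transformers library's own log level to see what happens inside `generate()`:
```python
# Enable transformers' built-in debug logging (standard library utility,
# not app-specific); useful for watching cache handling during generation.
from transformers.utils import logging

logging.set_verbosity_debug()
```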
### Check Full Error Stack
Look for the full traceback in console output:
```
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
Traceback (most recent call last):
File "llm.py", line 459, in query_llm_local
outputs = query_llm_local.model.generate(...)
...
```
### Contact Support
If the issue persists:
1. Run diagnostic script: `python fix_local_model.py`
2. Capture full logs
3. Note your environment:
- OS (Windows/Linux/Mac)
- Python version
- Transformers version
- PyTorch version
4. Report issue with logs
---
## Summary Checklist
- [ ] Updated transformers: `pip install --upgrade transformers`
- [ ] Verified version: `python -c "import transformers; print(transformers.__version__)"`
- [ ] Applied code fix (use_cache=False) - already done in llm.py
- [ ] Tested with sample transcript
- [ ] Quality Score > 0.00 ✓
- [ ] OR: Switched to HF API / LMStudio instead
**If all checked**: ✓ Problem solved!
**If still failing**: Use HF API or LMStudio (Solutions 2-3 above)
---
## Related Files
- `llm.py` - Contains the fix (lines 460-480)
- `fix_local_model.py` - Diagnostic script
- `requirements.txt` - Dependency versions
- `ENHANCEMENTS.md` - Recent improvements documentation
---
## Technical Details (For Developers)
### Why `use_cache=False` Works
**Normal generation with caching**:
```python
# Conceptual sketch (not literal library code): generation WITH the cache
cache = DynamicCache()  # created internally on the first forward pass
cache.seen_tokens = 1   # Step 1: prompt keys/values cached, position tracked
cache.seen_tokens = 2   # Step 2: only the new token is computed;
                        # cached keys/values are reused
# Faster, but generate() requires the cache to expose seen_tokens
```
**Generation without caching**:
```python
# Conceptual sketch: generation WITHOUT the cache (use_cache=False)
# Step 1: generate token 1; attention computed over the full sequence
# Step 2: generate token 2; everything recomputed from scratch
# ~10-20% slower, but no dependency on the cache's internal API
```
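To measure the trade-off on your own hardware, a rough timing sketch (the model name is illustrative; any small causal LM will do):
```python
# Rough benchmark: compare generation time with and without the KV cache.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "microsoft/Phi-3-mini-4k-instruct"  # illustrative; use your local model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
inputs = tok("Test prompt", return_tensors="pt")

for use_cache in (True, False):
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=50, use_cache=use_cache,
                   pad_token_id=tok.eos_token_id)
    print(f"use_cache={use_cache}: {time.perf_counter() - start:.2f}s")
```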
### Future Improvements
We're monitoring:
- Transformers library updates
- Alternative caching implementations
- Model-specific optimizations
Stay updated: Check `ENHANCEMENTS.md` for latest improvements.