Troubleshooting: DynamicCache 'seen_tokens' Error
Error Message
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
What This Means
This error occurs when using local model inference (Phi-3, Llama, Mistral, etc.) with the transformers library. It's caused by a version incompatibility in the internal caching mechanism used during text generation.
Impact:
- Transcripts process but get Quality Score 0.00
- LLM analysis fails for all chunks
- No insights extracted from transcripts
- System still generates outputs but they're empty/error messages
Root Cause
The transformers library changed its internal Cache implementation between versions:
- Older versions (< 4.36): Used a simpler cache without the seen_tokens attribute
- Newer versions (>= 4.36): Introduced DynamicCache with the seen_tokens attribute
- Version mismatch: Code expects one format but the library provides another
The error specifically occurs during the model.generate() call when the library tries to manage the key-value cache for efficient generation.
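If you maintain the calling code yourself, one defensive pattern is to catch the error and retry without the cache. This is a sketch, not the exact code in llm.py; safe_generate is a hypothetical wrapper name:
# Sketch: fall back to cache-free generation when the incompatible
# cache implementation raises the 'seen_tokens' AttributeError.
def safe_generate(model, inputs, max_tokens=256):
    try:
        return model.generate(**inputs, max_new_tokens=max_tokens)
    except AttributeError as err:
        if "seen_tokens" not in str(err):
            raise  # unrelated error: surface it
        # Incompatible cache: recompute instead of caching (slower but reliable)
        return model.generate(**inputs, max_new_tokens=max_tokens, use_cache=False)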
Quick Fix (Applied)
File: llm.py (lines 460-480)
The code has been updated with:
# Fix for DynamicCache 'seen_tokens' error
outputs = query_llm_local.model.generate(
    **inputs,
    max_new_tokens=max_tokens,
    temperature=temperature,
    do_sample=temperature > 0,
    pad_token_id=query_llm_local.tokenizer.eos_token_id,
    use_cache=False  # Disable caching to avoid DynamicCache errors
)
What this does: Disables the key-value caching mechanism entirely, forcing the model to recompute attention over the full sequence at each generation step.
Trade-off: Slightly slower generation (~10-20%) but avoids the error completely.
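If you would rather keep the caching speedup on installs where it works, a version-aware toggle is an alternative. This is a sketch, assuming the packaging library is installed; the 4.36.0 threshold comes from the version notes above:
# Sketch: only disable the KV cache on transformers versions that
# predate the DynamicCache/seen_tokens implementation.
from packaging import version
import transformers

CACHE_OK = version.parse(transformers.__version__) >= version.parse("4.36.0")

outputs = query_llm_local.model.generate(
    **inputs,
    max_new_tokens=max_tokens,
    use_cache=CACHE_OK,  # keep the speedup when the version supports it
)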
Solutions (In Order of Preference)
Solution 1: Upgrade the Transformers Library (Recommended)
pip install --upgrade transformers
Expected version: 4.36.0 or higher
Verify installation:
python -c "import transformers; print(transformers.__version__)"
Expected output: 4.36.0 or higher
Why this works: Newer versions have the seen_tokens attribute properly implemented.
Solution 2: Use the HuggingFace API Instead (Easiest)
Instead of running models locally, use HuggingFace's cloud API.
Advantages:
- No local model loading (saves RAM)
- Faster processing
- No compatibility issues
- Access to larger, better models
Setup:
- Get a HuggingFace token: https://huggingface.co/settings/tokens
- Create a token with "Read" access
- Set environment variables:
export HUGGINGFACE_TOKEN='hf_your_token_here'
export USE_HF_API=True
Or in .env file:
HUGGINGFACE_TOKEN=hf_your_token_here
USE_HF_API=True
Verify:
python -c "import os; print('HF Token:', os.getenv('HUGGINGFACE_TOKEN')[:20])"
Solution 3: Use LMStudio (Best for Offline)
LMStudio provides a GUI for running local models with better compatibility.
Advantages:
- Better compatibility than raw transformers
- Easy model management with GUI
- Local/offline processing
- No API costs
Setup:
1. Download LMStudio: https://lmstudio.ai/
2. Install and open LMStudio
3. Download a model (recommended: Phi-3-mini or Mistral-7B)
4. Start the local server:
   - Open LMStudio
   - Go to the "Server" tab
   - Click "Start Server"
   - Default: http://localhost:1234
5. Set environment variables:
export USE_LMSTUDIO=True
export LMSTUDIO_URL=http://localhost:1234
Or in .env file:
USE_LMSTUDIO=True
LMSTUDIO_URL=http://localhost:1234
Verify:
curl http://localhost:1234/v1/models
Should return JSON with available models.
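LMStudio's server exposes an OpenAI-compatible API, so a quick smoke test from Python might look like this sketch (using requests; the prompt is arbitrary):
# Sketch: send one chat completion to LMStudio's local server.
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Say hello in five words."}],
        "max_tokens": 20,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])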
Solution 4: Use Diagnostic Script
Run the diagnostic script to automatically detect and fix issues:
python fix_local_model.py
This script will:
- Check your transformers version
- Test local model functionality
- Provide specific recommendations
- Guide you through setup alternatives
Example output:
Local Model DynamicCache Error Fix
[Step 1] Diagnosing current environment...
✓ Transformers version: 4.35.0
⚠ Transformers 4.35.0 is outdated
  Recommended: >= 4.36.0
[Step 2] Attempting to fix...
Upgrade transformers library? (y/n): y
✓ Transformers upgraded successfully
✓ Please restart your application
Verification Steps
After applying any fix, verify it works:
Test 1: Check Versions
python -c "import transformers, torch; print(f'Transformers: {transformers.__version__}'); print(f'PyTorch: {torch.__version__}')"
Expected:
Transformers: 4.36.0 or higher
PyTorch: 2.1.0 or higher
Test 2: Quick LLM Test
python -c "from llm import query_llm_local; print(query_llm_local('Test', max_tokens=10))"
Expected: Some text output (not an error message)
Test 3: Full Integration Test
Process a single transcript through the app and check:
- Quality Score > 0.00 ✓
- Structured data extracted ✓
- No DynamicCache errors in logs ✓ (see the log check below)
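To check the last point programmatically, assuming your app's console output is captured to a file (the app.log path here is hypothetical):
# Sketch: scan a log file for cache errors. The path is hypothetical;
# point it at wherever your app writes its console output.
with open("app.log", encoding="utf-8") as log:
    hits = [line for line in log if "DynamicCache" in line]
print(f"{len(hits)} DynamicCache error line(s) found")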
Understanding Quality Score 0.00
If you see Quality Score: 0.00 for all transcripts, it means:
Cause: LLM analysis is failing (likely due to this error)
How Quality Score is calculated (simplified from validation.py):
def validate_transcript_quality(full_text, structured_data, interviewee_type):
    issues = []
    score = 0.0
    # Text length check (0.3 points)
    if len(full_text) > 100:
        score += 0.3
    # Structured data check (0.4 points)
    if structured_data:
        score += 0.4
    # Specificity check (0.3 points): domain-specific terms in the text
    if has_specific_terms(full_text):
        score += 0.3
    return score, issues
If the LLM fails:
- full_text = "[Error] Local model failed: ..."
- structured_data = {} (empty)
- Result: Score = 0.00
Fix: Resolve the DynamicCache error → LLM works → Quality Score improves to 0.7-1.0
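To trace the error path by hand, here is a small demo against the simplified scorer above (has_specific_terms is stubbed so the sketch runs standalone):
# Demo: the error path through the simplified scorer above.
has_specific_terms = lambda text: False  # stand-in for validation.py's check

error_text = "[Error] Local model failed: 'DynamicCache' object has no attribute 'seen_tokens'"
score, issues = validate_transcript_quality(error_text, {}, "customer")
print(score)  # 0.0 -- error text is under 100 chars, structured_data is empty

healthy_text = "The customer walked through their onboarding flow in detail, naming the exact steps where drop-off happens."
score, issues = validate_transcript_quality(healthy_text, {"pain_points": ["drop-off"]}, "customer")
print(score)  # 0.7 -- length (0.3) + structured data (0.4)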
Prevention & Best Practices
1. Pin Dependency Versions
In requirements.txt:
transformers>=4.36.0,<5.0.0
torch>=2.1.0,<2.3.0
Why: Ensures compatible versions are installed together
2. Use Virtual Environments
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
pip install -r requirements.txt
Why: Isolates dependencies, prevents conflicts with other projects
3. Regular Updates
pip install --upgrade transformers torch accelerate
When:
- After any error
- Monthly maintenance
- Before deploying to production
4. Prefer Cloud APIs for Production
For production deployments:
- Use HuggingFace API for reliability
- Use LMStudio for on-premise/offline requirements
- Avoid local transformers unless you control the environment
Environment-Specific Notes
Docker / HuggingFace Spaces
# In Dockerfile or requirements
RUN pip install "transformers>=4.36.0" "torch>=2.1.0" accelerate
Windows
# Install in PowerShell with admin rights
pip install --upgrade transformers torch accelerate
Linux / WSL
pip3 install --upgrade transformers torch accelerate
macOS
pip3 install --upgrade transformers torch accelerate
Still Having Issues?
Debug Mode
Enable detailed logging:
import os
os.environ["DEBUG_MODE"] = "True"
Then check logs for detailed error messages.
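How the DEBUG_MODE flag is consumed depends on the app; a typical pattern (a sketch, not necessarily what this codebase does) is:
# Sketch: turn the DEBUG_MODE flag into verbose logging.
import logging
import os

if os.getenv("DEBUG_MODE") == "True":
    logging.basicConfig(level=logging.DEBUG)  # log every step in detail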
Check Full Error Stack
Look for the full traceback in console output:
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
Traceback (most recent call last):
File "llm.py", line 459, in query_llm_local
outputs = query_llm_local.model.generate(...)
...
Contact Support
If the issue persists:
- Run the diagnostic script: python fix_local_model.py
- Capture full logs
- Note your environment:
- OS (Windows/Linux/Mac)
- Python version
- Transformers version
- PyTorch version
- Report issue with logs
Summary Checklist
- Updated transformers: pip install --upgrade transformers
- Verified version: python -c "import transformers; print(transformers.__version__)"
- Applied code fix (use_cache=False), already done in llm.py
- Tested with sample transcript
- Quality Score > 0.00 ✓
- OR: Switched to HF API / LMStudio instead
If all checked: ✓ Problem solved!
If still failing: Use HF API or LMStudio (Solutions 2-3 above)
Related Files
- llm.py - Contains the fix (lines 460-480)
- fix_local_model.py - Diagnostic script
- requirements.txt - Dependency versions
- ENHANCEMENTS.md - Recent improvements documentation
Technical Details (For Developers)
Why use_cache=False Works
Normal generation with caching:
# Step 1: Generate token 1
cache = DynamicCache() # Create cache
cache.seen_tokens = 1 # Track position
# Step 2: Generate token 2
cache.seen_tokens = 2 # Update position
# ... uses previous key/values from cache
# Faster but requires cache.seen_tokens attribute
Generation without caching:
# Step 1: Generate token 1
# No cache used
# Step 2: Generate token 2
# Recompute everything from scratch
# Slower (~10-20%) but no cache dependencies
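To measure the trade-off on your own hardware, a rough timing comparison might look like this sketch (gpt2 is used only because it downloads quickly; any small causal LM works):
# Sketch: time generation with and without the KV cache.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # illustrative only; substitute your model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
inputs = tokenizer("The cache trade-off is", return_tensors="pt")

for use_cache in (True, False):
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=64, use_cache=use_cache)
    print(f"use_cache={use_cache}: {time.perf_counter() - start:.2f}s")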
Future Improvements
We're monitoring:
- Transformers library updates
- Alternative caching implementations
- Model-specific optimizations
Stay updated: Check ENHANCEMENTS.md for latest improvements.