# Troubleshooting: DynamicCache 'seen_tokens' Error

## Error Message

```
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
```

## What This Means

This error occurs when using local model inference (Phi-3, Llama, Mistral, etc.) with the `transformers` library. It is caused by a version incompatibility in the internal caching mechanism used during text generation.

**Impact**:
- Transcripts are processed but receive a Quality Score of 0.00
- LLM analysis fails for all chunks
- No insights are extracted from transcripts
- The system still generates outputs, but they contain only empty or error messages

---

## Root Cause

The `transformers` library changed its internal `Cache` implementation between versions:

- **Older versions (< 4.36)**: Used a simpler cache without the `seen_tokens` attribute
- **Newer versions (>= 4.36)**: Introduced `DynamicCache` with the `seen_tokens` attribute
- **Version mismatch**: The code expects one format but the library provides another

The error occurs during the `model.generate()` call, when the library tries to manage the key-value cache used for efficient generation.

---

## Quick Fix (Applied)

**File**: `llm.py` (lines 460-480)

The code has been updated with:

```python
# Fix for DynamicCache 'seen_tokens' error
outputs = query_llm_local.model.generate(
    **inputs,
    max_new_tokens=max_tokens,
    temperature=temperature,
    do_sample=temperature > 0,
    pad_token_id=query_llm_local.tokenizer.eos_token_id,
    use_cache=False  # ← Disable caching to avoid DynamicCache errors
)
```

**What this does**: Disables the key-value caching mechanism entirely, forcing the model to recompute attention over the full sequence at each step.

**Trade-off**: Slower generation (~10-20%) but avoids the error completely.

---

## Solutions (In Order of Preference)

### Solution 1: Upgrade Transformers Library ✅ **RECOMMENDED**

```bash
pip install --upgrade transformers
```

**Expected version**: 4.36.0 or higher

**Verify installation**:

```bash
python -c "import transformers; print(transformers.__version__)"
```

**Expected output**: `4.36.0` or higher

**Why this works**: Newer versions have the `seen_tokens` attribute properly implemented.

---

### Solution 2: Use HuggingFace API Instead 🚀 **EASIEST**

Instead of running models locally, use HuggingFace's cloud API.

**Advantages**:
- No local model loading (saves RAM)
- Faster processing
- No compatibility issues
- Access to larger, better models

**Setup**:

1. Get a HuggingFace token: https://huggingface.co/settings/tokens
2. Create the token with "Read" access
3. Set environment variables:

```bash
export HUGGINGFACE_TOKEN='hf_your_token_here'
export USE_HF_API=True
```

Or in a `.env` file:

```
HUGGINGFACE_TOKEN=hf_your_token_here
USE_HF_API=True
```

**Verify**:

```bash
python -c "import os; print('HF Token:', (os.getenv('HUGGINGFACE_TOKEN') or 'NOT SET')[:20])"
```
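If you want to confirm the token works before routing the app through it, a minimal sketch using `huggingface_hub.InferenceClient` is shown below. The model name and prompt are illustrative assumptions, not what `llm.py` uses; the app itself picks up `USE_HF_API` and `HUGGINGFACE_TOKEN` as described above.

```python
# Sketch: call the HuggingFace Inference API directly to confirm the token works.
# Model name and prompt are illustrative; adapt to whatever hosted model you use.
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # hypothetical hosted model
    token=os.environ["HUGGINGFACE_TOKEN"],       # raises KeyError if the token is not set
)

response = client.text_generation(
    "Summarize in one sentence: the interviewee described three onboarding pain points.",
    max_new_tokens=100,
    temperature=0.3,
)
print(response)
```

If this prints generated text rather than an authentication or connection error, the token and API path are working.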
---

### Solution 3: Use LMStudio 🖥️ **BEST FOR OFFLINE**

LMStudio provides a GUI for running local models with better compatibility.

**Advantages**:
- Better compatibility than raw transformers
- Easy model management with a GUI
- Local/offline processing
- No API costs

**Setup**:

1. Download LMStudio: https://lmstudio.ai/
2. Install and open LMStudio
3. Download a model (recommended: Phi-3-mini or Mistral-7B)
4. Start the local server:
   - Open LMStudio
   - Go to the "Server" tab
   - Click "Start Server"
   - Default: http://localhost:1234
5. Set environment variables:

```bash
export USE_LMSTUDIO=True
export LMSTUDIO_URL=http://localhost:1234
```

Or in a `.env` file:

```
USE_LMSTUDIO=True
LMSTUDIO_URL=http://localhost:1234
```

**Verify**:

```bash
curl http://localhost:1234/v1/models
```

This should return JSON listing the available models.

---

### Solution 4: Use Diagnostic Script

Run the diagnostic script to automatically detect and fix issues:

```bash
python fix_local_model.py
```

This script will:
1. Check your transformers version
2. Test local model functionality
3. Provide specific recommendations
4. Guide you through setup alternatives

**Example output**:

```
==================================================================
Local Model DynamicCache Error Fix
==================================================================

[Step 1] Diagnosing current environment...
✓ Transformers version: 4.35.0
⚠️ Transformers 4.35.0 is outdated
   Recommended: >= 4.36.0

[Step 2] Attempting to fix...
Upgrade transformers library? (y/n): y
✓ Transformers upgraded successfully
✓ Please restart your application
```

---

## Verification Steps

After applying any fix, verify that it works:

### Test 1: Check Versions

```bash
python -c "import transformers, torch; print(f'Transformers: {transformers.__version__}'); print(f'PyTorch: {torch.__version__}')"
```

**Expected**:

```
Transformers: 4.36.0 or higher
PyTorch: 2.1.0 or higher
```

### Test 2: Quick LLM Test

```bash
python -c "from llm import query_llm_local; print(query_llm_local('Test', max_tokens=10))"
```

**Expected**: Some text output (not an error message)

### Test 3: Full Integration Test

Process a single transcript through the app and check:
- Quality Score > 0.00 ✓
- Structured data extracted ✓
- No DynamicCache errors in logs ✓

---

## Understanding Quality Score 0.00

If you see `Quality Score: 0.00` for all transcripts, it means:

**Cause**: LLM analysis is failing (likely due to this error)

**How Quality Score is calculated** (simplified, from `validation.py`):

```python
def validate_transcript_quality(full_text, structured_data, interviewee_type):
    score = 0.0

    # Text length check (0.3 points)
    if len(full_text) > 100:
        score += 0.3

    # Structured data check (0.4 points)
    if has_structured_data:
        score += 0.4

    # Specificity check (0.3 points)
    if has_specific_terms:
        score += 0.3

    return score, issues
```

**If the LLM fails**:
- `full_text` = `"[Error] Local model failed: ..."`
- `structured_data` = `{}` (empty)
- **Result**: Score = 0.00

**Fix**: Resolve the DynamicCache error → LLM works → Quality Score improves to 0.7-1.0

---

## Prevention & Best Practices

### 1. Pin Dependency Versions

In `requirements.txt`:

```
transformers>=4.36.0,<5.0.0
torch>=2.1.0,<2.3.0
```

**Why**: Ensures compatible versions are installed together

### 2. Use Virtual Environments

```bash
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate     # Windows
pip install -r requirements.txt
```

**Why**: Isolates dependencies and prevents conflicts with other projects

### 3. Regular Updates

```bash
pip install --upgrade transformers torch accelerate
```

**When**:
- After any error
- Monthly maintenance
- Before deploying to production

A minimal startup check that verifies the pinned minimum at runtime is sketched below.
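This is a sketch, not code currently in `llm.py`; the 4.36.0 threshold comes from this guide, and the function name is illustrative. It catches an outdated install before any transcript is processed.

```python
# Sketch: warn at startup if transformers is older than the minimum this guide
# recommends (4.36.0). Adapt the threshold and the warning to your project.
from packaging import version  # installed as a dependency of transformers

import transformers

MIN_TRANSFORMERS = "4.36.0"  # minimum recommended by this guide


def check_transformers_version(minimum: str = MIN_TRANSFORMERS) -> bool:
    installed = version.parse(transformers.__version__)
    if installed < version.parse(minimum):
        print(
            f"WARNING: transformers {transformers.__version__} is older than {minimum}; "
            "local generation may hit the DynamicCache 'seen_tokens' error."
        )
        return False
    return True


if __name__ == "__main__":
    check_transformers_version()
```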
### 4. Prefer Cloud APIs for Production

For production deployments:
- **Use HuggingFace API** for reliability
- **Use LMStudio** for on-premise/offline requirements
- **Avoid local transformers** unless you control the environment

---

## Environment-Specific Notes

### Docker / HuggingFace Spaces

```dockerfile
# In Dockerfile or requirements
RUN pip install "transformers>=4.36.0" "torch>=2.1.0" accelerate
```

### Windows

```powershell
# Install in PowerShell with admin rights
pip install --upgrade transformers torch accelerate
```

### Linux / WSL

```bash
pip3 install --upgrade transformers torch accelerate
```

### macOS

```bash
pip3 install --upgrade transformers torch accelerate
```

---

## Still Having Issues?

### Debug Mode

Enable detailed logging:

```python
import os
os.environ["DEBUG_MODE"] = "True"
```

Then check the logs for detailed error messages.

### Check Full Error Stack

Look for the full traceback in the console output:

```
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
Traceback (most recent call last):
  File "llm.py", line 459, in query_llm_local
    outputs = query_llm_local.model.generate(...)
  ...
```

### Contact Support

If the issue persists:
1. Run the diagnostic script: `python fix_local_model.py`
2. Capture full logs
3. Note your environment:
   - OS (Windows/Linux/Mac)
   - Python version
   - Transformers version
   - PyTorch version
4. Report the issue with logs

---

## Summary Checklist

- [ ] Updated transformers: `pip install --upgrade transformers`
- [ ] Verified version: `python -c "import transformers; print(transformers.__version__)"`
- [ ] Applied code fix (`use_cache=False`) - already done in `llm.py`
- [ ] Tested with sample transcript
- [ ] Quality Score > 0.00 ✓
- [ ] OR: Switched to HF API / LMStudio instead

**If all checked**: ✓ Problem solved!

**If still failing**: Use HF API or LMStudio (Solutions 2-3 above)

---

## Related Files

- `llm.py` - Contains the fix (lines 460-480)
- `fix_local_model.py` - Diagnostic script
- `requirements.txt` - Dependency versions
- `ENHANCEMENTS.md` - Recent improvements documentation

---

## Technical Details (For Developers)

### Why `use_cache=False` Works

**Normal generation with caching**:

```python
# Step 1: Generate token 1
cache = DynamicCache()   # Create cache
cache.seen_tokens = 1    # Track position

# Step 2: Generate token 2
cache.seen_tokens = 2    # Update position
# ... uses previous key/values from cache
# Faster, but requires the cache.seen_tokens attribute
```

**Generation without caching**:

```python
# Step 1: Generate token 1
# No cache used

# Step 2: Generate token 2
# Recompute everything from scratch
# Slower (~10-20%) but no cache dependencies
```

### Future Improvements

We're monitoring:
- Transformers library updates
- Alternative caching implementations
- Model-specific optimizations

Stay updated: Check `ENHANCEMENTS.md` for the latest improvements.
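On the "alternative caching" point above, one low-risk option worth sketching (an assumption, not current `llm.py` behavior; the function name is illustrative) is to keep `use_cache` enabled and fall back only when the `seen_tokens` failure actually occurs:

```python
# Sketch: retry generation without the KV cache only when the DynamicCache
# incompatibility actually surfaces. Not current llm.py behavior.
def generate_with_cache_fallback(model, inputs, **gen_kwargs):
    try:
        # Fast path: normal cached generation.
        return model.generate(**inputs, **gen_kwargs)
    except AttributeError as err:
        if "seen_tokens" not in str(err):
            raise  # unrelated error; don't mask it
        # Slow but safe path: recompute past key/values at every step.
        return model.generate(**inputs, use_cache=False, **gen_kwargs)
```

This keeps the faster cached path in environments where the library versions already match, at the cost of one retried call where they don't.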