# Troubleshooting: DynamicCache 'seen_tokens' Error
## Error Message
```
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
```
## What This Means
This error occurs when using local model inference (Phi-3, Llama, Mistral, etc.) with the `transformers` library. It is caused by a version incompatibility in the internal caching mechanism used during text generation.
**Impact**:
- Transcripts process but get Quality Score 0.00
- LLM analysis fails for all chunks
- No insights extracted from transcripts
- The system still generates outputs, but they contain only empty fields or error messages
---
## Root Cause
The `transformers` library changed its internal `Cache` implementation between versions:
- **Older versions (< 4.36)**: used a simpler cache without a `seen_tokens` attribute
- **Newer versions (>= 4.36)**: introduced `DynamicCache` with a `seen_tokens` attribute
- **Version mismatch**: the generation code expects one cache format, but the installed library provides another
The error occurs during the `model.generate()` call, when the library tries to manage the key-value cache used for efficient generation.
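To confirm which cache implementation your environment actually provides, a quick check like the one below can help. This is a minimal diagnostic sketch, not part of the project code:
```python
# Minimal diagnostic sketch (not part of the project code):
# report the installed transformers version and whether its
# DynamicCache exposes the 'seen_tokens' attribute.
import transformers

print("transformers version:", transformers.__version__)
try:
    from transformers import DynamicCache  # exported in 4.36+
    print("DynamicCache has 'seen_tokens':", hasattr(DynamicCache(), "seen_tokens"))
except ImportError:
    print("DynamicCache is not available -- this is an older transformers release")
```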
---
## Quick Fix (Applied)
**File**: `llm.py` (lines 460-480)
The code has been updated with:
```python
# Fix for DynamicCache 'seen_tokens' error
outputs = query_llm_local.model.generate(
    **inputs,
    max_new_tokens=max_tokens,
    temperature=temperature,
    do_sample=temperature > 0,
    pad_token_id=query_llm_local.tokenizer.eos_token_id,
    use_cache=False,  # disable KV caching to avoid DynamicCache errors
)
```
**What this does**: Disables the key-value caching mechanism entirely, forcing the model to recompute attention over the full sequence at each generation step.
**Trade-off**: Slightly slower generation (~10-20%) but avoids the error completely.
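If the slowdown matters for your workload, a version-gated variant is also possible. The sketch below is illustrative only (the shipped code disables the cache unconditionally); it assumes the same `inputs`, `max_tokens`, and `temperature` variables as the snippet above and requires the `packaging` package:
```python
# Illustrative alternative (not the shipped fix): only disable the KV cache
# on transformers versions affected by the seen_tokens incompatibility.
# Assumes the same model, tokenizer, and inputs as the snippet above.
from packaging import version
import transformers

cache_ok = version.parse(transformers.__version__) >= version.parse("4.36.0")

outputs = query_llm_local.model.generate(
    **inputs,
    max_new_tokens=max_tokens,
    temperature=temperature,
    do_sample=temperature > 0,
    pad_token_id=query_llm_local.tokenizer.eos_token_id,
    use_cache=cache_ok,  # keep caching when the installed library supports it
)
```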
---
## Solutions (In Order of Preference)
### Solution 1: Upgrade the Transformers Library (**RECOMMENDED**)
```bash
pip install --upgrade transformers
```
**Expected version**: 4.36.0 or higher
**Verify installation**:
```bash
python -c "import transformers; print(transformers.__version__)"
```
**Expected output**: `4.36.0` or higher
**Why this works**: Newer versions implement the `seen_tokens` attribute that the generation code expects.
---
### Solution 2: Use the HuggingFace API Instead (**EASIEST**)
Instead of running models locally, use HuggingFace's cloud API.
**Advantages**:
- No local model loading (saves RAM)
- Faster processing
- No compatibility issues
- Access to larger, better models
**Setup**:
1. Get a HuggingFace token: https://huggingface.co/settings/tokens
2. Create the token with "Read" access
3. Set environment variables:
```bash
export HUGGINGFACE_TOKEN='hf_your_token_here'
export USE_HF_API=True
```
Or in a `.env` file:
```
HUGGINGFACE_TOKEN=hf_your_token_here
USE_HF_API=True
```
**Verify**:
```bash
python -c "import os; print('HF Token:', (os.getenv('HUGGINGFACE_TOKEN') or 'NOT SET')[:20])"
```
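For reference, the cloud route looks roughly like this in code. This is a minimal sketch using the `huggingface_hub` client; the model ID and prompt are placeholders, and the actual integration lives in `llm.py`:
```python
# Minimal sketch of a HuggingFace Inference API call via huggingface_hub.
# The model ID and prompt are placeholders; the real integration is in llm.py.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(token=os.getenv("HUGGINGFACE_TOKEN"))
reply = client.text_generation(
    "Summarize the key pain points from this interview chunk: ...",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    max_new_tokens=100,
)
print(reply)
```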
---
### Solution 3: Use LMStudio (**BEST FOR OFFLINE**)
LMStudio provides a GUI for running local models with better compatibility.
**Advantages**:
- Better compatibility than raw transformers
- Easy model management with a GUI
- Local/offline processing
- No API costs
**Setup**:
1. Download LMStudio: https://lmstudio.ai/
2. Install and open LMStudio
3. Download a model (recommended: Phi-3-mini or Mistral-7B)
4. Start the local server:
   - Open LMStudio
   - Go to the "Server" tab
   - Click "Start Server"
   - Default: http://localhost:1234
5. Set environment variables:
```bash
export USE_LMSTUDIO=True
export LMSTUDIO_URL=http://localhost:1234
```
Or in a `.env` file:
```
USE_LMSTUDIO=True
LMSTUDIO_URL=http://localhost:1234
```
**Verify**:
```bash
curl http://localhost:1234/v1/models
```
This should return JSON listing the available models.
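Because LMStudio's server speaks the OpenAI-compatible API, a request from Python looks roughly like the sketch below (the `model` value depends on which model you loaded in LMStudio):
```python
# Sketch of a chat completion request against LMStudio's local,
# OpenAI-compatible server. The model name is whatever you loaded in the GUI.
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "phi-3-mini-4k-instruct",  # placeholder model name
        "messages": [{"role": "user", "content": "Reply with one short sentence."}],
        "max_tokens": 50,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```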
---
### Solution 4: Use the Diagnostic Script
Run the diagnostic script to automatically detect and fix issues:
```bash
python fix_local_model.py
```
This script will:
1. Check your transformers version
2. Test local model functionality
3. Provide specific recommendations
4. Guide you through setup alternatives
**Example output**:
```
==================================================================
Local Model DynamicCache Error Fix
==================================================================
[Step 1] Diagnosing current environment...
✓ Transformers version: 4.35.0
⚠ Transformers 4.35.0 is outdated
  Recommended: >= 4.36.0
[Step 2] Attempting to fix...
Upgrade transformers library? (y/n): y
✓ Transformers upgraded successfully
→ Please restart your application
```
---
## Verification Steps
After applying any fix, verify that it works:
### Test 1: Check Versions
```bash
python -c "import transformers, torch; print(f'Transformers: {transformers.__version__}'); print(f'PyTorch: {torch.__version__}')"
```
**Expected**:
```
Transformers: 4.36.0 or higher
PyTorch: 2.1.0 or higher
```
### Test 2: Quick LLM Test
```bash
python -c "from llm import query_llm_local; print(query_llm_local('Test', max_tokens=10))"
```
**Expected**: Some generated text (not an error message)
### Test 3: Full Integration Test
Process a single transcript through the app and check that:
- Quality Score > 0.00 ✅
- Structured data is extracted ✅
- No DynamicCache errors appear in the logs ✅
---
## Understanding Quality Score 0.00
If you see `Quality Score: 0.00` for all transcripts, it means:
**Cause**: LLM analysis is failing (most likely due to this error).
**How the Quality Score is calculated** (simplified from `validation.py`):
```python
def validate_transcript_quality(full_text, structured_data, interviewee_type):
    score = 0.0
    # Text length check (0.3 points)
    if len(full_text) > 100:
        score += 0.3
    # Structured data check (0.4 points)
    if has_structured_data:
        score += 0.4
    # Specificity check (0.3 points)
    if has_specific_terms:
        score += 0.3
    return score, issues
```
**If the LLM fails**:
- `full_text` = "[Error] Local model failed: ..."
- `structured_data` = `{}` (empty)
- **Result**: Score = 0.00
**Fix**: Resolve the DynamicCache error → the LLM works → the Quality Score improves to 0.7-1.0
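As a rough illustration of the two cases (hypothetical inputs and field names; the real call sites are in the processing pipeline):
```python
# Hypothetical usage of validate_transcript_quality showing the two cases.
from validation import validate_transcript_quality

# LLM failed: short error text, no structured data -> score 0.0
score, issues = validate_transcript_quality(
    "[Error] Local model failed: ...", {}, "customer"
)
print(score)  # 0.0

# LLM worked: real transcript text plus extracted fields
transcript = "The customer described slow onboarding and unclear pricing. " * 5
score, issues = validate_transcript_quality(
    transcript, {"pain_points": ["slow onboarding", "unclear pricing"]}, "customer"
)
print(score)  # typically 0.7-1.0, depending on the specificity check
```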
---
## Prevention & Best Practices
### 1. Pin Dependency Versions
In `requirements.txt`:
```
transformers>=4.36.0,<5.0.0
torch>=2.1.0,<2.3.0
```
**Why**: Ensures compatible versions are installed together
### 2. Use Virtual Environments
```bash
python -m venv venv
source venv/bin/activate   # Linux/Mac
# or
venv\Scripts\activate      # Windows
pip install -r requirements.txt
```
**Why**: Isolates dependencies, prevents conflicts with other projects
### 3. Regular Updates
```bash
pip install --upgrade transformers torch accelerate
```
**When**:
- After any error
- Monthly maintenance
- Before deploying to production
### 4. Prefer Cloud APIs for Production
For production deployments:
- **Use the HuggingFace API** for reliability
- **Use LMStudio** for on-premise/offline requirements
- **Avoid local transformers** unless you control the environment
---
## Environment-Specific Notes
### Docker / HuggingFace Spaces
```dockerfile
# In the Dockerfile or requirements
RUN pip install "transformers>=4.36.0" "torch>=2.1.0" accelerate
```
### Windows
```powershell
# Install from PowerShell (run as administrator if needed)
pip install --upgrade transformers torch accelerate
```
### Linux / WSL
```bash
pip3 install --upgrade transformers torch accelerate
```
### macOS
```bash
pip3 install --upgrade transformers torch accelerate
```
---
## Still Having Issues?
### Debug Mode
Enable detailed logging:
```python
import os
os.environ["DEBUG_MODE"] = "True"
```
Then check the logs for detailed error messages.
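It can also help to raise the `transformers` library's own log level while reproducing the error (this uses the standard transformers logging helpers; `DEBUG_MODE` above is specific to this app):
```python
# Raise the transformers library's log verbosity while reproducing the error.
import transformers

transformers.logging.set_verbosity_debug()  # or set_verbosity_info() for less noise
```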
### Check the Full Error Stack
Look for the full traceback in the console output:
```
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
Traceback (most recent call last):
  File "llm.py", line 459, in query_llm_local
    outputs = query_llm_local.model.generate(...)
  ...
```
### Contact Support
If the issue persists:
1. Run the diagnostic script: `python fix_local_model.py`
2. Capture the full logs
3. Note your environment:
   - OS (Windows/Linux/Mac)
   - Python version
   - Transformers version
   - PyTorch version
4. Report the issue with the logs attached
---
## Summary Checklist
- [ ] Updated transformers: `pip install --upgrade transformers`
- [ ] Verified version: `python -c "import transformers; print(transformers.__version__)"`
- [ ] Applied the code fix (`use_cache=False`) - already done in `llm.py`
- [ ] Tested with a sample transcript
- [ ] Quality Score > 0.00 ✅
- [ ] OR: switched to the HF API / LMStudio instead
**If all checked**: ✅ Problem solved!
**If still failing**: Use the HF API or LMStudio (Solutions 2-3 above)
---
## Related Files
- `llm.py` - Contains the fix (lines 460-480)
- `fix_local_model.py` - Diagnostic script
- `requirements.txt` - Dependency versions
- `ENHANCEMENTS.md` - Recent improvements documentation
---
## Technical Details (For Developers)
### Why `use_cache=False` Works
**Normal generation with caching** (illustrative pseudocode):
```python
# Step 1: generate token 1
cache = DynamicCache()   # the library creates a KV cache
cache.seen_tokens = 1    # and tracks how many tokens it has seen
# Step 2: generate token 2
cache.seen_tokens = 2    # position is updated
# ...previous keys/values are reused from the cache
# Faster, but the generation loop expects the cache to expose `seen_tokens`
```
**Generation without caching**:
```python
# Step 1: generate token 1
# No cache is used
# Step 2: generate token 2
# Keys/values are recomputed from scratch for the whole sequence
# Slower (~10-20%) but no dependency on the cache API
```
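To measure the trade-off on your own hardware, a rough comparison like the sketch below works; the model ID is illustrative (any small causal LM will do) and timings will vary:
```python
# Rough timing sketch: compare generation with and without the KV cache.
# "distilgpt2" is just a small illustrative model.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
inputs = tokenizer("The quick brown fox", return_tensors="pt")

for use_cache in (True, False):
    start = time.time()
    model.generate(
        **inputs,
        max_new_tokens=50,
        use_cache=use_cache,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(f"use_cache={use_cache}: {time.time() - start:.2f}s")
```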
### Future Improvements
We're monitoring:
- Transformers library updates
- Alternative caching implementations
- Model-specific optimizations
Stay updated: check `ENHANCEMENTS.md` for the latest improvements.