
Troubleshooting: DynamicCache 'seen_tokens' Error

Error Message

ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'

What This Means

This error occurs when using local model inference (Phi-3, Llama, Mistral, etc.) with the transformers library. It's caused by a version incompatibility in the internal caching mechanism used during text generation.

Impact:

  • Transcripts are processed but receive a Quality Score of 0.00
  • LLM analysis fails for every chunk
  • No insights are extracted from the transcripts
  • The system still generates outputs, but they contain only empty or error content

Root Cause

The transformers library changed its internal Cache implementation between versions:

  • Older versions (< 4.36): Used simpler cache without seen_tokens attribute
  • Newer versions (>= 4.36): Introduced DynamicCache with seen_tokens attribute
  • Version mismatch: Code expects one format but library provides another

The error specifically occurs during the model.generate() call when the library tries to manage the key-value cache for efficient generation.
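
If you want to confirm which cache implementation your installed transformers build exposes, the short probe below (an illustrative check, not part of the project code) prints the version and whether DynamicCache carries the seen_tokens attribute:

# Probe the installed transformers build (illustrative only)
import transformers

print("Transformers:", transformers.__version__)
try:
    from transformers.cache_utils import DynamicCache  # present from 4.36 onward
    print("Has 'seen_tokens':", hasattr(DynamicCache(), "seen_tokens"))
except ImportError:
    print("DynamicCache not available in this version")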


Quick Fix (Applied)

File: llm.py (lines 460-480)

The code has been updated with:

# Fix for DynamicCache 'seen_tokens' error
outputs = query_llm_local.model.generate(
    **inputs,
    max_new_tokens=max_tokens,
    temperature=temperature,
    do_sample=temperature > 0,
    pad_token_id=query_llm_local.tokenizer.eos_token_id,
    use_cache=False  # ← Disable caching to avoid DynamicCache errors
)

What this does: Disables the key-value caching mechanism entirely, forcing the model to recompute attention over the full sequence at each decoding step.

Trade-off: Slightly slower generation (~10-20%) but avoids the error completely.
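
If you would rather keep caching when it works and only disable it when the error appears, a fallback wrapper is another option. This is a sketch, not the code shipped in llm.py; it assumes the same model and tokenizer objects used there:

# Sketch: try cached generation first, retry without the KV cache only when
# the DynamicCache 'seen_tokens' AttributeError occurs.
def generate_with_fallback(model, tokenizer, inputs, max_tokens, temperature):
    kwargs = dict(
        max_new_tokens=max_tokens,
        temperature=temperature,
        do_sample=temperature > 0,
        pad_token_id=tokenizer.eos_token_id,
    )
    try:
        return model.generate(**inputs, **kwargs, use_cache=True)
    except AttributeError as e:
        if "seen_tokens" not in str(e):
            raise  # unrelated failure, surface it
        # Cache incompatibility: fall back to recomputing at every step
        return model.generate(**inputs, **kwargs, use_cache=False)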


Solutions (In Order of Preference)

Solution 1: Upgrade Transformers Library ✅ RECOMMENDED

pip install --upgrade transformers

Expected version: 4.36.0 or higher

Verify installation:

python -c "import transformers; print(transformers.__version__)"

Expected output: 4.36.0 or higher

Why this works: Newer versions have the seen_tokens attribute properly implemented.
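
For a programmatic guard (an illustrative snippet; it assumes the packaging library, which is installed alongside pip and transformers):

# Warn at startup if the installed transformers is older than 4.36.0
from packaging import version
import transformers

MIN_VERSION = "4.36.0"
if version.parse(transformers.__version__) < version.parse(MIN_VERSION):
    print(f"transformers {transformers.__version__} is older than {MIN_VERSION}; "
          "run: pip install --upgrade transformers")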


Solution 2: Use HuggingFace API Instead 🚀 EASIEST

Instead of running models locally, use HuggingFace's cloud API.

Advantages:

  • No local model loading (saves RAM)
  • Faster processing
  • No compatibility issues
  • Access to larger, better models

Setup:

  1. Get a HuggingFace token: https://huggingface.co/settings/tokens
  2. Create token with "Read" access
  3. Set environment variables:
export HUGGINGFACE_TOKEN='hf_your_token_here'
export USE_HF_API=True

Or in .env file:

HUGGINGFACE_TOKEN=hf_your_token_here
USE_HF_API=True

Verify:

python -c "import os; print('HF Token:', os.getenv('HUGGINGFACE_TOKEN')[:20])"

Solution 3: Use LMStudio 🖥️ BEST FOR OFFLINE

LMStudio provides a GUI for running local models with better compatibility.

Advantages:

  • Better compatibility than raw transformers
  • Easy model management with GUI
  • Local/offline processing
  • No API costs

Setup:

  1. Download LMStudio: https://lmstudio.ai/

  2. Install and open LMStudio

  3. Download a model (recommended: Phi-3-mini or Mistral-7B)

  4. Start LMStudio's local server (it listens on http://localhost:1234 by default)

  5. Set environment variables:

export USE_LMSTUDIO=True
export LMSTUDIO_URL=http://localhost:1234

Or in .env file:

USE_LMSTUDIO=True
LMSTUDIO_URL=http://localhost:1234

Verify:

curl http://localhost:1234/v1/models

Should return JSON with available models.
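
LMStudio's server speaks the OpenAI-compatible API, so a quick smoke test from Python could look like this sketch (it assumes the requests library and the default port; some LMStudio versions also want a "model" field matching one of the ids returned by /v1/models):

# Send one chat completion to LMStudio's OpenAI-compatible local server
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 50,
        "temperature": 0.2,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])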


Solution 4: Use Diagnostic Script

Run the diagnostic script to automatically detect and fix issues:

python fix_local_model.py

This script will:

  1. Check your transformers version
  2. Test local model functionality
  3. Provide specific recommendations
  4. Guide you through setup alternatives

Example output:

Local Model DynamicCache Error Fix

[Step 1] Diagnosing current environment...
✓ Transformers version: 4.35.0
⚠️ Transformers 4.35.0 is outdated
Recommended: >= 4.36.0

[Step 2] Attempting to fix...
Upgrade transformers library? (y/n): y
✓ Transformers upgraded successfully
✓ Please restart your application


Verification Steps

After applying any fix, verify it works:

Test 1: Check Versions

python -c "import transformers, torch; print(f'Transformers: {transformers.__version__}'); print(f'PyTorch: {torch.__version__}')"

Expected:

Transformers: 4.36.0 or higher
PyTorch: 2.1.0 or higher

Test 2: Quick LLM Test

python -c "from llm import query_llm_local; print(query_llm_local('Test', max_tokens=10))"

Expected: Some text output (not an error message)

Test 3: Full Integration Test

Process a single transcript through the app and check:

  • Quality Score > 0.00 ✓
  • Structured data extracted ✓
  • No DynamicCache errors in logs ✓

Understanding Quality Score 0.00

If you see Quality Score: 0.00 for all transcripts, it means:

Cause: LLM analysis is failing (likely due to this error)

How Quality Score is calculated (simplified from validation.py):

def validate_transcript_quality(full_text, structured_data, interviewee_type):
    score = 0.0
    issues = []

    # Text length check (0.3 points)
    if len(full_text) > 100:
        score += 0.3

    # Structured data check (0.4 points)
    has_structured_data = bool(structured_data)  # simplified stand-in for the real check
    if has_structured_data:
        score += 0.4

    # Specificity check (0.3 points)
    has_specific_terms = any(bool(v) for v in structured_data.values())  # simplified stand-in
    if has_specific_terms:
        score += 0.3

    return score, issues

If LLM fails:

  • full_text = "[Error] Local model failed: ..."
  • structured_data = {} (empty)
  • Result: Score = 0.00

Fix: Resolve the DynamicCache error → LLM works → Quality Score improves to 0.7-1.0
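
To make the failure mode concrete, here is an illustrative call to the simplified validator above, using the kind of output a failed LLM run produces (values are examples, not captured from the real pipeline):

# A failed LLM run: short error text, no structured data -> score collapses
score, issues = validate_transcript_quality(
    full_text="[Error] Local model failed: 'DynamicCache' object has no attribute 'seen_tokens'",
    structured_data={},
    interviewee_type="customer",  # example value
)
print(score)  # 0.0 with the simplified scoring above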


Prevention & Best Practices

1. Pin Dependency Versions

In requirements.txt:

transformers>=4.36.0,<5.0.0
torch>=2.1.0,<2.3.0

Why: Ensures compatible versions are installed together

2. Use Virtual Environments

python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows
pip install -r requirements.txt

Why: Isolates dependencies, prevents conflicts with other projects

3. Regular Updates

pip install --upgrade transformers torch accelerate

When:

  • After any error
  • Monthly maintenance
  • Before deploying to production

4. Prefer Cloud APIs for Production

For production deployments:

  • Use HuggingFace API for reliability
  • Use LMStudio for on-premise/offline requirements
  • Avoid local transformers unless you control the environment

Environment-Specific Notes

Docker / HuggingFace Spaces

# In Dockerfile or requirements
RUN pip install "transformers>=4.36.0" "torch>=2.1.0" accelerate

Windows

# Install in PowerShell with admin rights
pip install --upgrade transformers torch accelerate

Linux / WSL

pip3 install --upgrade transformers torch accelerate

macOS

pip3 install --upgrade transformers torch accelerate

Still Having Issues?

Debug Mode

Enable detailed logging:

import os
os.environ["DEBUG_MODE"] = "True"

Then check logs for detailed error messages.
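
If the app's modules use Python's standard logging (an assumption; adjust to the project's actual logging setup), you can also raise the log level directly:

# Illustrative: surface DEBUG-level messages from all loggers
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)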

Check Full Error Stack

Look for the full traceback in console output:

ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
Traceback (most recent call last):
  File "llm.py", line 459, in query_llm_local
    outputs = query_llm_local.model.generate(...)
  ...
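
If the console only shows the one-line error, a hedged way to capture a full traceback is to call the local query directly and print whatever is raised (note: if query_llm_local catches errors internally and returns an error string, nothing will raise and you'll just see that string):

# Illustrative: capture the complete traceback from a direct call
import traceback
from llm import query_llm_local

try:
    print(query_llm_local("Test", max_tokens=10))
except Exception:
    print(traceback.format_exc())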

Contact Support

If the issue persists:

  1. Run diagnostic script: python fix_local_model.py
  2. Capture full logs
  3. Note your environment:
    • OS (Windows/Linux/Mac)
    • Python version
    • Transformers version
    • PyTorch version
  4. Report issue with logs

Summary Checklist

  • Updated transformers: pip install --upgrade transformers
  • Verified version: python -c "import transformers; print(transformers.__version__)"
  • Applied code fix (use_cache=False) - already done in llm.py
  • Tested with sample transcript
  • Quality Score > 0.00 ✓
  • OR: Switched to HF API / LMStudio instead

If all checked: ✓ Problem solved!

If still failing: Use HF API or LMStudio (Solutions 2-3 above)


Related Files

  • llm.py - Contains the fix (lines 460-480)
  • fix_local_model.py - Diagnostic script
  • requirements.txt - Dependency versions
  • ENHANCEMENTS.md - Recent improvements documentation

Technical Details (For Developers)

Why use_cache=False Works

Normal generation with caching:

# Step 1: Generate token 1
cache = DynamicCache()  # Create cache
cache.seen_tokens = 1   # Track position

# Step 2: Generate token 2
cache.seen_tokens = 2   # Update position
# ... uses previous key/values from cache

# Faster but requires cache.seen_tokens attribute

Generation without caching:

# Step 1: Generate token 1
# No cache used

# Step 2: Generate token 2
# Recompute everything from scratch

# Slower (~10-20%) but no cache dependencies
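
To gauge the actual slowdown on your hardware, a rough timing sketch (assuming a model and tokenizer are already loaded, for example via the project's local loader) can compare the two modes:

# Rough comparison of cached vs. uncached generation time
import time

def time_generate(model, tokenizer, prompt, use_cache):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=64, do_sample=False,
                   pad_token_id=tokenizer.eos_token_id, use_cache=use_cache)
    return time.perf_counter() - start

# cached = time_generate(model, tokenizer, "Hello", use_cache=True)
# uncached = time_generate(model, tokenizer, "Hello", use_cache=False)
# print(f"cached: {cached:.2f}s  uncached: {uncached:.2f}s")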

Future Improvements

We're monitoring:

  • Transformers library updates
  • Alternative caching implementations
  • Model-specific optimizations

Stay updated: Check ENHANCEMENTS.md for latest improvements.