Troubleshooting: DynamicCache 'seen_tokens' Error
Error Message
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
What This Means
This error occurs when using local model inference (Phi-3, Llama, Mistral, etc.) with the transformers library. It's caused by a version incompatibility in the internal caching mechanism used during text generation.
Impact:
- Transcripts process but get Quality Score 0.00
- LLM analysis fails for all chunks
- No insights extracted from transcripts
- System still generates outputs but they're empty/error messages
Root Cause
The transformers library changed its internal Cache implementation between versions:
- Older versions (< 4.36): Used a simpler cache without the seen_tokens attribute
- Newer versions (>= 4.36): Introduced DynamicCache with the seen_tokens attribute
- Version mismatch: Code expects one format but the library provides another
The error specifically occurs during the model.generate() call when the library tries to manage the key-value cache for efficient generation.
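If you maintain the calling code yourself, one defensive pattern is to catch the error and retry without the cache. This is a sketch, not the exact code in llm.py; safe_generate is a hypothetical wrapper name:
# Sketch: fall back to cache-free generation when the incompatible
# cache implementation raises the 'seen_tokens' AttributeError.
def safe_generate(model, inputs, max_tokens=256):
    try:
        return model.generate(**inputs, max_new_tokens=max_tokens)
    except AttributeError as err:
        if "seen_tokens" not in str(err):
            raise  # unrelated error: surface it
        # Incompatible cache: recompute instead of caching (slower but reliable)
        return model.generate(**inputs, max_new_tokens=max_tokens, use_cache=False)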
Quick Fix (Applied)
File: llm.py (lines 460-480)
The code has been updated with:
# Fix for DynamicCache 'seen_tokens' error
outputs = query_llm_local.model.generate(
    **inputs,
    max_new_tokens=max_tokens,
    temperature=temperature,
    do_sample=temperature > 0,
    pad_token_id=query_llm_local.tokenizer.eos_token_id,
    use_cache=False  # Disable caching to avoid DynamicCache errors
)
What this does: Disables the key-value caching mechanism entirely, forcing the model to recompute attention over the full sequence at each generation step.
Trade-off: Slightly slower generation (~10-20%) but avoids the error completely.
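If you would rather keep the caching speedup on installs where it works, a version-aware toggle is an alternative. This is a sketch, assuming the packaging library is installed; the 4.36.0 threshold comes from the version notes above:
# Sketch: only disable the KV cache on transformers versions that
# predate the DynamicCache/seen_tokens implementation.
from packaging import version
import transformers

CACHE_OK = version.parse(transformers.__version__) >= version.parse("4.36.0")

outputs = query_llm_local.model.generate(
    **inputs,
    max_new_tokens=max_tokens,
    use_cache=CACHE_OK,  # keep the speedup when the version supports it
)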
Solutions (In Order of Preference)
Solution 1: Upgrade the Transformers Library (Recommended)
pip install --upgrade transformers
Expected version: 4.36.0 or higher
Verify installation:
python -c "import transformers; print(transformers.__version__)"
Expected output: 4.36.0 or higher
Why this works: Newer versions have the seen_tokens attribute properly implemented.
Solution 2: Use the HuggingFace API Instead (Easiest)
Instead of running models locally, use HuggingFace's cloud API.
Advantages:
- No local model loading (saves RAM)
- Faster processing
- No compatibility issues
- Access to larger, better models
Setup:
- Get a HuggingFace token: https://huggingface.co/settings/tokens
- Create a token with "Read" access
- Set environment variables:
export HUGGINGFACE_TOKEN='hf_your_token_here'
export USE_HF_API=True
Or in .env file:
HUGGINGFACE_TOKEN=hf_your_token_here
USE_HF_API=True
Verify:
python -c "import os; print('HF Token:', os.getenv('HUGGINGFACE_TOKEN')[:20])"
Solution 3: Use LMStudio (Best for Offline)
LMStudio provides a GUI for running local models with better compatibility.
Advantages:
- Better compatibility than raw transformers
- Easy model management with GUI
- Local/offline processing
- No API costs
Setup:
1. Download LMStudio: https://lmstudio.ai/
2. Install and open LMStudio
3. Download a model (recommended: Phi-3-mini or Mistral-7B)
4. Start the local server:
   - Open LMStudio
   - Go to the "Server" tab
   - Click "Start Server"
   - Default: http://localhost:1234
5. Set environment variables:
export USE_LMSTUDIO=True
export LMSTUDIO_URL=http://localhost:1234
Or in .env file:
USE_LMSTUDIO=True
LMSTUDIO_URL=http://localhost:1234
Verify:
curl http://localhost:1234/v1/models
Should return JSON with available models.
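LMStudio's server exposes an OpenAI-compatible API, so a quick smoke test from Python might look like this sketch (using requests; the prompt is arbitrary):
# Sketch: send one chat completion to LMStudio's local server.
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Say hello in five words."}],
        "max_tokens": 20,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])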
Solution 4: Use Diagnostic Script
Run the diagnostic script to automatically detect and fix issues:
python fix_local_model.py
This script will:
- Check your transformers version
- Test local model functionality
- Provide specific recommendations
- Guide you through setup alternatives
Example output:
Local Model DynamicCache Error Fix
[Step 1] Diagnosing current environment...
✓ Transformers version: 4.35.0
⚠ Transformers 4.35.0 is outdated
  Recommended: >= 4.36.0
[Step 2] Attempting to fix...
Upgrade transformers library? (y/n): y
✓ Transformers upgraded successfully
✓ Please restart your application
Verification Steps
After applying any fix, verify it works:
Test 1: Check Versions
python -c "import transformers, torch; print(f'Transformers: {transformers.__version__}'); print(f'PyTorch: {torch.__version__}')"
Expected:
Transformers: 4.36.0 or higher
PyTorch: 2.1.0 or higher
Test 2: Quick LLM Test
python -c "from llm import query_llm_local; print(query_llm_local('Test', max_tokens=10))"
Expected: Some text output (not an error message)
Test 3: Full Integration Test
Process a single transcript through the app and check:
- Quality Score > 0.00 ✓
- Structured data extracted ✓
- No DynamicCache errors in logs ✓ (see the log check below)
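To check the last point programmatically, assuming your app's console output is captured to a file (the app.log path here is hypothetical):
# Sketch: scan a log file for cache errors. The path is hypothetical;
# point it at wherever your app writes its console output.
with open("app.log", encoding="utf-8") as log:
    hits = [line for line in log if "DynamicCache" in line]
print(f"{len(hits)} DynamicCache error line(s) found")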
Understanding Quality Score 0.00
If you see Quality Score: 0.00 for all transcripts, it means:
Cause: LLM analysis is failing (likely due to this error)
How Quality Score is calculated (simplified from validation.py):
def validate_transcript_quality(full_text, structured_data, interviewee_type):
    issues = []
    score = 0.0
    # Text length check (0.3 points)
    if len(full_text) > 100:
        score += 0.3
    # Structured data check (0.4 points)
    if structured_data:
        score += 0.4
    # Specificity check (0.3 points): domain-specific terms in the text
    if has_specific_terms(full_text):
        score += 0.3
    return score, issues
If the LLM fails:
- full_text = "[Error] Local model failed: ..."
- structured_data = {} (empty)
- Result: Score = 0.00
Fix: Resolve the DynamicCache error → LLM works → Quality Score improves to 0.7-1.0
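To trace the error path by hand, here is a small demo against the simplified scorer above (has_specific_terms is stubbed so the sketch runs standalone):
# Demo: the error path through the simplified scorer above.
has_specific_terms = lambda text: False  # stand-in for validation.py's check

error_text = "[Error] Local model failed: 'DynamicCache' object has no attribute 'seen_tokens'"
score, issues = validate_transcript_quality(error_text, {}, "customer")
print(score)  # 0.0 -- error text is under 100 chars, structured_data is empty

healthy_text = "The customer walked through their onboarding flow in detail, naming the exact steps where drop-off happens."
score, issues = validate_transcript_quality(healthy_text, {"pain_points": ["drop-off"]}, "customer")
print(score)  # 0.7 -- length (0.3) + structured data (0.4)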
Prevention & Best Practices
1. Pin Dependency Versions
In requirements.txt:
transformers>=4.36.0,<5.0.0
torch>=2.1.0,<2.3.0
Why: Ensures compatible versions are installed together
2. Use Virtual Environments
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
pip install -r requirements.txt
Why: Isolates dependencies, prevents conflicts with other projects
3. Regular Updates
pip install --upgrade transformers torch accelerate
When:
- After any error
- Monthly maintenance
- Before deploying to production
4. Prefer Cloud APIs for Production
For production deployments:
- Use HuggingFace API for reliability
- Use LMStudio for on-premise/offline requirements
- Avoid local transformers unless you control the environment
Environment-Specific Notes
Docker / HuggingFace Spaces
# In Dockerfile or requirements
RUN pip install "transformers>=4.36.0" "torch>=2.1.0" accelerate
Windows
# Install in PowerShell with admin rights
pip install --upgrade transformers torch accelerate
Linux / WSL
pip3 install --upgrade transformers torch accelerate
macOS
pip3 install --upgrade transformers torch accelerate
Still Having Issues?
Debug Mode
Enable detailed logging:
import os
os.environ["DEBUG_MODE"] = "True"
Then check logs for detailed error messages.
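How the DEBUG_MODE flag is consumed depends on the app; a typical pattern (a sketch, not necessarily what this codebase does) is:
# Sketch: turn the DEBUG_MODE flag into verbose logging.
import logging
import os

if os.getenv("DEBUG_MODE") == "True":
    logging.basicConfig(level=logging.DEBUG)  # log every step in detail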
Check Full Error Stack
Look for the full traceback in console output:
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
Traceback (most recent call last):
File "llm.py", line 459, in query_llm_local
outputs = query_llm_local.model.generate(...)
...
Contact Support
If the issue persists:
- Run the diagnostic script: python fix_local_model.py
- Capture full logs
- Note your environment:
- OS (Windows/Linux/Mac)
- Python version
- Transformers version
- PyTorch version
- Report issue with logs
Summary Checklist
- Updated transformers: pip install --upgrade transformers
- Verified version: python -c "import transformers; print(transformers.__version__)"
- Applied code fix (use_cache=False), already done in llm.py
- Tested with sample transcript
- Quality Score > 0.00 ✓
- OR: Switched to HF API / LMStudio instead
If all checked: ✓ Problem solved!
If still failing: Use HF API or LMStudio (Solutions 2-3 above)
Related Files
- llm.py - Contains the fix (lines 460-480)
- fix_local_model.py - Diagnostic script
- requirements.txt - Dependency versions
- ENHANCEMENTS.md - Recent improvements documentation
Technical Details (For Developers)
Why use_cache=False Works
Normal generation with caching:
# Step 1: Generate token 1
cache = DynamicCache() # Create cache
cache.seen_tokens = 1 # Track position
# Step 2: Generate token 2
cache.seen_tokens = 2 # Update position
# ... uses previous key/values from cache
# Faster but requires cache.seen_tokens attribute
Generation without caching:
# Step 1: Generate token 1
# No cache used
# Step 2: Generate token 2
# Recompute everything from scratch
# Slower (~10-20%) but no cache dependencies
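To measure the trade-off on your own hardware, a rough timing comparison might look like this sketch (gpt2 is used only because it downloads quickly; any small causal LM works):
# Sketch: time generation with and without the KV cache.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # illustrative only; substitute your model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
inputs = tokenizer("The cache trade-off is", return_tensors="pt")

for use_cache in (True, False):
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=64, use_cache=use_cache)
    print(f"use_cache={use_cache}: {time.perf_counter() - start:.2f}s")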
Future Improvements
We're monitoring:
- Transformers library updates
- Alternative caching implementations
- Model-specific optimizations
Stay updated: Check ENHANCEMENTS.md for latest improvements.