# Troubleshooting: DynamicCache 'seen_tokens' Error
## Error Message
```
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
```
## What This Means
This error occurs when using local model inference (Phi-3, Llama, Mistral, etc.) with the `transformers` library. It's caused by a version incompatibility in the internal caching mechanism used during text generation.
**Impact**:
- Transcripts are processed but receive a Quality Score of 0.00
- LLM analysis fails for all chunks
- No insights are extracted from transcripts
- The system still produces output files, but they contain only empty results or error messages
---
## Root Cause
The `transformers` library changed its internal `Cache` implementation between versions:
- **Older versions (< 4.36)**: Used simpler cache without `seen_tokens` attribute
- **Newer versions (>= 4.36)**: Introduced `DynamicCache` with `seen_tokens` attribute
- **Version mismatch**: Code expects one format but library provides another
The error specifically occurs during the `model.generate()` call when the library tries to manage the key-value cache for efficient generation.
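A quick way to probe your environment (a minimal sketch; `DynamicCache` is only importable on transformers >= 4.36):
```python
# Probe: does this transformers build expose DynamicCache, and does
# the cache object carry the seen_tokens attribute that generation expects?
import transformers

print("transformers:", transformers.__version__)
try:
    from transformers import DynamicCache
    print("seen_tokens present:", hasattr(DynamicCache(), "seen_tokens"))
except ImportError:
    print("DynamicCache not available in this transformers version")
```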
---
## Quick Fix (Applied)
**File**: `llm.py` (lines 460-480)
The code has been updated with:
```python
# Fix for DynamicCache 'seen_tokens' error
outputs = query_llm_local.model.generate(
    **inputs,
    max_new_tokens=max_tokens,
    temperature=temperature,
    do_sample=temperature > 0,
    pad_token_id=query_llm_local.tokenizer.eos_token_id,
    use_cache=False,  # ← Disable caching to avoid DynamicCache errors
)
```
**What this does**: Disables the key-value caching mechanism entirely, forcing the model to recompute attention over the full sequence at each generation step.
**Trade-off**: Slightly slower generation (~10-20%) but avoids the error completely.
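If you want to keep caching when it works, a defensive variant is possible (a sketch, not the code shipped in `llm.py`) that retries without the cache only when this specific error appears:
```python
# Hypothetical wrapper: try cached generation first, fall back to
# use_cache=False only when the DynamicCache attribute error appears.
def safe_generate(model, tokenizer, inputs, max_tokens, temperature):
    kwargs = dict(
        max_new_tokens=max_tokens,
        temperature=temperature,
        do_sample=temperature > 0,
        pad_token_id=tokenizer.eos_token_id,
    )
    try:
        return model.generate(**inputs, **kwargs)
    except AttributeError as exc:
        if "seen_tokens" not in str(exc):
            raise  # unrelated error: surface it
        # Retry without KV caching (~10-20% slower, but compatible)
        return model.generate(**inputs, **kwargs, use_cache=False)
```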
---
## Solutions (In Order of Preference)
### Solution 1: Upgrade Transformers Library ✅ **RECOMMENDED**
```bash
pip install --upgrade transformers
```
**Expected version**: 4.36.0 or higher
**Verify installation**:
```bash
python -c "import transformers; print(transformers.__version__)"
```
**Expected output**: `4.36.0` or higher
**Why this works**: Newer versions have the `seen_tokens` attribute properly implemented.
---
### Solution 2: Use HuggingFace API Instead 🚀 **EASIEST**
Instead of running models locally, use HuggingFace's cloud API.
**Advantages**:
- No local model loading (saves RAM)
- Faster processing
- No compatibility issues
- Access to larger, better models
**Setup**:
1. Get a HuggingFace token: https://huggingface.co/settings/tokens
2. Create token with "Read" access
3. Set environment variables:
```bash
export HUGGINGFACE_TOKEN='hf_your_token_here'
export USE_HF_API=True
```
Or in `.env` file:
```
HUGGINGFACE_TOKEN=hf_your_token_here
USE_HF_API=True
```
**Verify**:
```bash
python -c "import os; print('HF Token:', os.getenv('HUGGINGFACE_TOKEN')[:20])"
```
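For reference, a minimal sketch of what a direct call to the hosted Inference API looks like (the model name is illustrative; with `USE_HF_API=True` the app makes this call for you):
```python
# Minimal sketch of a HuggingFace Inference API request.
# Assumes HUGGINGFACE_TOKEN is set; the model name is illustrative.
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-4k-instruct"
headers = {"Authorization": f"Bearer {os.environ['HUGGINGFACE_TOKEN']}"}

response = requests.post(API_URL, headers=headers, json={"inputs": "Test prompt"})
response.raise_for_status()
print(response.json())
```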
---
### Solution 3: Use LMStudio 🖥️ **BEST FOR OFFLINE**
LMStudio provides a GUI for running local models with better compatibility.
**Advantages**:
- Better compatibility than raw transformers
- Easy model management with GUI
- Local/offline processing
- No API costs
**Setup**:
1. Download LMStudio: https://lmstudio.ai/
2. Install and open LMStudio
3. Download a model (recommended: Phi-3-mini or Mistral-7B)
4. Start the local server:
- Open LMStudio
- Go to "Server" tab
- Click "Start Server"
- Default: http://localhost:1234
5. Set environment variables:
```bash
export USE_LMSTUDIO=True
export LMSTUDIO_URL=http://localhost:1234
```
Or in `.env` file:
```
USE_LMSTUDIO=True
LMSTUDIO_URL=http://localhost:1234
```
**Verify**:
```bash
curl http://localhost:1234/v1/models
```
Should return JSON with available models.
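You can also exercise the server from Python. LMStudio exposes an OpenAI-compatible endpoint, so a request like this sketch should work (the `model` value is a placeholder; LMStudio serves whichever model you loaded):
```python
# Sketch: query LMStudio's OpenAI-compatible chat endpoint directly.
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "local-model",  # placeholder; LMStudio uses the loaded model
        "messages": [{"role": "user", "content": "Test"}],
        "max_tokens": 10,
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```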
---
### Solution 4: Use Diagnostic Script
Run the diagnostic script to automatically detect and fix issues:
```bash
python fix_local_model.py
```
This script will:
1. Check your transformers version
2. Test local model functionality
3. Provide specific recommendations
4. Guide you through setup alternatives
**Example output**:
```
==================================================================
Local Model DynamicCache Error Fix
==================================================================
[Step 1] Diagnosing current environment...
✓ Transformers version: 4.35.0
⚠️ Transformers 4.35.0 is outdated
   Recommended: >= 4.36.0
[Step 2] Attempting to fix...
Upgrade transformers library? (y/n): y
✓ Transformers upgraded successfully
✓ Please restart your application
```
---
## Verification Steps
After applying any fix, verify it works:
### Test 1: Check Versions
```bash
python -c "import transformers, torch; print(f'Transformers: {transformers.__version__}'); print(f'PyTorch: {torch.__version__}')"
```
**Expected**:
```
Transformers: 4.36.0 or higher
PyTorch: 2.1.0 or higher
```
### Test 2: Quick LLM Test
```bash
python -c "from llm import query_llm_local; print(query_llm_local('Test', max_tokens=10))"
```
**Expected**: Some text output (not an error message)
### Test 3: Full Integration Test
Process a single transcript through the app and check:
- Quality Score > 0.00 ✓
- Structured data extracted ✓
- No DynamicCache errors in logs ✓
---
## Understanding Quality Score 0.00
If you see `Quality Score: 0.00` for all transcripts, it means:
**Cause**: LLM analysis is failing (likely due to this error)
**How Quality Score is calculated** (validation.py):
```python
def validate_transcript_quality(full_text, structured_data, interviewee_type):
    score, issues = 0.0, []
    # Text length check (0.3 points)
    if len(full_text) > 100:
        score += 0.3
    # Structured data check (0.4 points): non-empty dict
    if structured_data:
        score += 0.4
    # Specificity check (0.3 points): domain-specific terms in the text
    if has_specific_terms(full_text, interviewee_type):
        score += 0.3
    return score, issues
```
**If LLM fails**:
- `full_text` = "[Error] Local model failed: ..."
- `structured_data` = {} (empty)
- **Result**: Score = 0.00
**If LLM succeeds**: long text (0.3) + populated structured data (0.4) + specific terms (0.3) = 1.0
**Fix**: Resolve the DynamicCache error → LLM works → Quality Score improves to 0.7-1.0
---
## Prevention & Best Practices
### 1. Pin Dependency Versions
In `requirements.txt`:
```
transformers>=4.36.0,<5.0.0
torch>=2.1.0,<2.3.0
```
**Why**: Ensures compatible versions are installed together
### 2. Use Virtual Environments
```bash
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
pip install -r requirements.txt
```
**Why**: Isolates dependencies, prevents conflicts with other projects
### 3. Regular Updates
```bash
pip install --upgrade transformers torch accelerate
```
**When**:
- After any error
- Monthly maintenance
- Before deploying to production
### 4. Prefer Cloud APIs for Production
For production deployments:
- **Use HuggingFace API** for reliability
- **Use LMStudio** for on-premise/offline requirements
- **Avoid local transformers** unless you control the environment
---
## Environment-Specific Notes
### Docker / HuggingFace Spaces
```dockerfile
# In Dockerfile or requirements
RUN pip install "transformers>=4.36.0" "torch>=2.1.0" accelerate
```
### Windows
```powershell
# Install in PowerShell with admin rights
pip install --upgrade transformers torch accelerate
```
### Linux / WSL
```bash
pip3 install --upgrade transformers torch accelerate
```
### macOS
```bash
pip3 install --upgrade transformers torch accelerate
```
---
## Still Having Issues?
### Debug Mode
Enable detailed logging:
```python
import os
os.environ["DEBUG_MODE"] = "True"
```
Then check logs for detailed error messages.
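Independently of the app's flag, you can raise the transformers library's own log level to see what happens inside `generate()`:
```python
# Enable transformers' built-in debug logging (standard library utility,
# not app-specific); useful for watching cache handling during generation.
from transformers.utils import logging

logging.set_verbosity_debug()
```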
### Check Full Error Stack
Look for the full traceback in console output:
```
ERROR: Local model error: 'DynamicCache' object has no attribute 'seen_tokens'
Traceback (most recent call last):
File "llm.py", line 459, in query_llm_local
outputs = query_llm_local.model.generate(...)
...
```
### Contact Support
If the issue persists:
1. Run diagnostic script: `python fix_local_model.py`
2. Capture full logs
3. Note your environment:
- OS (Windows/Linux/Mac)
- Python version
- Transformers version
- PyTorch version
4. Report issue with logs
---
## Summary Checklist
- [ ] Updated transformers: `pip install --upgrade transformers`
- [ ] Verified version: `python -c "import transformers; print(transformers.__version__)"`
- [ ] Applied code fix (use_cache=False) - already done in llm.py
- [ ] Tested with sample transcript
- [ ] Quality Score > 0.00 ✓
- [ ] OR: Switched to HF API / LMStudio instead
**If all checked**: ✓ Problem solved!
**If still failing**: Use HF API or LMStudio (Solutions 2-3 above)
---
## Related Files
- `llm.py` - Contains the fix (lines 460-480)
- `fix_local_model.py` - Diagnostic script
- `requirements.txt` - Dependency versions
- `ENHANCEMENTS.md` - Recent improvements documentation
---
## Technical Details (For Developers)
### Why `use_cache=False` Works
**Normal generation with caching**:
```python
# Conceptual sketch (not literal library code): generation WITH the cache
cache = DynamicCache()  # created internally on the first forward pass
cache.seen_tokens = 1   # Step 1: prompt keys/values cached, position tracked
cache.seen_tokens = 2   # Step 2: only the new token is computed;
                        # cached keys/values are reused
# Faster, but generate() requires the cache to expose seen_tokens
```
**Generation without caching**:
```python
# Conceptual sketch: generation WITHOUT the cache (use_cache=False)
# Step 1: generate token 1; attention computed over the full sequence
# Step 2: generate token 2; everything recomputed from scratch
# ~10-20% slower, but no dependency on the cache's internal API
```
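To measure the trade-off on your own hardware, a rough timing sketch (the model name is illustrative; any small causal LM will do):
```python
# Rough benchmark: compare generation time with and without the KV cache.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "microsoft/Phi-3-mini-4k-instruct"  # illustrative; use your local model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
inputs = tok("Test prompt", return_tensors="pt")

for use_cache in (True, False):
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=50, use_cache=use_cache,
                   pad_token_id=tok.eos_token_id)
    print(f"use_cache={use_cache}: {time.perf_counter() - start:.2f}s")
```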
### Future Improvements
We're monitoring:
- Transformers library updates
- Alternative caching implementations
- Model-specific optimizations
Stay updated: Check `ENHANCEMENTS.md` for latest improvements.