KinetoLabs Claude Opus 4.5 committed on
Commit 1b7fbd7 · 1 Parent(s): b85b1e0

Add enforce_eager=True to fix KV cache memory issue

torch.compile added roughly 3-4 GB of memory overhead, which pushed
the memory left for the KV cache negative (-1.13 GiB). Setting
enforce_eager=True skips compilation, removing that overhead so the
model can start successfully.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
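
The reasoning in the commit message can be illustrated with a rough memory budget. This is a simplified sketch, not the engine's actual accounting: the function name `kv_cache_budget_gib` and every number below except `gpu_memory_utilization=0.55` are hypothetical, chosen only to show how a few GiB of compilation overhead can turn the KV cache allowance negative.

```python
def kv_cache_budget_gib(
    total_gib: float,
    gpu_memory_utilization: float,
    weights_gib: float,
    activation_peak_gib: float,
    compile_overhead_gib: float = 0.0,
) -> float:
    """Estimate memory left for the KV cache under a simplified model.

    The engine may use total * utilization; everything it needs besides
    the KV cache is subtracted from that allowance.
    """
    allowance = total_gib * gpu_memory_utilization
    return allowance - weights_gib - activation_peak_gib - compile_overhead_gib


# Hypothetical figures (assumptions, not measurements from this repo).
common = dict(
    total_gib=24.0,               # assumed GPU size
    gpu_memory_utilization=0.55,  # value from models/real.py
    weights_gib=9.0,              # assumed model weight footprint
    activation_peak_gib=1.8,      # assumed peak activation memory
)

# With compilation overhead in the 3-4 GB range described above.
with_compile = kv_cache_budget_gib(**common, compile_overhead_gib=3.5)

# With enforce_eager=True, modelled here as zero compilation overhead.
eager = kv_cache_budget_gib(**common)

print(f"compiled: {with_compile:.2f} GiB, eager: {eager:.2f} GiB")
```

Under these assumed numbers the compiled configuration leaves a negative budget while the eager one leaves a positive one, which is the failure mode and fix the commit describes.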

Files changed (1)
  1. models/real.py +1 -0
models/real.py CHANGED
@@ -84,6 +84,7 @@ class RealModelStack:
   trust_remote_code=True,
   gpu_memory_utilization=0.55, # Leave ~10GB for embedding + reranker
   max_model_len=8192, # Reduced to save KV cache memory
+  enforce_eager=True, # Skip torch.compile to reduce memory overhead
   )
 
   # Load processor for chat template formatting