Space: KinetoLabs/SmokeScan
KinetoLabs Claude Opus 4.5 committed 3 days ago

Commit 1b7fbd7 (1 parent: b85b1e0)

Add enforce_eager=True to fix KV cache memory issue

torch.compile was adding ~3-4 GB of memory overhead, which left a
negative KV cache budget (-1.13 GiB) and kept the model from starting.
enforce_eager=True skips compilation, removing that overhead so the
model starts successfully.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
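
The arithmetic behind a negative KV cache budget can be sketched as follows. This is a simplified illustration, not the engine's actual accounting: the function name and every number in it are hypothetical, chosen only to show how a few GiB of compilation overhead can push the remaining budget below zero.

```python
def kv_cache_budget_gib(
    total_gib: float,
    gpu_memory_utilization: float,
    weights_gib: float,
    activation_gib: float,
    compile_overhead_gib: float,
) -> float:
    """Memory left for the KV cache once everything else is subtracted.

    Simplified model: the engine may use `gpu_memory_utilization` of the
    device; weights, peak activations, and compilation overhead all come
    out of that share before the KV cache gets anything.
    """
    usable_gib = total_gib * gpu_memory_utilization
    return usable_gib - weights_gib - activation_gib - compile_overhead_gib


# Hypothetical figures (not measured from this Space).
common = dict(
    total_gib=24.0,
    gpu_memory_utilization=0.55,
    weights_gib=9.0,
    activation_gib=1.5,
)

with_compile = kv_cache_budget_gib(**common, compile_overhead_gib=3.5)
eager_only = kv_cache_budget_gib(**common, compile_overhead_gib=0.0)

print(f"with compilation: {with_compile:+.2f} GiB")  # negative: startup fails
print(f"eager mode:       {eager_only:+.2f} GiB")    # positive: cache fits
```

A budget at or below zero means no KV cache blocks can be allocated, which matches the startup failure described in the commit message.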

Files changed (1)

models/real.py +1 -0

@@ -84,6 +84,7 @@ class RealModelStack:
             trust_remote_code=True,
             gpu_memory_utilization=0.55,  # Leave ~10GB for embedding + reranker
             max_model_len=8192,  # Reduced to save KV cache memory
+            enforce_eager=True,  # Skip torch.compile to reduce memory overhead
         )
         # Load processor for chat template formatting
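
Eager execution typically trades some throughput for the lower memory footprint, so it may be worth making the flag switchable instead of hard-coding it. A minimal sketch, assuming the keyword arguments in the hunk are collected in one place before being passed to the engine constructor; `build_engine_kwargs` and the `SMOKESCAN_ENFORCE_EAGER` environment variable are hypothetical names, not part of the repository.

```python
import os
from typing import Any, Dict, Mapping, Optional


def build_engine_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Return the engine keyword arguments from the hunk above.

    enforce_eager defaults to True (the behavior this commit introduces)
    and is turned off only when the hypothetical SMOKESCAN_ENFORCE_EAGER
    variable is set to "0".
    """
    env = os.environ if env is None else env
    enforce_eager = env.get("SMOKESCAN_ENFORCE_EAGER", "1") != "0"
    return {
        "trust_remote_code": True,
        "gpu_memory_utilization": 0.55,  # Leave ~10GB for embedding + reranker
        "max_model_len": 8192,  # Reduced to save KV cache memory
        "enforce_eager": enforce_eager,  # Skip torch.compile unless disabled
    }


# Default: eager mode stays on, as in the commit.
print(build_engine_kwargs({})["enforce_eager"])

# Opting back into compilation, e.g. on a GPU with more headroom.
print(build_engine_kwargs({"SMOKESCAN_ENFORCE_EAGER": "0"})["enforce_eager"])
```

Keeping the default at `True` preserves the fix, while the override allows compilation to be re-enabled without another code change.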