Spaces: KinetoLabs/SmokeScan
KinetoLabs and Claude Opus 4.5 committed 8 days ago

Commit 3c9a722 (1 parent: ed575b1)

Force vLLM V0 engine + reduce max_model_len for stability

- VLLM_USE_V1=0: Force stable V0 engine instead of V1
- Reduce max_model_len from 32768 to 16384 for memory safety
- Keep NCCL and spawn settings for multi-GPU reliability

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Files changed (2)

config/settings.py +1 -1
models/real.py +3 -0

config/settings.py

@@ -25,7 +25,7 @@ class Settings(BaseSettings):
     # vLLM configuration
     vllm_tensor_parallel_size: int = 4  # Use all 4 L4 GPUs
-    vllm_max_model_len: int = 32768  # Context window
+    vllm_max_model_len: int = 16384  # Reduced from 32768 for memory safety
     # ChromaDB
     chroma_persist_dir: str = "./chroma_db"
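
To give a rough sense of why halving `vllm_max_model_len` helps with memory, the sketch below estimates the KV-cache footprint of one full-length sequence per GPU. The commit does not name the served model, so the shape used here (32 layers, 8 KV heads, head dimension 128, 16-bit cache) is purely hypothetical, and `kv_cache_bytes_per_gpu` is an illustrative helper, not part of this repository or of vLLM. It also assumes the KV heads shard evenly across the tensor-parallel GPUs.

```python
def kv_cache_bytes_per_gpu(
    seq_len: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    dtype_bytes: int = 2,
    tensor_parallel_size: int = 1,
) -> int:
    """Estimate KV-cache bytes per GPU for a single sequence of seq_len tokens.

    Assumes KV heads are split evenly across tensor-parallel ranks.
    """
    # Each token stores one key and one value vector per KV head per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return seq_len * per_token // tensor_parallel_size


# Hypothetical model shape; replace with the real model's config values.
HYPOTHETICAL_SHAPE = dict(num_layers=32, num_kv_heads=8, head_dim=128, dtype_bytes=2)

before = kv_cache_bytes_per_gpu(32768, tensor_parallel_size=4, **HYPOTHETICAL_SHAPE)
after = kv_cache_bytes_per_gpu(16384, tensor_parallel_size=4, **HYPOTHETICAL_SHAPE)

print(f"max_model_len=32768: {before / 2**30:.2f} GiB per GPU")
print(f"max_model_len=16384: {after / 2**30:.2f} GiB per GPU")
```

The estimate scales linearly with sequence length, so the new limit needs half the KV-cache space for a maximum-length request, whatever the actual model shape is.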

models/real.py

@@ -15,6 +15,9 @@ Model Loading:
 import os
 # vLLM environment variables - MUST be set before importing vLLM
+# Force V0 engine (more stable than V1 for multi-GPU)
+os.environ.setdefault("VLLM_USE_V1", "0")
 # Fix for "Engine core initialization failed" with tensor parallelism
 # See: https://github.com/vllm-project/vllm/issues/17618
 os.environ.setdefault("VLLM_WORKER_MULTIPROC_METHOD", "spawn")
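
One detail worth noting about the change above: `os.environ.setdefault` only writes a value when the variable is not already set, so a `VLLM_USE_V1=1` exported by the runtime would still win. The sketch below illustrates that behaviour on plain dictionaries. `apply_vllm_env_defaults` and `VLLM_ENV_DEFAULTS` are hypothetical names introduced for illustration and do not exist in `models/real.py`.

```python
import os
from typing import Dict, MutableMapping

# Defaults mirrored from the diff; each applies only if the variable is unset.
VLLM_ENV_DEFAULTS = {
    "VLLM_USE_V1": "0",
    "VLLM_WORKER_MULTIPROC_METHOD": "spawn",
}


def apply_vllm_env_defaults(env: MutableMapping[str, str] = os.environ) -> Dict[str, str]:
    """Apply the defaults to env and return the values that end up in effect."""
    return {name: env.setdefault(name, value) for name, value in VLLM_ENV_DEFAULTS.items()}


# Use plain dicts so the demonstration does not touch the real environment.
unset_env: Dict[str, str] = {}
preset_env: Dict[str, str] = {"VLLM_USE_V1": "1"}

print(apply_vllm_env_defaults(unset_env))   # defaults are applied
print(apply_vllm_env_defaults(preset_env))  # the pre-existing value is kept
```

If the intent is to force the V0 engine regardless of the surrounding environment, a direct assignment such as `os.environ["VLLM_USE_V1"] = "0"` would override any existing value, at the cost of removing the operator's ability to opt back in.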