Spaces: KinetoLabs / SmokeScan
KinetoLabs Claude Opus 4.5 committed on Jan 11

Commit a65b765

1 Parent(s): 71a896b

Pin vLLM <0.13.0 to fix V1 engine hang on multi-GPU

Root cause: vLLM 0.13.x deprecated the V0 engine, so VLLM_USE_V1=0 is not honored.
Logs showed the V1 engine initializing despite the env var, causing a 4-GPU sync hang.

Test: Deploy and verify logs show "V0 LLM engine" instead of "V1".

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
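
The test step above could be scripted roughly as follows. This is a minimal sketch, not part of the commit: the helper name `uses_v0_engine` is hypothetical, and the only string it relies on is the "V0 LLM engine" marker quoted in the commit message. How the deployment logs are fetched is left to the caller.

```python
def uses_v0_engine(log_text: str) -> bool:
    """Return True if the log text contains the V0 engine marker.

    The marker string is taken from the commit message; the exact
    wording of real vLLM log lines around it is not assumed here.
    """
    return "V0 LLM engine" in log_text


def assert_v0_engine(log_text: str) -> None:
    """Raise if the deployment logs do not show the V0 engine marker."""
    if not uses_v0_engine(log_text):
        raise RuntimeError(
            'Expected "V0 LLM engine" in logs; the V1 engine may be active.'
        )
```

Calling `assert_v0_engine(captured_logs)` after a deploy turns the manual log check into a hard failure when the marker is missing.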

Files changed (1)

requirements.txt +2 -1

@@ -6,7 +6,8 @@ qwen-vl-utils>=0.0.14
 torchvision
 # vLLM for FP8 quantized model inference (>=0.11.0 required for Qwen3-VL support)
-vllm>=0.11.0
+# Pinned <0.13.0: V0 engine deprecated in 0.13.x, VLLM_USE_V1=0 not honored
+vllm>=0.11.0,<0.13.0
 # UI
 gradio>=6.0.0,<7.0.0
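
Because the pin lives only in requirements.txt, a startup guard can catch an environment where a different vLLM version was installed anyway. The sketch below is an assumption, not part of this commit: the function names are hypothetical, and the version comparison is a simplified numeric check rather than full PEP 440 handling (pre-release and local version suffixes are ignored).

```python
import re
from importlib import metadata

# Bounds mirror the requirements.txt pin: vllm>=0.11.0,<0.13.0
LOWER_BOUND = (0, 11, 0)
UPPER_BOUND = (0, 13, 0)


def parse_release(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '0.12.1.post1' -> (0, 12, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions such as '0.12' to three components.
    return parts + (0,) * (3 - len(parts))


def vllm_version_allowed(version: str) -> bool:
    """Return True if the version falls inside the pinned range."""
    return LOWER_BOUND <= parse_release(version) < UPPER_BOUND


def check_installed_vllm() -> None:
    """Raise RuntimeError if vllm is missing or outside the pinned range."""
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError as exc:
        raise RuntimeError("vllm is not installed") from exc
    if not vllm_version_allowed(installed):
        raise RuntimeError(
            f"vllm {installed} is outside the pinned range >=0.11.0,<0.13.0"
        )
```

Calling `check_installed_vllm()` once before the engine is constructed would fail fast instead of hanging during multi-GPU initialization.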