KinetoLabs Claude Opus 4.5 committed on
Commit a65b765 · 1 Parent(s): 71a896b

Pin vLLM <0.13.0 to fix V1 engine hang on multi-GPU

Root cause: vLLM 0.13.x deprecated the V0 engine, so VLLM_USE_V1=0 is not honored.
Logs showed the V1 engine initializing despite the env var, causing a sync hang across 4 GPUs.

Test: Deploy and verify logs show "V0 LLM engine" instead of "V1".

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
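
The log check described in the Test line could be scripted roughly as follows. This is a minimal sketch, not part of the commit: the function name `engine_from_log` is hypothetical, and the marker strings ("V0 LLM engine", "V1") are taken from the commit message rather than from vLLM documentation, so the exact log wording is an assumption.

```python
import re


def engine_from_log(log_text: str) -> str:
    """Return "V0", "V1", or "unknown" based on marker strings in a log.

    Assumption: the markers come from the commit message, not from vLLM
    docs; real log lines may be worded differently.
    """
    if "V0 LLM engine" in log_text:
        return "V0"
    # \b keeps the env var name VLLM_USE_V1 from counting as a match,
    # because "_" is a word character and so no boundary precedes "V1" there.
    if re.search(r"\bV1\b", log_text):
        return "V1"
    return "unknown"
```

Running this over the captured deployment log and failing the deploy check on anything other than "V0" would mirror the manual verification step.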

Files changed (1)
  1. requirements.txt +2 -1
requirements.txt CHANGED
@@ -6,7 +6,8 @@ qwen-vl-utils>=0.0.14
 torchvision
 
 # vLLM for FP8 quantized model inference (>=0.11.0 required for Qwen3-VL support)
-vllm>=0.11.0
+# Pinned <0.13.0: V0 engine deprecated in 0.13.x, VLLM_USE_V1=0 not honored
+vllm>=0.11.0,<0.13.0
 
 # UI
 gradio>=6.0.0,<7.0.0
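
To illustrate what the new pin accepts and rejects, here is a small standard-library sketch of the range check `>=0.11.0,<0.13.0`. The helper names `parse_release` and `satisfies_pin` are hypothetical, and the comparison is a simplification: it looks only at the numeric release segment and ignores pre-release ordering, so it is not a full PEP 440 implementation.

```python
import re


def parse_release(version: str) -> tuple:
    """Parse the leading numeric release segment, e.g. "0.12.1rc1" -> (0, 12, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def satisfies_pin(version: str, lower=(0, 11, 0), upper=(0, 13, 0)) -> bool:
    """Check lower <= version < upper on the release segment only.

    Simplification (assumption): suffixes such as "rc1" are ignored,
    unlike pip's real specifier matching.
    """
    release = parse_release(version)
    # Pad short versions such as "0.11" to three components before comparing.
    release = release + (0,) * max(0, 3 - len(release))
    return lower <= release < upper
```

In a deployed environment, the installed version string could be obtained with `importlib.metadata.version("vllm")` and passed to `satisfies_pin` as a startup sanity check.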