Commit: a65b765
Parent(s): 71a896b
Pin vLLM <0.13.0 to fix V1 engine hang on multi-GPU
Root cause: vLLM 0.13.x deprecated V0 engine, VLLM_USE_V1=0 not honored.
Logs showed V1 engine initializing despite env var, causing 4-GPU sync hang.
Test: Deploy and verify logs show "V0 LLM engine" instead of "V1".
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
requirements.txt CHANGED (+2 -1)
@@ -6,7 +6,8 @@ qwen-vl-utils>=0.0.14
 torchvision
 
 # vLLM for FP8 quantized model inference (>=0.11.0 required for Qwen3-VL support)
-vllm>=0.11.0
+# Pinned <0.13.0: V0 engine deprecated in 0.13.x, VLLM_USE_V1=0 not honored
+vllm>=0.11.0,<0.13.0
 
 # UI
 gradio>=6.0.0,<7.0.0
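The pin in the hunk above can also be double-checked at app startup, before vLLM initializes the engine. A minimal stdlib-only sketch (the `satisfies_pin` helper is hypothetical, not part of this repo, and handles plain numeric versions only):

```python
def _parse_version(v: str) -> tuple:
    # Split a plain numeric version like "0.12.1" into (0, 12, 1)
    # so it can be compared as a tuple. Pre-release suffixes
    # (e.g. "0.13.0rc1") are not handled by this sketch.
    return tuple(int(part) for part in v.split("."))

def satisfies_pin(installed: str) -> bool:
    # Mirrors the requirements.txt pin: vllm>=0.11.0,<0.13.0.
    return (0, 11, 0) <= _parse_version(installed) < (0, 13, 0)

print(satisfies_pin("0.12.1"))  # True: inside the pinned range
print(satisfies_pin("0.13.0"))  # False: V0 engine deprecated here
```

At deploy time the commit's own test still applies: the logs should report the "V0 LLM engine" rather than "V1".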