vLLM launch stays stuck
Hi everyone,
I’m running into an issue where my model initialization appears to get stuck after these logs, and I’m not sure whether it’s actually hanging, compiling something in the background, or failing silently.
Here’s the exact output where it stops progressing:
(EngineCore_DP0 pid=200) INFO 04-27 15:14:55 [cuda.py:405] Using FLASH_ATTN attention backend out of potential backends: ['FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION'].
(EngineCore_DP0 pid=200) INFO 04-27 15:14:55 [flash_attn.py:587] Using FlashAttention version 2
(EngineCore_DP0 pid=200) <frozen importlib._bootstrap>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(EngineCore_DP0 pid=200) <frozen importlib._bootstrap>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
After this point, nothing else happens: no additional logs, no errors, and the process just stays there.
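Is there a recommended way to tell whether the engine is still doing work (e.g. weight loading, CUDA graph capture, or compilation) versus being deadlocked? My current plan, which may or may not be the right approach, is to attach py-spy to the stuck EngineCore process and rerun with vLLM's debug logging turned on, roughly like this:

pip install py-spy
py-spy dump --pid 200    # pid of the EngineCore_DP0 process from the logs above

VLLM_LOGGING_LEVEL=DEBUG vllm serve <model> ...    # <model> is a placeholder for my actual model

My thinking is that if py-spy shows the threads parked inside a compilation or collective-communication call, the process isn't silently dead, just slow. Does that sound reasonable, or is there a better built-in way to diagnose this?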