Fails on a single DGX spark with errors below

#2
by Adrian1234 - opened

I launch the container with

docker run \
  --privileged --gpus all -it --rm --network host --ipc=host --oom-score-adj 500 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm-node \
  bash -c -i "vllm serve GadflyII/GLM-4.6V-NVFP4 \
    --tensor-parallel-size 1 \
    --port 8000 --host 0.0.0.0 \
    --gpu-memory-utilization 0.80"

BTW, on a single node, memory spikes to full during initialization but then recedes. I increased my swap size to compensate.

I am using the 'standard' vLLM.

(EngineCore_DP0 pid=271) File "/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py", line 1108, in synchronize
(EngineCore_DP0 pid=271) return torch._C._cuda_synchronize()
(EngineCore_DP0 pid=271) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=271) torch.AcceleratorError: CUDA error: an illegal instruction was encountered
(EngineCore_DP0 pid=271) Search for `cudaErrorIllegalInstruction` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(EngineCore_DP0 pid=271) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=271) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=271) Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(EngineCore_DP0 pid=271) [rank0]:[W227 16:57:21.814142287 CUDAGuardImpl.h:122] Warning: CUDA warning: an illegal instruction was encountered (function destroyEvent)
terminate called after throwing an instance of 'c10::AcceleratorError'
  what(): CUDA error: an illegal instruction was encountered
Search for `cudaErrorIllegalInstruction` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception raised from createEvent at /opt/pytorch/pytorch/aten/src/ATen/cuda/CUDAEvent.h:232 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xd4 (0xf0a96d1c3a04 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: + 0x43e698 (0xf0a96d2be698 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) + 0x1bc (0xf0a96d2be90c in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x108a7e4 (0xf0a96dfaa7e4 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0x474654 (0xf0a96d1a4654 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #5: c10::TensorImpl::~TensorImpl() + 0x14 (0xf0a96d162244 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #6: + 0x5f42ac (0xf0a995cf42ac in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #7: + 0xb8f14c (0xf0a99628f14c in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #8: VLLM::EngineCore() [0x523cb4]
frame #9: + 0x11eb48 (0xf0a8fabceb48 in /usr/local/lib/python3.12/dist-packages/numpy/_core/_multiarray_umath.cpython-312-aarch64-linux-gnu.so)
frame #10: VLLM::EngineCore() [0x4f9dcc]
frame #11: VLLM::EngineCore() [0x523b30]
frame #12: VLLM::EngineCore() [0x4f5b48]
frame #13: VLLM::EngineCore() [0x4f58c0]
frame #14: VLLM::EngineCore() [0x523d60]
frame #15: VLLM::EngineCore() [0x4f5b48]
frame #16: VLLM::EngineCore() [0x4f58c0]
frame #17: VLLM::EngineCore() [0x523d60]
frame #18: VLLM::EngineCore() [0x4f9d9c]
frame #19: VLLM::EngineCore() [0x523b30]
frame #20: VLLM::EngineCore() [0x4f5a48]
frame #21: VLLM::EngineCore() [0x523d60]
frame #22: VLLM::EngineCore() [0x4f9db4]
frame #23: VLLM::EngineCore() [0x523b30]
frame #24: VLLM::EngineCore() [0x5153d8]
frame #25: VLLM::EngineCore() [0x4d593c]
frame #26: VLLM::EngineCore() [0x5a645c]
frame #27: VLLM::EngineCore() [0x5a6464]
frame #28: VLLM::EngineCore() [0x5a6464]
frame #29: VLLM::EngineCore() [0x5a6464]
frame #30: VLLM::EngineCore() [0x5a6464]
frame #31: VLLM::EngineCore() [0x4cea98]
frame #32: VLLM::EngineCore() [0x523cb4]
frame #33: _PyEval_EvalFrameDefault + 0x4654 (0x568c88 in VLLM::EngineCore)
frame #34: PyEval_EvalCode + 0x130 (0x563224 in VLLM::EngineCore)
frame #35: PyRun_StringFlags + 0xe0 (0x59bfb0 in VLLM::EngineCore)
frame #36: PyRun_SimpleStringFlags + 0x44 (0x67f0d4 in VLLM::EngineCore)
frame #37: Py_RunMain + 0x390 (0x68b890 in VLLM::EngineCore)
frame #38: Py_BytesMain + 0x28 (0x68b398 in VLLM::EngineCore)
frame #39: + 0x284c4 (0xf0a997d884c4 in /usr/lib/aarch64-linux-gnu/libc.so.6)
frame #40: __libc_start_main + 0x98 (0xf0a997d88598 in /usr/lib/aarch64-linux-gnu/libc.so.6)
frame #41: _start + 0x30 (0x5f6bb0 in VLLM::EngineCore)

(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1) File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1) sys.exit(main())
(APIServer pid=1) ^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=1) args.dispatch_function(args)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd
(APIServer pid=1) uvloop.run(run_server(args))
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1) return __asyncio.run(
(APIServer pid=1) ^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
(APIServer pid=1) return runner.run(main)
(APIServer pid=1) ^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1) return self._loop.run_until_complete(task)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1) return await main
(APIServer pid=1) ^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server
(APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker
(APIServer pid=1) async with build_async_engine_client(
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client
(APIServer pid=1) async with build_async_engine_client_from_engine_args(
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 137, in build_async_engine_client_from_engine_args
(APIServer pid=1) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 223, in from_vllm_config
(APIServer pid=1) return cls(
(APIServer pid=1) ^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 152, in __init__
(APIServer pid=1) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1) return func(*args, **kwargs)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 125, in make_async_mp_client
(APIServer pid=1) return AsyncMPClient(*client_args)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1) return func(*args, **kwargs)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 842, in __init__
(APIServer pid=1) super().__init__(
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 496, in __init__
(APIServer pid=1) with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=1) next(self.gen)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 925, in launch_core_engines
(APIServer pid=1) wait_for_engine_startup(
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 984, in wait_for_engine_startup
(APIServer pid=1) raise RuntimeError(
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
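Following the suggestion in the log above, CUDA_LAUNCH_BLOCKING=1 can be forwarded into the container so that kernel launches are synchronous and the stack trace points at the actual failing kernel rather than a later API call. A minimal sketch (the standard docker -e flag; all other flags from the docker run above elided):

```shell
# Forward the debug variable into the container, e.g.:
#   docker run -e CUDA_LAUNCH_BLOCKING=1 ... vllm-node ...
# Demonstrate that a variable set this way is visible to the child process:
CUDA_LAUNCH_BLOCKING=1 sh -c 'echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"'
```

With synchronous launches, the "stacktrace below might be incorrect" caveat in the log no longer applies, which usually makes the faulting kernel identifiable.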

Not sure; I am unable to replicate this.
