[Issue] Repetitive text with endless loop
While testing the model on multilingual document images, it starts generating repetitive text in an endless loop for the unknown language.
Is there a workaround for this problem, so that the model can be instructed to extract only text in a specific language (for example, 'en') and discard everything else rather than trying to interpret it?
- Tried changing the prompt; it didn't work.
- Applied a patch that detects repetitive tokens above a certain threshold. It helped, but it also stops inference before the model has produced the complete OCR of the image.
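For reference, a repetition check of the kind described in the last bullet could look roughly like the sketch below. It is not the actual patch: the function names, the n-gram size and the repeat threshold are all illustrative assumptions, and it works on a plain list of token ids rather than hooking into any particular runtime.

```python
from typing import List, Sequence


def has_repeating_tail(tokens: Sequence[int], ngram_size: int = 4, max_repeats: int = 5) -> bool:
    """Return True if the final n-gram occurs back-to-back at least max_repeats times."""
    if ngram_size <= 0 or max_repeats <= 1:
        raise ValueError("ngram_size must be > 0 and max_repeats must be > 1")
    needed = ngram_size * max_repeats
    if len(tokens) < needed:
        return False
    tail = list(tokens[-needed:])
    pattern = tail[-ngram_size:]
    # Every consecutive chunk of the tail must equal the final n-gram.
    return all(tail[i:i + ngram_size] == pattern for i in range(0, needed, ngram_size))


def trim_repeating_tail(tokens: Sequence[int], ngram_size: int = 4) -> List[int]:
    """Collapse a back-to-back repeated final n-gram down to a single copy."""
    out = list(tokens)
    pattern = out[-ngram_size:]
    while len(out) >= 2 * ngram_size and out[-2 * ngram_size:-ngram_size] == pattern:
        del out[-ngram_size:]
    return out
```

Stopping generation as soon as `has_repeating_tail` fires is what truncates the OCR result, as noted above. Trimming the loop only removes the repeated tail, so anything the model would have produced after the loop is still missing.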
I have the same problem while testing these samples with Ollama
Specifically: form_01.png, table_10.png and tablet_12.png cause it to loop forever with ollama run deepseek-ocr "/path/to/image\n<|grounding|>Convert the document to markdown."
Have you tried adding <|grounding|>? It fixed the infinite looping for me
Let me try out <|grounding|> and update the results here.
The DeepSeek OCR model often returns only the <|ref|>/<|det|> tags (the bounding boxes), even though these pages contain text, and that text is not returned. The performance is far from what I expected. What could be wrong? I am using the API version from https://cloud.siliconflow.cn/.
Page 32: ██████████▏ | 31/258 [03:12<17:41, 4.67s/it]
<|ref|>text<|/ref|><|det|>[[57, 110, 491, 151]]<|/det|>
<|ref|>text<|/ref|><|det|>[[57, 154, 491, 195]]<|/det|>
<|ref|>text<|/ref|><|det|>[[57, 198, 491, 239]]<|/det|>
<|ref|>text<|/ref|><|det|>[[57, 243, 491, 283]]<|/det|>
<|ref|>title<|/ref|><|det|>[[58, 262, 118, 280]]<|/det|>
<|ref|>text<|/ref|><|det|>[[81, 284, 295, 302]]<|/det|>
<|ref|>text<|/ref|><|det|>[[57, 306, 491, 346]]<|/det|>
<|ref|>table<|/ref|><|det|>[[91, 350, 455, 627]]<|/det|>
<|ref|>title<|/ref|><|det|>[[82, 630, 151, 646]]<|/det|>
<|ref|>text<|/ref|><|det|>[[58, 651, 491, 712]]<|/det|>
<|ref|>title<|/ref|><|det|>[[82, 716, 191, 733]]<|/det|>
<|ref|>text<|/ref|><|det|>[[57, 737, 491, 777]]<|/det|>
<|ref|>text<|/ref|><|det|>[[82, 781, 219, 798]]<|/det|>
<|ref|>text<|/ref|><|det|>[[82, 802, 237, 819]]<|/det|>
<|ref|>text<|/ref|><|det|>[[57, 823, 491, 863]]<|/det|>
<|ref|>text<|/ref|><|det|>[[507, 110, 937, 150]]<|/det|>
<|ref|>text<|/ref|><|det|>[[527, 154, 923, 172]]<|/det|>
<|ref|>text<|/ref|><|det|>[[529, 176, 636, 192]]<|/det|>
<|ref|>text<|/ref|><|det|>[[529, 198, 743, 215]]<|/det|>
<|ref|>text<|/ref|><|det|>[[529, 220, 808, 237]]<|/det|>
<|ref|>text<|/ref|><|det|>[[527, 242, 814, 259]]<|/det|>
<|ref|>text<|/ref|><|det|>[[507, 264, 937, 325]]<|/det|>
<|ref|>title<|/ref|><|det|>[[529, 330, 588, 346]]<|/det|>
<|ref|>text<|/ref|><|det|>[[507, 352, 937, 387]]<|/det|>
<|ref|>text<|/ref|><|det|>[[507, 392, 937, 453]]<|/det|>
<|ref|>text<|/ref|><|det|>[[507, 458, 937, 540]]<|/det|>
<|ref|>text<|/ref|><|det|>[[507, 545, 937, 586]]<|/det|>
<|ref|>text<|/ref|><|det|>[[507, 590, 937, 672]]<|/det|>
<|ref|>text<|/ref|><|det|>[[507, 676, 937, 738]]<|/det|>
<|ref|>title<|/ref|><|det|>[[508, 703, 546, 719]]<|/det|>
<|ref|>text<|/ref|><|det|>[[529, 724, 730, 741]]<|/det|>
<|ref|>text<|/ref|><|det|>[[527, 746, 885, 764]]<|/det|>
<|ref|>text<|/ref|><|det|>[[527, 768, 887, 785]]<|/det|>
<|ref|>text<|/ref|><|det|>[[507, 790, 937, 830]]<|/det|>
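A quick way to flag pages like the one above is to parse the grounding output and count how many regions came back without any text. This is a minimal sketch; the `Region` type and function names are made up for illustration, and it assumes that the text of a region, when present, follows its `<|det|>` tag and runs until the next `<|ref|>` tag.

```python
import re
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    label: str                       # e.g. "text", "title", "table"
    box: Tuple[int, int, int, int]   # the four numbers inside [[...]]
    text: str                        # text that follows the tag pair, possibly empty


TAG = re.compile(
    r"<\|ref\|>(.*?)<\|/ref\|><\|det\|>\[\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]\]<\|/det\|>"
)


def parse_regions(output: str) -> List[Region]:
    """Split model output into regions, attaching the text found after each tag pair."""
    matches = list(TAG.finditer(output))
    regions = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(output)
        text = output[m.end():end].strip()
        box = tuple(int(g) for g in m.groups()[1:])
        regions.append(Region(m.group(1), box, text))
    return regions


def empty_ratio(regions: List[Region]) -> float:
    """Fraction of regions that carry no text at all."""
    if not regions:
        return 0.0
    return sum(1 for r in regions if not r.text) / len(regions)
```

For the page shown above, every region would come back with empty text, so the ratio would be 1.0. A page with a ratio that high is a candidate for a retry with different settings.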
After more testing: it happens in both Hugging Face transformers and Ollama, so the model itself is broken.
Same issue here. It was KV cache quantization: small models suffer a lot from quantization. In bf16 it works like a charm.
Did that also fix the looping bug?
yep
I tried <|grounding|>; it doesn't fix the issue. With grounding it returns empty text, and without grounding it gets stuck on repetitive tokens, especially when the language is not English (the document contains multilingual data).
After testing different parameter settings and a patch for max_tokens, I was able to work around the issue. It may not be truly fixed, because the problem is in the model itself. Here's the configuration that I used:
PATCH:
model.generation_config.max_new_tokens = 2048
MODEL INFERENCE CONFIGS:
"base_size": 640
"image_size": 640
"crop_mode": True
Sadly, for Ollama the default K/V cache type is already f16. Setting OLLAMA_KV_CACHE_TYPE=f16 by hand doesn't change anything; it still loops forever.
Ah, I see. I'm using vLLM.
Does anyone know how to solve the token loop problem?
Hey @prudant
I am also using vLLM and I still have the same problem. I tried the solution you proposed, running it with:
vllm serve deepseek-ai/DeepSeek-OCR --logits_processors vllm.model_executor.models.deepseek_ocr:NGramPerReqLogitsProcessor --no-enable-prefix-caching --mm-processor-cache-gb 0 --kv-cache-dtype bfloat16
But I'm getting the following error:
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] EngineCore failed to start.
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] Traceback (most recent call last):
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 691, in init
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] super().init(
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 112, in init
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 269, in _initialize_kv_caches
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] self.model_executor.initialize_from_config(kv_cache_configs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 116, in initialize_from_config
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] self.collective_rpc("compile_or_warm_up_model")
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 75, in collective_rpc
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 461, in run_method
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return func(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 452, in compile_or_warm_up_model
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] cuda_graph_memory_bytes = self.model_runner.capture_model()
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5040, in capture_model
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] self._capture_cudagraphs(
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5132, in _capture_cudagraphs
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] dummy_run(
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return func(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4697, in _dummy_run
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] outputs = self.model(
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 222, in call
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/model_executor/models/deepseek_ocr.py", line 574, in forward
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] hidden_states = self.language_model(
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/model_executor/models/deepseek_v2.py", line 1263, in forward
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] hidden_states = self.model(
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 472, in call
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return TorchCompileWithNoGuardsWrapper.call(self, *args, **kwargs) # type: ignore[arg-type]
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/compilation/wrapper.py", line 233, in call
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return self._call_with_optional_nvtx_range(
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/compilation/wrapper.py", line 119, in _call_with_optional_nvtx_range
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return callable_fn(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/model_executor/models/deepseek_v2.py", line 1086, in forward
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] def forward(
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return fn(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/compilation/caching.py", line 185, in call
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return self.optimized_call(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return self._wrapped_call(self, *args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 413, in call
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] raise e
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 400, in call
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return super(self.cls, obj).call(*args, **kwargs) # type: ignore[misc]
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File ".26", line 65, in forward
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] submod_1 = self.submod_1(getitem, s59, getitem_1, getitem_2, getitem_3); getitem = getitem_1 = getitem_2 = submod_1 = None
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return self._wrapped_call(self, *args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 413, in call
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] raise e
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 400, in call
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return super(self.cls, obj).call(*args, **kwargs) # type: ignore[misc]
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File ".2", line 5, in forward
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] unified_kv_cache_update = torch.ops.vllm.unified_kv_cache_update(key_2, value, 'language_model.model.layers.0.self_attn.attn')
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/_ops.py", line 1255, in call
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return self._op(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/attention/layer.py", line 841, in unified_kv_cache_update
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] attn_layer.impl.do_kv_cache_update(
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/attention/backends/flash_attn.py", line 793, in do_kv_cache_update
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] reshape_and_cache_flash(
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/_custom_ops.py", line 2413, in reshape_and_cache_flash
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] torch.ops._C_cache_ops.reshape_and_cache_flash(
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/_ops.py", line 1255, in call
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] return self._op(*args, **kwargs)
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) ERROR 02-25 10:12:04 [core.py:946] RuntimeError: Unsupported data type of kv cache: bfloat16
(EngineCore_DP0 pid=53846) Process EngineCore_DP0:
(EngineCore_DP0 pid=53846) Traceback (most recent call last):
(EngineCore_DP0 pid=53846) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=53846) self.run()
(EngineCore_DP0 pid=53846) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=53846) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 950, in run_engine_core
(EngineCore_DP0 pid=53846) raise e
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=53846) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 691, in init
(EngineCore_DP0 pid=53846) super().init(
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 112, in init
(EngineCore_DP0 pid=53846) num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 269, in _initialize_kv_caches
(EngineCore_DP0 pid=53846) self.model_executor.initialize_from_config(kv_cache_configs)
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 116, in initialize_from_config
(EngineCore_DP0 pid=53846) self.collective_rpc("compile_or_warm_up_model")
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 75, in collective_rpc
(EngineCore_DP0 pid=53846) result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 461, in run_method
(EngineCore_DP0 pid=53846) return func(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 452, in compile_or_warm_up_model
(EngineCore_DP0 pid=53846) cuda_graph_memory_bytes = self.model_runner.capture_model()
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5040, in capture_model
(EngineCore_DP0 pid=53846) self._capture_cudagraphs(
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5132, in _capture_cudagraphs
(EngineCore_DP0 pid=53846) dummy_run(
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=53846) return func(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4697, in _dummy_run
(EngineCore_DP0 pid=53846) outputs = self.model(
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 222, in call
(EngineCore_DP0 pid=53846) return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=53846) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=53846) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/model_executor/models/deepseek_ocr.py", line 574, in forward
(EngineCore_DP0 pid=53846) hidden_states = self.language_model(
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=53846) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=53846) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/model_executor/models/deepseek_v2.py", line 1263, in forward
(EngineCore_DP0 pid=53846) hidden_states = self.model(
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 472, in call
(EngineCore_DP0 pid=53846) return TorchCompileWithNoGuardsWrapper.call(self, *args, **kwargs) # type: ignore[arg-type]
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/compilation/wrapper.py", line 233, in call
(EngineCore_DP0 pid=53846) return self._call_with_optional_nvtx_range(
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/compilation/wrapper.py", line 119, in _call_with_optional_nvtx_range
(EngineCore_DP0 pid=53846) return callable_fn(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/model_executor/models/deepseek_v2.py", line 1086, in forward
(EngineCore_DP0 pid=53846) def forward(
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(EngineCore_DP0 pid=53846) return fn(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/compilation/caching.py", line 185, in call
(EngineCore_DP0 pid=53846) return self.optimized_call(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(EngineCore_DP0 pid=53846) return self._wrapped_call(self, *args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 413, in call
(EngineCore_DP0 pid=53846) raise e
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 400, in call
(EngineCore_DP0 pid=53846) return super(self.cls, obj).call(*args, **kwargs) # type: ignore[misc]
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=53846) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=53846) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File ".26", line 65, in forward
(EngineCore_DP0 pid=53846) submod_1 = self.submod_1(getitem, s59, getitem_1, getitem_2, getitem_3); getitem = getitem_1 = getitem_2 = submod_1 = None
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(EngineCore_DP0 pid=53846) return self._wrapped_call(self, *args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 413, in call
(EngineCore_DP0 pid=53846) raise e
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 400, in call
(EngineCore_DP0 pid=53846) return super(self.cls, obj).call(*args, **kwargs) # type: ignore[misc]
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=53846) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=53846) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File ".2", line 5, in forward
(EngineCore_DP0 pid=53846) unified_kv_cache_update = torch.ops.vllm.unified_kv_cache_update(key_2, value, 'language_model.model.layers.0.self_attn.attn')
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/_ops.py", line 1255, in __call__
(EngineCore_DP0 pid=53846) return self._op(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/attention/layer.py", line 841, in unified_kv_cache_update
(EngineCore_DP0 pid=53846) attn_layer.impl.do_kv_cache_update(
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/attention/backends/flash_attn.py", line 793, in do_kv_cache_update
(EngineCore_DP0 pid=53846) reshape_and_cache_flash(
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/_custom_ops.py", line 2413, in reshape_and_cache_flash
(EngineCore_DP0 pid=53846) torch.ops._C_cache_ops.reshape_and_cache_flash(
(EngineCore_DP0 pid=53846) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/torch/_ops.py", line 1255, in __call__
(EngineCore_DP0 pid=53846) return self._op(*args, **kwargs)
(EngineCore_DP0 pid=53846) ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=53846) RuntimeError: Unsupported data type of kv cache: bfloat16
[rank0]:[W225 10:12:05.538028791 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=53745) Traceback (most recent call last):
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/bin/vllm", line 10, in
(APIServer pid=53745) sys.exit(main())
(APIServer pid=53745) ^^^^^^
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=53745) args.dispatch_function(args)
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 111, in cmd
(APIServer pid=53745) uvloop.run(run_server(args))
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=53745) return __asyncio.run(
(APIServer pid=53745) ^^^^^^^^^^^^^^
(APIServer pid=53745) File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
(APIServer pid=53745) return runner.run(main)
(APIServer pid=53745) ^^^^^^^^^^^^^^^^
(APIServer pid=53745) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=53745) return self._loop.run_until_complete(task)
(APIServer pid=53745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=53745) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=53745) return await main
(APIServer pid=53745) ^^^^^^^^^^
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 919, in run_server
(APIServer pid=53745) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 938, in run_server_worker
(APIServer pid=53745) async with build_async_engine_client(
(APIServer pid=53745) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=53745) return await anext(self.gen)
(APIServer pid=53745) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 147, in build_async_engine_client
(APIServer pid=53745) async with build_async_engine_client_from_engine_args(
(APIServer pid=53745) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=53745) return await anext(self.gen)
(APIServer pid=53745) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 188, in build_async_engine_client_from_engine_args
(APIServer pid=53745) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=53745) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 228, in from_vllm_config
(APIServer pid=53745) return cls(
(APIServer pid=53745) ^^^^
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 155, in __init__
(APIServer pid=53745) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=53745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 122, in make_async_mp_client
(APIServer pid=53745) return AsyncMPClient(*client_args)
(APIServer pid=53745) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 819, in __init__
(APIServer pid=53745) super().__init__(
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 479, in __init__
(APIServer pid=53745) with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=53745) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=53745) next(self.gen)
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 933, in launch_core_engines
(APIServer pid=53745) wait_for_engine_startup(
(APIServer pid=53745) File "/hdd/deepseek-ocr/.venv/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 992, in wait_for_engine_startup
(APIServer pid=53745) raise RuntimeError(
(APIServer pid=53745) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
Did you get this error? How did you solve it?
Thank you in advance.
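
For anyone triaging a similar log: the last APIServer line only says that the engine core failed and points to a "root cause above", which here is the earlier EngineCore line about the kv cache data type. Below is a minimal sketch for pulling the exception lines out of such a log, assuming it was saved as plain text with the same `(ProcessName pid=N)` prefix as above. The function name `find_errors` and the two regexes are my own illustration, not part of vLLM.

```python
import re

# Assumed log format: each line starts with "(ProcessName pid=12345) ".
PREFIX = re.compile(r"^\((?P<proc>\w+) pid=(?P<pid>\d+)\)\s*(?P<msg>.*)$")
# An exception line looks like "SomeError: message" or "SomeException: message".
ERROR = re.compile(r"^(?P<type>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<detail>.*)$")


def find_errors(log_text):
    """Return (process, exception type, detail) for each exception line, in log order."""
    found = []
    for line in log_text.splitlines():
        prefixed = PREFIX.match(line)
        if not prefixed:
            continue  # skip lines without a process prefix, e.g. NCCL warnings
        error = ERROR.match(prefixed.group("msg"))
        if error:
            found.append(
                (prefixed.group("proc"), error.group("type"), error.group("detail"))
            )
    return found


# Four lines copied from the log above.
sample = """\
(EngineCore_DP0 pid=53846) return self._op(*args, **kwargs)
(EngineCore_DP0 pid=53846) RuntimeError: Unsupported data type of kv cache: bfloat16
(APIServer pid=53745) raise RuntimeError(
(APIServer pid=53745) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
"""

errors = find_errors(sample)
root_cause = errors[0]  # the earliest exception is the one the final message refers to
print(root_cause)
```

The first entry is the error worth searching for or reporting; the APIServer error after it is only the consequence of the engine core process exiting.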