synthesis / AttributePerception /logs /attr_0_40_mira.log
/share/liangzy/miniconda3/envs/vllm/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
warnings.warn(
Total Video Size: 31436
0%|          | 0/31436 [00:00<?, ?it/s] 100%|██████████████████████████████████████| 31436/31436 [00:00<00:00, 841376.69it/s]
Total Clips Size: 37658
Start: 0, End: 40
to process size: 40
Total size: 40
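For reference, the three counts above are consistent with a simple list slice over the loaded clip records. A minimal sketch of that bookkeeping (an assumed reconstruction; the real driver script is not shown in this log):

```python
# Hypothetical sketch of the shard bookkeeping printed above.
clips = [f"clip_{i}" for i in range(37658)]  # stand-ins for the 37658 loaded clip records
start, end = 0, 40                           # this job handles clips [0, 40)
todo = clips[start:end]
print(f"Start: {start}, End: {end}")
print(f"to process size: {len(todo)}")
print(f"Total size: {len(todo)}")
```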
Sample show: <|im_start|>system
You are an AI assistant tasked with generating **high-quality attribute perception questions** based on a video snippet description from a long video.
## TASK:
Generate **one** high-quality **attribute perception question** that requires identifying properties or characteristics of objects/people (such as colors, materials, professions, ownership, appearance, position, role, or other static properties) **while using composite features or contextual constraints to ensure answer uniqueness** in longer videos.
You must also provide **4 answer options (A–D)**, with only one correct answer, which is clearly supported by the visual or narrative content of the video description.
## CRITICAL RULES:
1. **Uniqueness Guarantee**: Each question must include one of the following constraints to ensure answer uniqueness:
- **Composite Feature**: Combine multiple attributes (e.g., 'What is the color of the hat worn by the woman *in the red dress*?').
- **Contextual Constraint**: Use scene or background information (e.g., 'What is the brand of the car *in the parking lot*?').
- **Functional/Usage Constraint**: Describe the object's function, purpose, or role (e.g., 'What is the material of the gloves *used for cooking*?').
2. **Static Attributes Only**: Focus on static features such as colors, materials, professions, ownership, appearance, position, role, or other static properties.
3. **Description Grounding**: Answers must be directly verifiable from the provided text.
4. **Focus on Visual Entity Attributes**: The question must test the model's ability to recognize the specific attribute of **objects**.
5. **Avoid Extraneous Information**: Do not rely on subtitles, voiceovers, or audio cues unless they are explicitly mentioned in the description.
6. **Clear and Logical Phrasing**: Keep the question clear, specific, and logically phrased to avoid ambiguity.
## OUTPUT FORMAT:
[{'Q': 'Question...', 'Options': ['A. ...', ..., 'D. ...'], 'Answer': 'X'}]
## EXAMPLES:
1. {'Q': "What is Theresa Woo's profession?",
'Options': [
'A. Police Officer.',
'B. Journalist.',
'C. Doctor.',
'D. Lawyer.'
],
'Answer': 'B. Journalist.'}
2. {'Q': 'Which car was the first to be designed with an automobile roof?',
'Options': [
'A. Type 44.',
'B. Type 46.',
'C. Type 41.',
'D. Type 55.'
],
'Answer': 'A'}
3. {'Q': 'Which element has the highest peak in the spectrum of the sample taken by Dr. Rudy Reimer?',
'Options': [
'A. Iron.',
'B. Zirconium.',
'C. Helium.',
'D. Aluminum.'
],
'Answer': 'B. Zirconium.'}
<|im_end|>
<|im_start|>user
I have provided you with three different aspect descriptions of a specific clip from a long video. Below are these descriptions:
**Dense Description:**
A person interacting with a collection of toy minions, focusing on one particular minion holding a guitar. The setting is a well-lit indoor environment, possibly a room or a studio, where the person demonstrates the features of the toy. The minion is detailed, with vibrant colors and a playful design, featuring a single large eye and a cheerful expression. Throughout the video, the person manipulates the toy, showing its flexibility and the different poses it can achieve, enhancing the viewer's understanding of the toy's design and functionality.
**Background Description:**
Simple and unobtrusive, featuring a reflective glass table that provides a clear view of the toy and its reflections. Other minions are arranged in the background, creating a playful and colorful array that adds depth to the scene. The indoor lighting is bright, ensuring that all details of the toy are visible and enhancing the visual appeal of the colorful toys.
**Main Object Description:**
A person's hands are seen handling a toy minion with a guitar. Initially, the toy is picked up from a glass surface reflecting an array of other minions. The person rotates the toy, displaying its front and back, and adjusts its limbs and guitar, showcasing its articulation and design features. The actions are gentle and precise, indicating a demonstration or review of the toy's characteristics.
Based on these descriptions and the system instructions, generate **one** high-quality attribute perception question-and-answer pair.
## REQUIREMENTS:
- The question must focus on **identifying the specific attribute of visible objects, such as colors, materials, professions, ownership, appearance, position, role, or other static properties.**
- You must use a composite feature, contextual constraint, or functional/usage constraint in the question to constrain it, thereby ensuring answer uniqueness.
- The answer must be directly observable in the description without any reasoning or inference.
## OUTPUT FORMAT:
[{'Q': 'Your question here...', 'Options': ['A. ...', 'B. ...', 'C. ...', 'D. ...'], 'Answer': 'Correct answer here...'}]
**Only return the QA pair in the specified JSON list format.**<|im_end|>
<|im_start|>assistant
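The prompt asks for a "JSON list", but its examples use Python-style single quotes, so a downstream parser is more robust with `ast.literal_eval` than `json.loads`. A minimal validation sketch (assumed post-processing, not code from this pipeline; the sample QA string is hypothetical):

```python
import ast

def parse_qa(raw: str) -> dict:
    """Parse and sanity-check one generated QA pair."""
    data = ast.literal_eval(raw.strip())  # tolerates the single-quoted pseudo-JSON
    assert isinstance(data, list) and len(data) == 1, "expected a one-element list"
    qa = data[0]
    assert {"Q", "Options", "Answer"} <= qa.keys(), "missing required keys"
    assert len(qa["Options"]) == 4, "expected exactly four options"
    return qa

sample = ("[{'Q': 'What color is the minion holding a guitar?', "
          "'Options': ['A. Yellow.', 'B. Red.', 'C. Blue.', 'D. Green.'], "
          "'Answer': 'A. Yellow.'}]")
print(parse_qa(sample))
```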
DP rank 0 needs to process 10 prompts
INFO 04-29 01:25:17 __init__.py:207] Automatically detected platform cuda.
DP rank 2 needs to process 10 prompts
INFO 04-29 01:25:17 __init__.py:207] Automatically detected platform cuda.
DP rank 3 needs to process 10 prompts
INFO 04-29 01:25:17 __init__.py:207] Automatically detected platform cuda.
DP rank 1 needs to process 10 prompts
INFO 04-29 01:25:17 __init__.py:207] Automatically detected platform cuda.
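The 40 prompts are split evenly across the four data-parallel ranks, 10 each. A plausible reconstruction of that assignment (the splitting code is an assumption; only the counts come from the log):

```python
# Hypothetical DP split matching "DP rank N needs to process 10 prompts".
prompts = [f"prompt_{i}" for i in range(40)]  # stand-ins for the 40 rendered prompts
dp_size = 4
for rank in range(dp_size):
    shard = prompts[rank::dp_size]  # strided split; contiguous chunks of 10 would work too
    print(f"DP rank {rank} needs to process {len(shard)} prompts")
```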
INFO 04-29 01:25:31 config.py:549] This model supports multiple tasks: {'generate', 'score', 'reward', 'embed', 'classify'}. Defaulting to 'generate'.
INFO 04-29 01:25:31 config.py:549] This model supports multiple tasks: {'generate', 'score', 'reward', 'embed', 'classify'}. Defaulting to 'generate'.
INFO 04-29 01:25:31 config.py:549] This model supports multiple tasks: {'generate', 'score', 'reward', 'embed', 'classify'}. Defaulting to 'generate'.
INFO 04-29 01:25:31 config.py:549] This model supports multiple tasks: {'generate', 'score', 'reward', 'embed', 'classify'}. Defaulting to 'generate'.
INFO 04-29 01:25:33 gptq_marlin.py:143] The model is convertible to gptq_marlin during runtime. Using gptq_marlin kernel.
INFO 04-29 01:25:33 gptq_marlin.py:143] The model is convertible to gptq_marlin during runtime. Using gptq_marlin kernel.
INFO 04-29 01:25:33 gptq_marlin.py:143] The model is convertible to gptq_marlin during runtime. Using gptq_marlin kernel.
INFO 04-29 01:25:33 gptq_marlin.py:143] The model is convertible to gptq_marlin during runtime. Using gptq_marlin kernel.
INFO 04-29 01:25:33 config.py:1382] Defaulting to use mp for distributed inference
WARNING 04-29 01:25:33 cuda.py:95] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
WARNING 04-29 01:25:33 config.py:685] Async output processing is not supported on the current platform type cuda.
INFO 04-29 01:25:33 config.py:1382] Defaulting to use mp for distributed inference
WARNING 04-29 01:25:33 cuda.py:95] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
WARNING 04-29 01:25:33 config.py:685] Async output processing is not supported on the current platform type cuda.
INFO 04-29 01:25:33 config.py:1382] Defaulting to use mp for distributed inference
WARNING 04-29 01:25:33 cuda.py:95] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
INFO 04-29 01:25:33 config.py:1382] Defaulting to use mp for distributed inference
WARNING 04-29 01:25:33 config.py:685] Async output processing is not supported on the current platform type cuda.
WARNING 04-29 01:25:33 cuda.py:95] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
WARNING 04-29 01:25:33 config.py:685] Async output processing is not supported on the current platform type cuda.
INFO 04-29 01:25:33 llm_engine.py:234] Initializing a V0 LLM engine (v0.7.3) with config: model='/share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8', speculative_config=None, tokenizer='/share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=gptq_marlin, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[],"max_capture_size":0}, use_cached_outputs=False,
INFO 04-29 01:25:33 llm_engine.py:234] Initializing a V0 LLM engine (v0.7.3) with config: model='/share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8', speculative_config=None, tokenizer='/share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=gptq_marlin, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[],"max_capture_size":0}, use_cached_outputs=False,
INFO 04-29 01:25:33 llm_engine.py:234] Initializing a V0 LLM engine (v0.7.3) with config: model='/share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8', speculative_config=None, tokenizer='/share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=gptq_marlin, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[],"max_capture_size":0}, use_cached_outputs=False,
INFO 04-29 01:25:33 llm_engine.py:234] Initializing a V0 LLM engine (v0.7.3) with config: model='/share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8', speculative_config=None, tokenizer='/share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=gptq_marlin, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[],"max_capture_size":0}, use_cached_outputs=False,
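Each of the four identical config lines corresponds to one vLLM `LLM` engine spanning two GPUs (tensor_parallel_size=2). A minimal sketch of an equivalent instantiation, keeping only the arguments visible in the logged config (the sampling parameters are assumptions):

```python
from vllm import LLM, SamplingParams

# Mirrors the logged engine config: GPTQ-Int8 Qwen2.5-72B, TP=2, fp16, eager mode, 32k context.
llm = LLM(
    model="/share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8",
    tensor_parallel_size=2,
    dtype="float16",
    quantization="gptq_marlin",
    enforce_eager=True,   # no CUDA graphs, which triggers the async-output warnings above
    max_model_len=32768,
    seed=0,
)
params = SamplingParams(temperature=0.7, max_tokens=512)  # assumed values, not in the log
outputs = llm.generate(["<rendered chat prompt>"], params)
print(outputs[0].outputs[0].text)
```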
WARNING 04-29 01:25:34 multiproc_worker_utils.py:300] Reducing Torch parallelism from 48 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 04-29 01:25:34 custom_cache_manager.py:19] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
(VllmWorkerProcess pid=3580088) INFO 04-29 01:25:34 multiproc_worker_utils.py:229] Worker ready; awaiting tasks
WARNING 04-29 01:25:34 multiproc_worker_utils.py:300] Reducing Torch parallelism from 48 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 04-29 01:25:34 custom_cache_manager.py:19] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
WARNING 04-29 01:25:34 multiproc_worker_utils.py:300] Reducing Torch parallelism from 48 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 04-29 01:25:34 custom_cache_manager.py:19] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
(VllmWorkerProcess pid=3580186) INFO 04-29 01:25:34 multiproc_worker_utils.py:229] Worker ready; awaiting tasks
(VllmWorkerProcess pid=3580190) INFO 04-29 01:25:34 multiproc_worker_utils.py:229] Worker ready; awaiting tasks
WARNING 04-29 01:25:34 multiproc_worker_utils.py:300] Reducing Torch parallelism from 48 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 04-29 01:25:34 custom_cache_manager.py:19] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
(VllmWorkerProcess pid=3580241) INFO 04-29 01:25:34 multiproc_worker_utils.py:229] Worker ready; awaiting tasks
INFO 04-29 01:25:34 cuda.py:229] Using Flash Attention backend.
INFO 04-29 01:25:34 cuda.py:229] Using Flash Attention backend.
INFO 04-29 01:25:34 cuda.py:229] Using Flash Attention backend.
(VllmWorkerProcess pid=3580088) INFO 04-29 01:25:34 cuda.py:229] Using Flash Attention backend.
(VllmWorkerProcess pid=3580186) INFO 04-29 01:25:34 cuda.py:229] Using Flash Attention backend.
(VllmWorkerProcess pid=3580190) INFO 04-29 01:25:34 cuda.py:229] Using Flash Attention backend.
INFO 04-29 01:25:34 cuda.py:229] Using Flash Attention backend.
(VllmWorkerProcess pid=3580241) INFO 04-29 01:25:34 cuda.py:229] Using Flash Attention backend.
(VllmWorkerProcess pid=3580088) INFO 04-29 01:25:35 utils.py:916] Found nccl from library libnccl.so.2
INFO 04-29 01:25:35 utils.py:916] Found nccl from library libnccl.so.2
INFO 04-29 01:25:35 pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=3580088) INFO 04-29 01:25:35 pynccl.py:69] vLLM is using nccl==2.21.5
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Bootstrap : Using net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO NET/Plugin: Using internal network plugin.
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.21.5+cuda12.4
INFO 04-29 01:25:35 utils.py:916] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=3580186) INFO 04-29 01:25:35 utils.py:916] Found nccl from library libnccl.so.2
INFO 04-29 01:25:35 pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=3580186) INFO 04-29 01:25:35 pynccl.py:69] vLLM is using nccl==2.21.5
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Bootstrap : Using net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO NET/Plugin: Using internal network plugin.
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.21.5+cuda12.4
INFO 04-29 01:25:35 utils.py:916] Found nccl from library libnccl.so.2
INFO 04-29 01:25:35 pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=3580190) INFO 04-29 01:25:35 utils.py:916] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=3580190) INFO 04-29 01:25:35 pynccl.py:69] vLLM is using nccl==2.21.5
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Bootstrap : Using net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO NET/Plugin: Using internal network plugin.
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.21.5+cuda12.4
(VllmWorkerProcess pid=3580241) INFO 04-29 01:25:35 utils.py:916] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=3580241) INFO 04-29 01:25:35 pynccl.py:69] vLLM is using nccl==2.21.5
INFO 04-29 01:25:35 utils.py:916] Found nccl from library libnccl.so.2
INFO 04-29 01:25:35 pynccl.py:69] vLLM is using nccl==2.21.5
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Bootstrap : Using net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO NET/Plugin: Using internal network plugin.
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.21.5+cuda12.4
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO ncclCommInitRank comm 0x103cc3e0 rank 0 nranks 2 cudaDev 0 nvmlDev 6 busId d0 commId 0xf63a7ab5300e03c7 - Init START
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Setting affinity for GPU 6 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO comm 0x103cc3e0 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 00/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 01/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 02/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 03/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 04/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 05/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 06/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 07/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 08/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 09/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 10/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 11/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 12/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 13/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 14/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 15/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] -1/-1/-1->0->1 [5] -1/-1/-1->0->1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] -1/-1/-1->0->1 [13] -1/-1/-1->0->1 [14] -1/-1/-1->0->1 [15] -1/-1/-1->0->1
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO P2P Chunksize set to 524288
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 00/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 01/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 02/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 03/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 04/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 05/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 06/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 07/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 08/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 09/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 10/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 11/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 12/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 13/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO cudaDriverVersion 12040
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Bootstrap : Using net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO NET/Plugin: Using internal network plugin.
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO ncclCommInitRank comm 0x103cb360 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId e0 commId 0xf63a7ab5300e03c7 - Init START
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Setting affinity for GPU 7 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO comm 0x103cb360 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] 0/-1/-1->1->-1 [5] 0/-1/-1->1->-1 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] -1/-1/-1->1->0 [9] -1/-1/-1->1->0 [10] -1/-1/-1->1->0 [11] -1/-1/-1->1->0 [12] 0/-1/-1->1->-1 [13] 0/-1/-1->1->-1 [14] 0/-1/-1->1->-1 [15] 0/-1/-1->1->-1
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO P2P Chunksize set to 524288
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 00/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 01/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 02/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 03/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 04/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 05/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 06/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 07/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 08/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 09/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 10/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 11/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 12/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 13/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 14/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Channel 15/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Connected all rings
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO Connected all trees
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared o
INFO 04-29 01:25:35 custom_all_reduce_utils.py:244] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_6,7.json
(VllmWorkerProcess pid=3580088) INFO 04-29 01:25:35 custom_all_reduce_utils.py:244] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_6,7.json
INFO 04-29 01:25:36 shm_broadcast.py:258] vLLM message queue communication handle: Handle(connect_ip='127.0.0.1', local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_60ebdd7a'), local_subscribe_port=39073, remote_subscribe_port=None)
INFO 04-29 01:25:36 model_runner.py:1110] Starting to load model /share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8...
(VllmWorkerProcess pid=3580088) INFO 04-29 01:25:36 model_runner.py:1110] Starting to load model /share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8...
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO ncclCommInitRank comm 0x103cc3a0 rank 0 nranks 2 cudaDev 0 nvmlDev 4 busId b0 commId 0x8e953ac266d1e0a0 - Init START
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Setting affinity for GPU 4 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO comm 0x103cc3a0 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 00/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 01/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 02/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 03/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 04/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 05/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 06/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 07/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 08/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 09/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 10/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 11/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 12/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 13/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 14/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 15/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] -1/-1/-1->0->1 [5] -1/-1/-1->0->1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] -1/-1/-1->0->1 [13] -1/-1/-1->0->1 [14] -1/-1/-1->0->1 [15] -1/-1/-1->0->1
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO P2P Chunksize set to 524288
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 00/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 01/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 02/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 03/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 04/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 05/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 06/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 07/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 08/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 09/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 10/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 11/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 12/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 13/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO ncclCommInitRank comm 0x103cd060 rank 0 nranks 2 cudaDev 0 nvmlDev 2 busId 90 commId 0x79740f65bcea1ec1 - Init START
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Setting affinity for GPU 2 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO comm 0x103cd060 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 00/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 01/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 02/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 03/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 04/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 05/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 06/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 07/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 08/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 09/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 10/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 11/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 12/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 13/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 14/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 15/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] -1/-1/-1->0->1 [5] -1/-1/-1->0->1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] -1/-1/-1->0->1 [13] -1/-1/-1->0->1 [14] -1/-1/-1->0->1 [15] -1/-1/-1->0->1
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO P2P Chunksize set to 524288
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 00/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 01/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 02/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 03/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 04/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 05/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 06/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 07/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 08/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 09/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 10/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 11/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 12/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 13/0 : 0[2] -> 1[3] via P2P/IPC/read
INFO 04-29 01:25:36 gptq_marlin.py:235] Using MarlinLinearKernel for GPTQMarlinLinearMethod
(VllmWorkerProcess pid=3580088) INFO 04-29 01:25:36 gptq_marlin.py:235] Using MarlinLinearKernel for GPTQMarlinLinearMethod
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO cudaDriverVersion 12040
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Bootstrap : Using net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO NET/Plugin: Using internal network plugin.
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO ncclCommInitRank comm 0x10392270 rank 1 nranks 2 cudaDev 1 nvmlDev 5 busId c0 commId 0x8e953ac266d1e0a0 - Init START
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Setting affinity for GPU 5 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO comm 0x10392270 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] 0/-1/-1->1->-1 [5] 0/-1/-1->1->-1 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] -1/-1/-1->1->0 [9] -1/-1/-1->1->0 [10] -1/-1/-1->1->0 [11] -1/-1/-1->1->0 [12] 0/-1/-1->1->-1 [13] 0/-1/-1->1->-1 [14] 0/-1/-1->1->-1 [15] 0/-1/-1->1->-1
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO P2P Chunksize set to 524288
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 00/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 01/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 02/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 03/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 04/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 05/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 06/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 07/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 08/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 09/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 10/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 11/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 12/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 13/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 14/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Channel 15/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Connected all rings
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO Connected all trees
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared o
(VllmWorkerProcess pid=3580190) INFO 04-29 01:25:36 custom_all_reduce_utils.py:244] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_4,5.json
INFO 04-29 01:25:36 custom_all_reduce_utils.py:244] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_4,5.json
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO cudaDriverVersion 12040
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Bootstrap : Using net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO NET/Plugin: Using internal network plugin.
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO ncclCommInitRank comm 0x103ccdd0 rank 1 nranks 2 cudaDev 1 nvmlDev 3 busId a0 commId 0x79740f65bcea1ec1 - Init START
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Setting affinity for GPU 3 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO comm 0x103ccdd0 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] 0/-1/-1->1->-1 [5] 0/-1/-1->1->-1 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] -1/-1/-1->1->0 [9] -1/-1/-1->1->0 [10] -1/-1/-1->1->0 [11] -1/-1/-1->1->0 [12] 0/-1/-1->1->-1 [13] 0/-1/-1->1->-1 [14] 0/-1/-1->1->-1 [15] 0/-1/-1->1->-1
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO P2P Chunksize set to 524288
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 00/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 01/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 02/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 03/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 04/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 05/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 06/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 07/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 08/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 09/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 10/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 11/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 12/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 13/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 14/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Channel 15/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Connected all rings
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO Connected all trees
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared o
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO ncclCommInitRank comm 0x10391760 rank 0 nranks 2 cudaDev 0 nvmlDev 0 busId 70 commId 0xdc48fc766287bdd8 - Init START
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO comm 0x10391760 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 00/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 01/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 02/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 03/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 04/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 05/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 06/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 07/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 08/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 09/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 10/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 11/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 12/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 13/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 14/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 15/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] -1/-1/-1->0->1 [5] -1/-1/-1->0->1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] -1/-1/-1->0->1 [13] -1/-1/-1->0->1 [14] -1/-1/-1->0->1 [15] -1/-1/-1->0->1
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO P2P Chunksize set to 524288
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/IPC/read
(VllmWorkerProcess pid=3580186) INFO 04-29 01:25:36 custom_all_reduce_utils.py:244] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_2,3.json
INFO 04-29 01:25:36 custom_all_reduce_utils.py:244] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_2,3.json
INFO 04-29 01:25:36 shm_broadcast.py:258] vLLM message queue communication handle: Handle(connect_ip='127.0.0.1', local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_d3f5175f'), local_subscribe_port=59139, remote_subscribe_port=None)
INFO 04-29 01:25:36 shm_broadcast.py:258] vLLM message queue communication handle: Handle(connect_ip='127.0.0.1', local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_ecf56c91'), local_subscribe_port=42239, remote_subscribe_port=None)
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO cudaDriverVersion 12040
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Bootstrap : Using net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO NET/Plugin: Using internal network plugin.
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB net0:192.168.16.244<0>
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO ncclCommInitRank comm 0x10392030 rank 1 nranks 2 cudaDev 1 nvmlDev 1 busId 80 commId 0xdc48fc766287bdd8 - Init START
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO comm 0x10392030 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] 0/-1/-1->1->-1 [5] 0/-1/-1->1->-1 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] -1/-1/-1->1->0 [9] -1/-1/-1->1->0 [10] -1/-1/-1->1->0 [11] -1/-1/-1->1->0 [12] 0/-1/-1->1->-1 [13] 0/-1/-1->1->-1 [14] 0/-1/-1->1->-1 [15] 0/-1/-1->1->-1
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO P2P Chunksize set to 524288
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Connected all rings
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO Connected all trees
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared o
INFO 04-29 01:25:36 model_runner.py:1110] Starting to load model /share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8...
(VllmWorkerProcess pid=3580190) INFO 04-29 01:25:36 model_runner.py:1110] Starting to load model /share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8...
INFO 04-29 01:25:36 custom_all_reduce_utils.py:244] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_0,1.json
(VllmWorkerProcess pid=3580241) INFO 04-29 01:25:36 custom_all_reduce_utils.py:244] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_0,1.json
INFO 04-29 01:25:36 gptq_marlin.py:235] Using MarlinLinearKernel for GPTQMarlinLinearMethod
(VllmWorkerProcess pid=3580190) INFO 04-29 01:25:36 gptq_marlin.py:235] Using MarlinLinearKernel for GPTQMarlinLinearMethod
INFO 04-29 01:25:36 model_runner.py:1110] Starting to load model /share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8...
(VllmWorkerProcess pid=3580186) INFO 04-29 01:25:36 model_runner.py:1110] Starting to load model /share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8...
INFO 04-29 01:25:36 shm_broadcast.py:258] vLLM message queue communication handle: Handle(connect_ip='127.0.0.1', local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_ebb984bd'), local_subscribe_port=56513, remote_subscribe_port=None)
INFO 04-29 01:25:36 gptq_marlin.py:235] Using MarlinLinearKernel for GPTQMarlinLinearMethod
(VllmWorkerProcess pid=3580186) INFO 04-29 01:25:36 gptq_marlin.py:235] Using MarlinLinearKernel for GPTQMarlinLinearMethod
INFO 04-29 01:25:36 model_runner.py:1110] Starting to load model /share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8...
(VllmWorkerProcess pid=3580241) INFO 04-29 01:25:36 model_runner.py:1110] Starting to load model /share/minghao/Models/Qwen2.5-72B-Instruct-GPTQ-Int8...
INFO 04-29 01:25:36 gptq_marlin.py:235] Using MarlinLinearKernel for GPTQMarlinLinearMethod
(VllmWorkerProcess pid=3580241) INFO 04-29 01:25:36 gptq_marlin.py:235] Using MarlinLinearKernel for GPTQMarlinLinearMethod
Loading safetensors checkpoint shards: 0% Completed | 0/20 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 0% Completed | 0/20 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 0% Completed | 0/20 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 0% Completed | 0/20 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 5% Completed | 1/20 [00:02<00:40, 2.12s/it]
Loading safetensors checkpoint shards: 10% Completed | 2/20 [00:04<00:38, 2.11s/it]
Loading safetensors checkpoint shards: 5% Completed | 1/20 [00:04<01:28, 4.64s/it]
Loading safetensors checkpoint shards: 5% Completed | 1/20 [00:04<01:27, 4.62s/it]
Loading safetensors checkpoint shards: 5% Completed | 1/20 [00:04<01:26, 4.58s/it]
Loading safetensors checkpoint shards: 15% Completed | 3/20 [00:06<00:37, 2.22s/it]
Loading safetensors checkpoint shards: 10% Completed | 2/20 [00:07<01:00, 3.36s/it]
Loading safetensors checkpoint shards: 10% Completed | 2/20 [00:07<00:59, 3.33s/it]
Loading safetensors checkpoint shards: 10% Completed | 2/20 [00:07<01:00, 3.35s/it]
Loading safetensors checkpoint shards: 20% Completed | 4/20 [00:08<00:36, 2.26s/it]
Loading safetensors checkpoint shards: 15% Completed | 3/20 [00:09<00:50, 2.97s/it]
Loading safetensors checkpoint shards: 15% Completed | 3/20 [00:09<00:50, 2.95s/it]
Loading safetensors checkpoint shards: 15% Completed | 3/20 [00:09<00:50, 2.96s/it]
Loading safetensors checkpoint shards: 25% Completed | 5/20 [00:11<00:35, 2.34s/it]
Loading safetensors checkpoint shards: 20% Completed | 4/20 [00:12<00:45, 2.83s/it]
Loading safetensors checkpoint shards: 20% Completed | 4/20 [00:12<00:45, 2.84s/it]
Loading safetensors checkpoint shards: 20% Completed | 4/20 [00:12<00:45, 2.84s/it]
Loading safetensors checkpoint shards: 30% Completed | 6/20 [00:13<00:32, 2.34s/it]
Loading safetensors checkpoint shards: 25% Completed | 5/20 [00:14<00:40, 2.71s/it]
Loading safetensors checkpoint shards: 25% Completed | 5/20 [00:14<00:40, 2.71s/it]
Loading safetensors checkpoint shards: 25% Completed | 5/20 [00:14<00:40, 2.70s/it]
Loading safetensors checkpoint shards: 35% Completed | 7/20 [00:16<00:31, 2.39s/it]
Loading safetensors checkpoint shards: 30% Completed | 6/20 [00:17<00:37, 2.65s/it]
Loading safetensors checkpoint shards: 30% Completed | 6/20 [00:17<00:37, 2.65s/it]
Loading safetensors checkpoint shards: 30% Completed | 6/20 [00:17<00:37, 2.65s/it]
Loading safetensors checkpoint shards: 40% Completed | 8/20 [00:18<00:28, 2.39s/it]
Loading safetensors checkpoint shards: 45% Completed | 9/20 [00:19<00:22, 2.01s/it]
Loading safetensors checkpoint shards: 35% Completed | 7/20 [00:19<00:34, 2.62s/it]
Loading safetensors checkpoint shards: 35% Completed | 7/20 [00:19<00:34, 2.62s/it]
Loading safetensors checkpoint shards: 35% Completed | 7/20 [00:19<00:34, 2.62s/it]
Loading safetensors checkpoint shards: 50% Completed | 10/20 [00:22<00:21, 2.14s/it]
Loading safetensors checkpoint shards: 40% Completed | 8/20 [00:22<00:31, 2.58s/it]
Loading safetensors checkpoint shards: 40% Completed | 8/20 [00:22<00:31, 2.59s/it]
Loading safetensors checkpoint shards: 40% Completed | 8/20 [00:22<00:31, 2.59s/it]
Loading safetensors checkpoint shards: 45% Completed | 9/20 [00:23<00:24, 2.22s/it]
Loading safetensors checkpoint shards: 45% Completed | 9/20 [00:23<00:24, 2.22s/it]
Loading safetensors checkpoint shards: 45% Completed | 9/20 [00:23<00:24, 2.22s/it]
Loading safetensors checkpoint shards: 55% Completed | 11/20 [00:24<00:20, 2.26s/it]
Loading safetensors checkpoint shards: 60% Completed | 12/20 [00:27<00:18, 2.27s/it]
Loading safetensors checkpoint shards: 65% Completed | 13/20 [00:29<00:16, 2.34s/it]
Loading safetensors checkpoint shards: 50% Completed | 10/20 [00:30<00:35, 3.60s/it]
Loading safetensors checkpoint shards: 50% Completed | 10/20 [00:30<00:35, 3.60s/it]
Loading safetensors checkpoint shards: 50% Completed | 10/20 [00:30<00:35, 3.60s/it]
Loading safetensors checkpoint shards: 70% Completed | 14/20 [00:31<00:14, 2.35s/it]
Loading safetensors checkpoint shards: 75% Completed | 15/20 [00:33<00:10, 2.12s/it]
Loading safetensors checkpoint shards: 55% Completed | 11/20 [00:34<00:32, 3.63s/it]
Loading safetensors checkpoint shards: 55% Completed | 11/20 [00:34<00:32, 3.63s/it]
Loading safetensors checkpoint shards: 55% Completed | 11/20 [00:34<00:32, 3.63s/it]
Loading safetensors checkpoint shards: 80% Completed | 16/20 [00:35<00:08, 2.20s/it]
Loading safetensors checkpoint shards: 60% Completed | 12/20 [00:37<00:28, 3.56s/it]
Loading safetensors checkpoint shards: 60% Completed | 12/20 [00:37<00:28, 3.56s/it]
Loading safetensors checkpoint shards: 60% Completed | 12/20 [00:37<00:28, 3.56s/it]
Loading safetensors checkpoint shards: 85% Completed | 17/20 [00:38<00:06, 2.22s/it]
Loading safetensors checkpoint shards: 90% Completed | 18/20 [00:40<00:04, 2.29s/it]
Loading safetensors checkpoint shards: 65% Completed | 13/20 [00:40<00:24, 3.49s/it]
Loading safetensors checkpoint shards: 65% Completed | 13/20 [00:40<00:24, 3.48s/it]
Loading safetensors checkpoint shards: 65% Completed | 13/20 [00:40<00:24, 3.49s/it]
Loading safetensors checkpoint shards: 95% Completed | 19/20 [00:43<00:02, 2.34s/it]
Loading safetensors checkpoint shards: 70% Completed | 14/20 [00:44<00:20, 3.49s/it]
Loading safetensors checkpoint shards: 70% Completed | 14/20 [00:44<00:20, 3.49s/it]
Loading safetensors checkpoint shards: 70% Completed | 14/20 [00:44<00:20, 3.49s/it]
Loading safetensors checkpoint shards: 100% Completed | 20/20 [00:45<00:00, 2.39s/it]
Loading safetensors checkpoint shards: 100% Completed | 20/20 [00:45<00:00, 2.28s/it]
INFO 04-29 01:26:22 model_runner.py:1115] Loading model weights took 35.6627 GB
Loading safetensors checkpoint shards: 75% Completed | 15/20 [00:46<00:14, 2.97s/it]
Loading safetensors checkpoint shards: 75% Completed | 15/20 [00:46<00:14, 2.97s/it]
Loading safetensors checkpoint shards: 75% Completed | 15/20 [00:46<00:14, 2.97s/it]
(VllmWorkerProcess pid=3580088) INFO 04-29 01:26:23 model_runner.py:1115] Loading model weights took 35.6627 GB
Loading safetensors checkpoint shards: 80% Completed | 16/20 [00:49<00:12, 3.03s/it]
Loading safetensors checkpoint shards: 80% Completed | 16/20 [00:49<00:12, 3.03s/it]
Loading safetensors checkpoint shards: 80% Completed | 16/20 [00:49<00:12, 3.03s/it]
Loading safetensors checkpoint shards: 85% Completed | 17/20 [00:52<00:09, 3.07s/it]
Loading safetensors checkpoint shards: 85% Completed | 17/20 [00:52<00:09, 3.07s/it]
Loading safetensors checkpoint shards: 85% Completed | 17/20 [00:52<00:09, 3.07s/it]
Loading safetensors checkpoint shards: 90% Completed | 18/20 [00:55<00:06, 3.04s/it]
Loading safetensors checkpoint shards: 90% Completed | 18/20 [00:55<00:06, 3.04s/it]
Loading safetensors checkpoint shards: 90% Completed | 18/20 [00:55<00:06, 3.04s/it]
(VllmWorkerProcess pid=3580241) INFO 04-29 01:26:34 model_runner.py:1115] Loading model weights took 35.6627 GB
(VllmWorkerProcess pid=3580186) INFO 04-29 01:26:34 model_runner.py:1115] Loading model weights took 35.6627 GB
(VllmWorkerProcess pid=3580190) INFO 04-29 01:26:34 model_runner.py:1115] Loading model weights took 35.6627 GB
Loading safetensors checkpoint shards: 95% Completed | 19/20 [00:58<00:02, 3.00s/it]
Loading safetensors checkpoint shards: 95% Completed | 19/20 [00:58<00:02, 3.00s/it]
Loading safetensors checkpoint shards: 95% Completed | 19/20 [00:58<00:02, 3.00s/it]
Loading safetensors checkpoint shards: 100% Completed | 20/20 [01:01<00:00, 3.10s/it]
Loading safetensors checkpoint shards: 100% Completed | 20/20 [01:01<00:00, 3.08s/it]
Loading safetensors checkpoint shards: 100% Completed | 20/20 [01:01<00:00, 3.10s/it]
Loading safetensors checkpoint shards: 100% Completed | 20/20 [01:01<00:00, 3.10s/it]
Loading safetensors checkpoint shards: 100% Completed | 20/20 [01:01<00:00, 3.08s/it]
Loading safetensors checkpoint shards: 100% Completed | 20/20 [01:01<00:00, 3.08s/it]
INFO 04-29 01:26:38 model_runner.py:1115] Loading model weights took 35.6627 GB
INFO 04-29 01:26:38 model_runner.py:1115] Loading model weights took 35.6627 GB
INFO 04-29 01:26:39 model_runner.py:1115] Loading model weights took 35.6627 GB
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 14/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Channel 15/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Connected all rings
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO Connected all trees
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin.
dsw-222255-668f79686f-2vnl5:3579779:3579779 [0] NCCL INFO ncclCommInitRank comm 0x103cc3e0 rank 0 nranks 2 cudaDev 0 nvmlDev 6 busId d0 commId 0xf63a7ab5300e03c7 - Init COMPLETE
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO ncclCommInitRank comm 0x37519470 rank 0 nranks 2 cudaDev 0 nvmlDev 6 busId d0 commId 0xcc1403c2bc0b393 - Init START
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Setting affinity for GPU 6 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO comm 0x37519470 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 00/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 01/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 02/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 03/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 04/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 05/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 06/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 07/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 08/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 09/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 10/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 11/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 12/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 13/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 14/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 15/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] -1/-1/-1->0->1 [5] -1/-1/-1->0->1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] -1/-1/-1->0->1 [13] -1/-1/-1->0->1 [14] -1/-1/-1->0->1 [15] -1/-1/-1->0->1
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO P2P Chunksize set to 524288
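[editor's note] The "Trees" line above encodes, per channel, up to three child ranks, the rank itself, and its parent (-1 = none). A minimal sketch of that reading; this is an informal interpretation of NCCL's debug output, not a documented API:

# Hedged reading of one NCCL "Trees" entry such as "1/-1/-1->0->-1":
# up to three child ranks, then this rank, then its parent; -1 means none.
def parse_tree_entry(entry: str) -> dict:
    children, rank, parent = entry.split("->")
    return {
        "children": [int(c) for c in children.split("/") if c != "-1"],
        "rank": int(rank),
        "parent": None if parent == "-1" else int(parent),
    }

print(parse_tree_entry("1/-1/-1->0->-1"))  # {'children': [1], 'rank': 0, 'parent': None}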
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 00/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 01/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 02/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 03/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 04/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579779:3580831 [0] NCCL INFO Channel 05/0 : 0[6] -> 1[7] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO TUNER/Plugin: Using internal tuner plugin.
dsw-222255-668f79686f-2vnl5:3580088:3580088 [1] NCCL INFO ncclCommInitRank comm 0x103cb360 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId e0 commId 0xf63a7ab5300e03c7 - Init COMPLETE
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO ncclCommInitRank comm 0x37522810 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId e0 commId 0xcc1403c2bc0b393 - Init START
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Setting affinity for GPU 7 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO comm 0x37522810 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] 0/-1/-1->1->-1 [5] 0/-1/-1->1->-1 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] -1/-1/-1->1->0 [9] -1/-1/-1->1->0 [10] -1/-1/-1->1->0 [11] -1/-1/-1->1->0 [12] 0/-1/-1->1->-1 [13] 0/-1/-1->1->-1 [14] 0/-1/-1->1->-1 [15] 0/-1/-1->1->-1
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO P2P Chunksize set to 524288
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 00/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 01/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 02/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 03/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 04/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 05/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 06/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 07/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 08/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 09/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 10/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 11/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 12/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 13/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 14/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Channel 15/0 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Connected all rings
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO Connected all trees
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
dsw-222255-668f79686f-2vnl5:3580088:3580832 [1] NCCL INFO ncclCommInitRank comm 0x37522810 rank 1 nranks 2 cudaDev 1 nvmlDev 7 busId e0 commId 0xcc1403c2bc0b393 - Init COMPLETE
dsw-222255-668f79686f-2vnl5:3580088:3580839 [1] NCCL INFO Channel 00/1 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580839 [1] NCCL INFO Channel 01/1 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580839 [1] NCCL INFO Channel 02/1 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580839 [1] NCCL INFO Channel 03/1 : 1[7] -> 0[6] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580088:3580839 [1] NCCL INFO Channel 04/1 : 1[7] -> 0[6] via P2P/IPC/read
(VllmWorkerProcess pid=3580088) INFO 04-29 01:26:44 worker.py:267] Memory profiling takes 21.13 seconds
(VllmWorkerProcess pid=3580088) INFO 04-29 01:26:44 worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.90) = 71.39GiB
(VllmWorkerProcess pid=3580088) INFO 04-29 01:26:44 worker.py:267] model weights take 35.66GiB; non_torch_memory takes 1.01GiB; PyTorch activation peak memory takes 5.66GiB; the rest of the memory reserved for KV Cache is 29.06GiB.
INFO 04-29 01:26:44 worker.py:267] Memory profiling takes 21.20 seconds
INFO 04-29 01:26:44 worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.90) = 71.39GiB
INFO 04-29 01:26:44 worker.py:267] model weights take 35.66GiB; non_torch_memory takes 1.01GiB; PyTorch activation peak memory takes 5.66GiB; the rest of the memory reserved for KV Cache is 29.06GiB.
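[editor's note] The memory budget in the worker.py:267 lines above is straight arithmetic; a minimal sketch reproducing it, all values in GiB taken verbatim from the log:

total_gpu_memory = 79.32             # per the worker.py:267 lines above
gpu_memory_utilization = 0.90
usable = total_gpu_memory * gpu_memory_utilization         # 71.388 -> logged as 71.39 GiB
weights, non_torch, activation_peak = 35.66, 1.01, 5.66
kv_cache = usable - weights - non_torch - activation_peak  # 29.058 -> logged as 29.06 GiB
print(f"{usable:.2f} GiB usable, {kv_cache:.2f} GiB for KV cache")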
INFO 04-29 01:26:44 executor_base.py:111] # cuda blocks: 11901, # CPU blocks: 1638
INFO 04-29 01:26:44 executor_base.py:116] Maximum concurrency for 32768 tokens per request: 5.81x
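[editor's note] The 5.81x concurrency figure follows from the block count, assuming vLLM's default KV-cache block size of 16 tokens (the block size itself is not printed in this log):

num_gpu_blocks = 11901               # "# cuda blocks" above
block_size = 16                      # assumption: vLLM default, not shown in this log
max_model_len = 32768                # tokens per request
kv_token_capacity = num_gpu_blocks * block_size     # 190416 tokens of KV-cache capacity
print(f"{kv_token_capacity / max_model_len:.2f}x")  # 5.81x, matching the log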
INFO 04-29 01:26:47 llm_engine.py:436] init engine (profile, create kv cache, warmup model) took 23.76 seconds
Processed prompts: 0%| | 0/10 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s] Processed prompts: 10%|▊ | 1/10 [00:07<01:05, 7.27s/it, est. speed input: 146.13 toks/s, output: 8.26 toks/s] Processed prompts: 20%|█▏ | 2/10 [00:07<00:26, 3.34s/it, est. speed input: 278.27 toks/s, output: 15.40 toks/s] Processed prompts: 30%|██ | 3/10 [00:08<00:14, 2.08s/it, est. speed input: 382.77 toks/s, output: 21.71 toks/s] Processed prompts: 40%|██▊ | 4/10 [00:08<00:08, 1.34s/it, est. speed input: 509.27 toks/s, output: 28.92 toks/s] Processed prompts: 70%|████▉ | 7/10 [00:08<00:01, 1.81it/s, est. speed input: 867.21 toks/s, output: 51.69 toks/s] Processed prompts: 90%|██████▎| 9/10 [00:10<00:00, 1.47it/s, est. speed input: 932.41 toks/s, output: 55.12 toks/s]
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 14/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Channel 15/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Connected all rings
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO Connected all trees
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin.
dsw-222255-668f79686f-2vnl5:3579778:3579778 [0] NCCL INFO ncclCommInitRank comm 0x103cc3a0 rank 0 nranks 2 cudaDev 0 nvmlDev 4 busId b0 commId 0x8e953ac266d1e0a0 - Init COMPLETE
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO ncclCommInitRank comm 0x375193d0 rank 0 nranks 2 cudaDev 0 nvmlDev 4 busId b0 commId 0xbbd56d6f733fd00 - Init START
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Setting affinity for GPU 4 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO comm 0x375193d0 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 00/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 01/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 02/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 03/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 04/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 05/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 06/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 07/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 08/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 09/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 10/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 11/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 12/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 13/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 14/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 15/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] -1/-1/-1->0->1 [5] -1/-1/-1->0->1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] -1/-1/-1->0->1 [13] -1/-1/-1->0->1 [14] -1/-1/-1->0->1 [15] -1/-1/-1->0->1
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO P2P Chunksize set to 524288
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 00/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 01/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 02/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 03/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 04/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579778:3580959 [0] NCCL INFO Channel 05/0 : 0[4] -> 1[5] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO TUNER/Plugin: Using internal tuner plugin.
dsw-222255-668f79686f-2vnl5:3580190:3580190 [1] NCCL INFO ncclCommInitRank comm 0x10392270 rank 1 nranks 2 cudaDev 1 nvmlDev 5 busId c0 commId 0x8e953ac266d1e0a0 - Init COMPLETE
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO ncclCommInitRank comm 0x37521b70 rank 1 nranks 2 cudaDev 1 nvmlDev 5 busId c0 commId 0xbbd56d6f733fd00 - Init START
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Setting affinity for GPU 5 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO comm 0x37521b70 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] 0/-1/-1->1->-1 [5] 0/-1/-1->1->-1 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] -1/-1/-1->1->0 [9] -1/-1/-1->1->0 [10] -1/-1/-1->1->0 [11] -1/-1/-1->1->0 [12] 0/-1/-1->1->-1 [13] 0/-1/-1->1->-1 [14] 0/-1/-1->1->-1 [15] 0/-1/-1->1->-1
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO P2P Chunksize set to 524288
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 00/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 01/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 02/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 03/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 04/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 05/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 06/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 07/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 08/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 09/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 10/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 11/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 12/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 13/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 14/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Channel 15/0 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Connected all rings
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO Connected all trees
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
dsw-222255-668f79686f-2vnl5:3580190:3580960 [1] NCCL INFO ncclCommInitRank comm 0x37521b70 rank 1 nranks 2 cudaDev 1 nvmlDev 5 busId c0 commId 0xbbd56d6f733fd00 - Init COMPLETE
dsw-222255-668f79686f-2vnl5:3580190:3580967 [1] NCCL INFO Channel 00/1 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580967 [1] NCCL INFO Channel 01/1 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580967 [1] NCCL INFO Channel 02/1 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580967 [1] NCCL INFO Channel 03/1 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580190:3580967 [1] NCCL INFO Channel 04/1 : 1[5] -> 0[4] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Connected all rings
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO Connected all trees
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin.
dsw-222255-668f79686f-2vnl5:3579776:3579776 [0] NCCL INFO ncclCommInitRank comm 0x10391760 rank 0 nranks 2 cudaDev 0 nvmlDev 0 busId 70 commId 0xdc48fc766287bdd8 - Init COMPLETE
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO ncclCommInitRank comm 0x375196b0 rank 0 nranks 2 cudaDev 0 nvmlDev 0 busId 70 commId 0x52ce3a6a9bba8d6a - Init START
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO comm 0x375196b0 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 00/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 01/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 02/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 03/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 04/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 05/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 06/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 07/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 08/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 09/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 10/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 11/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 12/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 13/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 14/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 15/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] -1/-1/-1->0->1 [5] -1/-1/-1->0->1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] -1/-1/-1->0->1 [13] -1/-1/-1->0->1 [14] -1/-1/-1->0->1 [15] -1/-1/-1->0->1
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO P2P Chunksize set to 524288
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579776:3580976 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/IPC/read
(VllmWorkerProcess pid=3580190) INFO 04-29 01:26:58 worker.py:267] Memory profiling takes 19.42 seconds
(VllmWorkerProcess pid=3580190) INFO 04-29 01:26:58 worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.90) = 71.39GiB
(VllmWorkerProcess pid=3580190) INFO 04-29 01:26:58 worker.py:267] model weights take 35.66GiB; non_torch_memory takes 1.01GiB; PyTorch activation peak memory takes 5.66GiB; the rest of the memory reserved for KV Cache is 29.06GiB.
INFO 04-29 01:26:58 worker.py:267] Memory profiling takes 19.49 seconds
INFO 04-29 01:26:58 worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.90) = 71.39GiB
INFO 04-29 01:26:58 worker.py:267] model weights take 35.66GiB; non_torch_memory takes 1.01GiB; PyTorch activation peak memory takes 5.66GiB; the rest of the memory reserved for KV Cache is 29.06GiB.
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO TUNER/Plugin: Using internal tuner plugin.
dsw-222255-668f79686f-2vnl5:3580241:3580241 [1] NCCL INFO ncclCommInitRank comm 0x10392030 rank 1 nranks 2 cudaDev 1 nvmlDev 1 busId 80 commId 0xdc48fc766287bdd8 - Init COMPLETE
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO ncclCommInitRank comm 0x37521690 rank 1 nranks 2 cudaDev 1 nvmlDev 1 busId 80 commId 0x52ce3a6a9bba8d6a - Init START
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO comm 0x37521690 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] 0/-1/-1->1->-1 [5] 0/-1/-1->1->-1 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] -1/-1/-1->1->0 [9] -1/-1/-1->1->0 [10] -1/-1/-1->1->0 [11] -1/-1/-1->1->0 [12] 0/-1/-1->1->-1 [13] 0/-1/-1->1->-1 [14] 0/-1/-1->1->-1 [15] 0/-1/-1->1->-1
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO P2P Chunksize set to 524288
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Connected all rings
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO Connected all trees
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
dsw-222255-668f79686f-2vnl5:3580241:3580977 [1] NCCL INFO ncclCommInitRank comm 0x37521690 rank 1 nranks 2 cudaDev 1 nvmlDev 1 busId 80 commId 0x52ce3a6a9bba8d6a - Init COMPLETE
dsw-222255-668f79686f-2vnl5:3580241:3580992 [1] NCCL INFO Channel 00/1 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580992 [1] NCCL INFO Channel 01/1 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580992 [1] NCCL INFO Channel 02/1 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580992 [1] NCCL INFO Channel 03/1 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580241:3580992 [1] NCCL INFO Channel 04/1 : 1[1] -> 0[0] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 14/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Channel 15/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Connected all rings
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO Connected all trees
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin.
dsw-222255-668f79686f-2vnl5:3579777:3579777 [0] NCCL INFO ncclCommInitRank comm 0x103cd060 rank 0 nranks 2 cudaDev 0 nvmlDev 2 busId 90 commId 0x79740f65bcea1ec1 - Init COMPLETE
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO ncclCommInitRank comm 0x3751a050 rank 0 nranks 2 cudaDev 0 nvmlDev 2 busId 90 commId 0x5d1b579f41e9db3a - Init START
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Setting affinity for GPU 2 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO comm 0x3751a050 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 00/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 01/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 02/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 03/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 04/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 05/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 06/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 07/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 08/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 09/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 10/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 11/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 12/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 13/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 14/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 15/16 : 0 1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] -1/-1/-1->0->1 [5] -1/-1/-1->0->1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] -1/-1/-1->0->1 [13] -1/-1/-1->0->1 [14] -1/-1/-1->0->1 [15] -1/-1/-1->0->1
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO P2P Chunksize set to 524288
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 00/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 01/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 02/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 03/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 04/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3579777:3580983 [0] NCCL INFO Channel 05/0 : 0[2] -> 1[3] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO TUNER/Plugin: Using internal tuner plugin.
dsw-222255-668f79686f-2vnl5:3580186:3580186 [1] NCCL INFO ncclCommInitRank comm 0x103ccdd0 rank 1 nranks 2 cudaDev 1 nvmlDev 3 busId a0 commId 0x79740f65bcea1ec1 - Init COMPLETE
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Using non-device net plugin version 0
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Using network IB
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO ncclCommInitRank comm 0x3751f600 rank 1 nranks 2 cudaDev 1 nvmlDev 3 busId a0 commId 0x5d1b579f41e9db3a - Init START
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Setting affinity for GPU 3 to ffff,ffffffff
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO comm 0x3751f600 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] 0/-1/-1->1->-1 [5] 0/-1/-1->1->-1 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] -1/-1/-1->1->0 [9] -1/-1/-1->1->0 [10] -1/-1/-1->1->0 [11] -1/-1/-1->1->0 [12] 0/-1/-1->1->-1 [13] 0/-1/-1->1->-1 [14] 0/-1/-1->1->-1 [15] 0/-1/-1->1->-1
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO P2P Chunksize set to 524288
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 00/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 01/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 02/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 03/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 04/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 05/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 06/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 07/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 08/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 09/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 10/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 11/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 12/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 13/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 14/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Channel 15/0 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Connected all rings
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO Connected all trees
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
dsw-222255-668f79686f-2vnl5:3580186:3580984 [1] NCCL INFO ncclCommInitRank comm 0x3751f600 rank 1 nranks 2 cudaDev 1 nvmlDev 3 busId a0 commId 0x5d1b579f41e9db3a - Init COMPLETE
dsw-222255-668f79686f-2vnl5:3580186:3580995 [1] NCCL INFO Channel 00/1 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580995 [1] NCCL INFO Channel 01/1 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580995 [1] NCCL INFO Channel 02/1 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580995 [1] NCCL INFO Channel 03/1 : 1[3] -> 0[2] via P2P/IPC/read
dsw-222255-668f79686f-2vnl5:3580186:3580995 [1] NCCL INFO Channel 04/1 : 1[3] -> 0[2] via P2P/IPC/read
Processed prompts: 100%|██████| 10/10 [00:11<00:00, 1.44it/s, est. speed input: 969.02 toks/s, output: 57.30 toks/s] Processed prompts: 100%|██████| 10/10 [00:11<00:00, 1.14s/it, est. speed input: 969.02 toks/s, output: 57.30 toks/s]
INFO 04-29 01:26:58 executor_base.py:111] # cuda blocks: 11901, # CPU blocks: 1638
INFO 04-29 01:26:58 executor_base.py:116] Maximum concurrency for 32768 tokens per request: 5.81x
(VllmWorkerProcess pid=3580241) INFO 04-29 01:26:58 worker.py:267] Memory profiling takes 19.66 seconds
(VllmWorkerProcess pid=3580241) INFO 04-29 01:26:58 worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.90) = 71.39GiB
(VllmWorkerProcess pid=3580241) INFO 04-29 01:26:58 worker.py:267] model weights take 35.66GiB; non_torch_memory takes 1.01GiB; PyTorch activation peak memory takes 5.66GiB; the rest of the memory reserved for KV Cache is 29.06GiB.
(VllmWorkerProcess pid=3580186) INFO 04-29 01:26:58 worker.py:267] Memory profiling takes 19.52 seconds
(VllmWorkerProcess pid=3580186) INFO 04-29 01:26:58 worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.90) = 71.39GiB
(VllmWorkerProcess pid=3580186) INFO 04-29 01:26:58 worker.py:267] model weights take 35.66GiB; non_torch_memory takes 1.01GiB; PyTorch activation peak memory takes 5.66GiB; the rest of the memory reserved for KV Cache is 29.06GiB.
INFO 04-29 01:26:58 worker.py:267] Memory profiling takes 19.72 seconds
INFO 04-29 01:26:58 worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.90) = 71.39GiB
INFO 04-29 01:26:58 worker.py:267] model weights take 35.66GiB; non_torch_memory takes 1.01GiB; PyTorch activation peak memory takes 5.66GiB; the rest of the memory reserved for KV Cache is 29.06GiB.
ๆŽจ็†ๅฎŒๆˆ Total Finish:10
batch time cost: 11.467466115951538s
[Memory] CPU: 7167.85 MB
[Memory] GPU: 66295.75 MB
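[editor's note] The per-batch wall time lines up with tqdm's final rate; a quick check (the small gap vs. 1.14s/it is presumably time spent outside the generate loop):

prompts = 10
batch_seconds = 11.467466115951538   # "batch time cost" above
print(f"{batch_seconds / prompts:.2f} s/prompt")  # ~1.15 s/prompt vs. tqdm's 1.14s/it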
INFO 04-29 01:26:58 multiproc_worker_utils.py:141] Terminating local vLLM worker processes
(VllmWorkerProcess pid=3580088) INFO 04-29 01:26:58 multiproc_worker_utils.py:253] Worker exiting
INFO 04-29 01:26:58 worker.py:267] Memory profiling takes 19.62 seconds
INFO 04-29 01:26:58 worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.90) = 71.39GiB
INFO 04-29 01:26:58 worker.py:267] model weights take 35.66GiB; non_torch_memory takes 1.01GiB; PyTorch activation peak memory takes 5.66GiB; the rest of the memory reserved for KV Cache is 29.06GiB.
INFO 04-29 01:26:59 executor_base.py:111] # cuda blocks: 11901, # CPU blocks: 1638
INFO 04-29 01:26:59 executor_base.py:116] Maximum concurrency for 32768 tokens per request: 5.81x
INFO 04-29 01:26:59 executor_base.py:111] # cuda blocks: 11901, # CPU blocks: 1638
INFO 04-29 01:26:59 executor_base.py:116] Maximum concurrency for 32768 tokens per request: 5.81x
/share/liangzy/miniconda3/envs/vllm/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
INFO 04-29 01:27:02 llm_engine.py:436] init engine (profile, create kv cache, warmup model) took 23.53 seconds
Processed prompts: 0%| | 0/10 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
INFO 04-29 01:27:02 llm_engine.py:436] init engine (profile, create kv cache, warmup model) took 23.71 seconds
INFO 04-29 01:27:02 llm_engine.py:436] init engine (profile, create kv cache, warmup model) took 23.55 seconds
Processed prompts: 0%| | 0/10 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s] Processed prompts: 0%| | 0/10 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s] Processed prompts: 10%|▊ | 1/10 [00:06<01:02, 6.97s/it, est. speed input: 158.19 toks/s, output: 7.90 toks/s] Processed prompts: 10%|▊ | 1/10 [00:06<01:02, 6.98s/it, est. speed input: 149.49 toks/s, output: 7.45 toks/s] Processed prompts: 10%|▊ | 1/10 [00:07<01:05, 7.32s/it, est. speed input: 151.18 toks/s, output: 8.34 toks/s] Processed prompts: 20%|█▏ | 2/10 [00:07<00:25, 3.24s/it, est. speed input: 290.12 toks/s, output: 14.73 toks/s] Processed prompts: 20%|█▏ | 2/10 [00:07<00:27, 3.39s/it, est. speed input: 271.47 toks/s, output: 15.59 toks/s] Processed prompts: 20%|█▏ | 2/10 [00:07<00:26, 3.30s/it, est. speed input: 301.44 toks/s, output: 13.90 toks/s] Processed prompts: 30%|██ | 3/10 [00:08<00:14, 2.07s/it, est. speed input: 395.43 toks/s, output: 20.75 toks/s] Processed prompts: 30%|██ | 3/10 [00:08<00:14, 2.14s/it, est. speed input: 387.41 toks/s, output: 21.85 toks/s] Processed prompts: 30%|██ | 3/10 [00:08<00:14, 2.07s/it, est. speed input: 424.70 toks/s, output: 19.74 toks/s] Processed prompts: 40%|██▊ | 4/10 [00:08<00:07, 1.31s/it, est. speed input: 523.10 toks/s, output: 27.91 toks/s] Processed prompts: 70%|████▉ | 7/10 [00:08<00:01, 1.60it/s, est. speed input: 895.52 toks/s, output: 51.36 toks/s] Processed prompts: 50%|███▌ | 5/10 [00:08<00:04, 1.13it/s, est. speed input: 641.31 toks/s, output: 35.21 toks/s] Processed prompts: 80%|████▊ | 8/10 [00:08<00:01, 1.91it/s, est. speed input: 1014.16 toks/s, output: 58.57 toks/s] Processed prompts: 50%|███▌ | 5/10 [00:08<00:04, 1.01it/s, est. speed input: 673.52 toks/s, output: 33.29 toks/s] Processed prompts: 70%|████▉ | 7/10 [00:08<00:01, 1.99it/s, est. speed input: 871.55 toks/s, output: 49.93 toks/s] Processed prompts: 80%|█████▌ | 8/10 [00:09<00:00, 2.22it/s, est. speed input: 965.81 toks/s, output: 56.90 toks/s] Processed prompts: 90%|██████▎| 9/10 [00:11<00:00, 1.12it/s, est. speed input: 917.75 toks/s, output: 52.66 toks/s] Processed prompts: 90%|██████▎| 9/10 [00:10<00:00, 1.27it/s, est. speed input: 918.12 toks/s, output: 54.06 toks/s] Processed prompts: 90%|██████▎| 9/10 [00:11<00:00, 1.30it/s, est. speed input: 906.44 toks/s, output: 49.22 toks/s] Processed prompts: 100%|██████| 10/10 [00:11<00:00, 1.40it/s, est. speed input: 971.19 toks/s, output: 57.28 toks/s] Processed prompts: 100%|██████| 10/10 [00:11<00:00, 1.13s/it, est. speed input: 971.19 toks/s, output: 57.28 toks/s]
ๆŽจ็†ๅฎŒๆˆ Total Finish:10
batch time cost: 11.371723175048828s
[Memory] CPU: 7178.60 MB
[Memory] GPU: 66295.75 MB
INFO 04-29 01:27:14 multiproc_worker_utils.py:141] Terminating local vLLM worker processes
(VllmWorkerProcess pid=3580241) INFO 04-29 01:27:14 multiproc_worker_utils.py:253] Worker exiting
Processed prompts: 100%|██████| 10/10 [00:12<00:00, 1.10it/s, est. speed input: 947.36 toks/s, output: 54.53 toks/s] Processed prompts: 100%|██████| 10/10 [00:12<00:00, 1.20s/it, est. speed input: 947.36 toks/s, output: 54.53 toks/s]
ๆŽจ็†ๅฎŒๆˆ Total Finish:10
batch time cost: 12.05191159248352s
[Memory] CPU: 7178.42 MB
[Memory] GPU: 66295.75 MB
INFO 04-29 01:27:14 multiproc_worker_utils.py:141] Terminating local vLLM worker processes
(VllmWorkerProcess pid=3580190) INFO 04-29 01:27:14 multiproc_worker_utils.py:253] Worker exiting
Processed prompts: 100%|██████| 10/10 [00:12<00:00, 1.23it/s, est. speed input: 929.71 toks/s, output: 52.24 toks/s] Processed prompts: 100%|██████| 10/10 [00:12<00:00, 1.22s/it, est. speed input: 929.71 toks/s, output: 52.24 toks/s]
ๆŽจ็†ๅฎŒๆˆ Total Finish:10
batch time cost: 12.19089126586914s
[Memory] CPU: 7175.97 MB
[Memory] GPU: 66295.75 MB
INFO 04-29 01:27:15 multiproc_worker_utils.py:141] Terminating local vLLM worker processes
(VllmWorkerProcess pid=3580186) INFO 04-29 01:27:15 multiproc_worker_utils.py:253] Worker exiting
/share/liangzy/miniconda3/envs/vllm/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
/share/liangzy/miniconda3/envs/vllm/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
/share/liangzy/miniconda3/envs/vllm/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Did it OOM?
Total size: 40 Total time cost: 121.66365146636963s
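[editor's note] Implied end-to-end throughput for this shard, from the final summary line:

total_items = 40
total_seconds = 121.66365146636963   # "Total time cost" above
print(f"{total_seconds / total_items:.2f} s/item")  # ~3.04 s per clip, engine startup included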