nie10 committed on
Commit 90b2866 · verified · 1 Parent(s): e7f2879

Delete logs

This view is limited to 50 files because it contains too many changes. See raw diff.
Files changed (50)
  1. logs/20250526_182357/process_pids.txt +0 -2
  2. logs/20250526_182357/remote_rm_qa.log +0 -0
  3. logs/20250526_182357/train.log +0 -0
  4. logs/20250526_185638/process_pids.txt +0 -2
  5. logs/20250526_185638/remote_rm_qa.log +0 -0
  6. logs/20250526_185638/train.log +0 -0
  7. logs/20250526_191827/process_pids.txt +0 -2
  8. logs/20250526_191827/remote_rm_qa.log +0 -0
  9. logs/20250526_191827/train.log +0 -0
  10. logs/20250526_193312/process_pids.txt +0 -2
  11. logs/20250526_193312/remote_rm_qa.log +0 -9
  12. logs/20250526_193312/train.log +0 -2
  13. logs/20250526_193656/process_pids.txt +0 -2
  14. logs/20250526_193656/remote_rm_qa.log +0 -9
  15. logs/20250526_193656/train.log +0 -77
  16. logs/20250526_194456/process_pids.txt +0 -2
  17. logs/20250526_194456/remote_rm_qa.log +0 -0
  18. logs/20250526_194456/train.log +0 -92
  19. logs/20250527_011343/process_pids.txt +0 -2
  20. logs/20250527_011343/remote_rm_qa.log +0 -9
  21. logs/20250527_011343/train.log +0 -45
  22. logs/20250527_095510/process_pids.txt +0 -2
  23. logs/20250527_095510/remote_rm_qa.log +0 -3
  24. logs/20250527_095510/train.log +0 -0
  25. logs/20250527_235509/process_pids.txt +0 -2
  26. logs/20250527_235509/remote_rm_qa.log +0 -0
  27. logs/20250527_235509/train.log +0 -0
  28. logs/20250528_110535/process_pids.txt +0 -2
  29. logs/20250528_110535/remote_rm_qa.log +0 -3
  30. logs/20250528_110535/train.log +0 -0
  31. logs/20250528_161139/process_pids.txt +0 -2
  32. logs/20250528_161139/remote_rm_qa.log +0 -3
  33. logs/20250528_161139/train.log +0 -0
  34. logs/20250529_214257/process_pids.txt +0 -2
  35. logs/20250529_214257/remote_rm_qa.log +0 -3
  36. logs/20250529_214257/train.log +0 -0
  37. logs/20250611_110725/process_pids.txt +0 -2
  38. logs/20250611_110725/remote_rm_qa.log +0 -3
  39. logs/20250611_110725/train.log +0 -0
  40. logs/20250611_150946/process_pids.txt +0 -2
  41. logs/20250611_150946/remote_rm_qa.log +0 -0
  42. logs/20250611_150946/train.log +0 -0
  43. logs/20250611_160325/process_pids.txt +0 -2
  44. logs/20250611_160325/remote_rm_qa.log +0 -9
  45. logs/20250611_160325/train.log +0 -0
  46. logs/20250611_161239/process_pids.txt +0 -2
  47. logs/20250611_161239/remote_rm_qa.log +0 -9
  48. logs/20250611_161239/train.log +0 -188
  49. logs/20250611_162203/process_pids.txt +0 -2
  50. logs/20250611_162203/remote_rm_qa.log +0 -0
logs/20250526_182357/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 126386
- Train PID: 126387

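Each run directory under logs/ holds a process_pids.txt like the one above plus remote_rm_qa.log and train.log, which suggests the runs were started by a small launcher that records both PIDs. A minimal sketch of such a launcher, assuming hypothetical start_remote_rm.sh / start_train.sh entry points (the actual commands are not part of this diff):

```python
# Hypothetical launcher sketch: starts the remote reward model and the trainer,
# then records their PIDs the way logs/<timestamp>/process_pids.txt does.
# The two command lines below are placeholders, not the repository's real scripts.
import subprocess
import time
from pathlib import Path

log_dir = Path("logs") / time.strftime("%Y%m%d_%H%M%S")
log_dir.mkdir(parents=True, exist_ok=True)

with open(log_dir / "remote_rm_qa.log", "w") as rm_log, \
     open(log_dir / "train.log", "w") as train_log:
    rm = subprocess.Popen(["bash", "start_remote_rm.sh"], stdout=rm_log, stderr=subprocess.STDOUT)
    train = subprocess.Popen(["bash", "start_train.sh"], stdout=train_log, stderr=subprocess.STDOUT)

(log_dir / "process_pids.txt").write_text(
    f"Remote RM PID: {rm.pid}\nTrain PID: {train.pid}\n"
)
```
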
logs/20250526_182357/remote_rm_qa.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250526_182357/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250526_185638/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 435069
- Train PID: 435070

logs/20250526_185638/remote_rm_qa.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250526_185638/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250526_191827/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 74501
- Train PID: 74502

logs/20250526_191827/remote_rm_qa.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250526_191827/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250526_193312/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 279066
- Train PID: 279067

logs/20250526_193312/remote_rm_qa.log DELETED
@@ -1,9 +0,0 @@
- [2025-05-26 19:33:56,196] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
- load dataset success
- * Serving Flask app 'math_verifier_wolatex'
- * Debug mode: off
- WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
- * Running on all addresses (0.0.0.0)
- * Running on http://127.0.0.1:2394
- * Running on http://10.140.0.144:2394
- Press CTRL+C to quit

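These remote reward-model logs come from a small Flask app (math_verifier_wolatex) serving a math-answer verifier on port 2394. A minimal sketch of such a server is shown below; the route name, payload fields, and verify_answer() helper are assumptions for illustration, not the repository's actual implementation.

```python
# Minimal sketch of a remote reward-model server like the one logged above.
# The /get_reward route, the "query"/"labels" fields, and verify_answer() are
# assumptions, not the repository's real code.
from flask import Flask, jsonify, request

app = Flask("math_verifier_wolatex")

def verify_answer(response: str, reference: str) -> float:
    """Hypothetical verifier: reward 1.0 if the response ends with the reference answer."""
    return 1.0 if response.strip().endswith(reference.strip()) else 0.0

@app.route("/get_reward", methods=["POST"])
def get_reward():
    data = request.get_json()
    rewards = [verify_answer(q, a) for q, a in zip(data["query"], data["labels"])]
    return jsonify({"rewards": rewards})

if __name__ == "__main__":
    # Matches the development-server warning in the log; use a WSGI server in production.
    app.run(host="0.0.0.0", port=2394)
```
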
logs/20250526_193312/train.log DELETED
@@ -1,2 +0,0 @@
- 2025-05-26 19:33:39,938 INFO dashboard_sdk.py:338 -- Uploading package gcs://_ray_pkg_321e0871e56ca1df.zip.
- 2025-05-26 19:33:39,938 INFO packaging.py:575 -- Creating a file package for local module '/mnt/petrelfs/luyiting/MultiAgentEval/lmm-r1'.

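The train.log entries show Ray's dashboard SDK packaging the local lmm-r1 working directory before a job is submitted to the job server. A minimal sketch of an equivalent submission through Ray's job API follows; the entrypoint string is only a placeholder, not the actual training command.

```python
# Sketch of a Ray job submission that would produce the dashboard_sdk/packaging
# log lines above. The entrypoint below is a placeholder, not the real command line.
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://127.0.0.1:2983")  # job submission server seen in later logs
job_id = client.submit_job(
    entrypoint="python -m openrlhf.cli.train_ppo_ray",  # placeholder entrypoint
    runtime_env={"working_dir": "/mnt/petrelfs/luyiting/MultiAgentEval/lmm-r1"},
)
print(job_id)
```
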
logs/20250526_193656/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 29110
- Train PID: 29111

logs/20250526_193656/remote_rm_qa.log DELETED
@@ -1,9 +0,0 @@
- [2025-05-26 19:37:40,391] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
- load dataset success
- * Serving Flask app 'math_verifier_wolatex'
- * Debug mode: off
- WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
- * Running on all addresses (0.0.0.0)
- * Running on http://127.0.0.1:2394
- * Running on http://10.140.0.167:2394
- Press CTRL+C to quit

logs/20250526_193656/train.log DELETED
@@ -1,77 +0,0 @@
1
- 2025-05-26 19:37:24,046 INFO dashboard_sdk.py:338 -- Uploading package gcs://_ray_pkg_321e0871e56ca1df.zip.
2
- 2025-05-26 19:37:24,047 INFO packaging.py:575 -- Creating a file package for local module '/mnt/petrelfs/luyiting/MultiAgentEval/lmm-r1'.
3
- 2025-05-26 19:37:22,460 INFO cli.py:39 -- Job submission server address: http://127.0.0.1:2983
4
- 2025-05-26 19:37:32,118 SUCC cli.py:63 -- -------------------------------------------------------
5
- 2025-05-26 19:37:32,118 SUCC cli.py:64 -- Job 'raysubmit_endd6V8YzvkhTPfY' submitted successfully
6
- 2025-05-26 19:37:32,118 SUCC cli.py:65 -- -------------------------------------------------------
7
- 2025-05-26 19:37:32,118 INFO cli.py:289 -- Next steps
8
- 2025-05-26 19:37:32,118 INFO cli.py:290 -- Query the logs of the job:
9
- 2025-05-26 19:37:32,118 INFO cli.py:292 -- ray job logs raysubmit_endd6V8YzvkhTPfY
10
- 2025-05-26 19:37:32,118 INFO cli.py:294 -- Query the status of the job:
11
- 2025-05-26 19:37:32,118 INFO cli.py:296 -- ray job status raysubmit_endd6V8YzvkhTPfY
12
- 2025-05-26 19:37:32,118 INFO cli.py:298 -- Request the job to be stopped:
13
- 2025-05-26 19:37:32,119 INFO cli.py:300 -- ray job stop raysubmit_endd6V8YzvkhTPfY
14
- 2025-05-26 19:37:32,121 INFO cli.py:307 -- Tailing logs until the job exits (disable with --no-wait):
15
- 2025-05-26 19:37:31,103 INFO job_manager.py:531 -- Runtime env is setting up.
16
- [2025-05-26 19:38:02,682] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
17
- INFO 05-26 19:38:09 [__init__.py:239] Automatically detected platform cuda.
18
- 2025-05-26 19:38:11,252 INFO worker.py:1520 -- Using address 10.140.0.167:6231 set in the environment variable RAY_ADDRESS
19
- 2025-05-26 19:38:11,253 INFO worker.py:1660 -- Connecting to existing Ray cluster at address: 10.140.0.167:6231...
20
- 2025-05-26 19:38:11,274 INFO worker.py:1843 -- Connected to Ray cluster. View the dashboard at 10.140.0.167:2983 
21
- (pid=42922) INFO 05-26 19:38:44 [__init__.py:239] Automatically detected platform cuda.
22
- (LLMRayActor pid=42923) INFO 05-26 19:39:22 [config.py:585] This model supports multiple tasks: {'score', 'reward', 'generate', 'embed', 'classify'}. Defaulting to 'generate'.
23
- (LLMRayActor pid=42923) WARNING 05-26 19:39:22 [arg_utils.py:1846] VLLM_ATTENTION_BACKEND=triton is not supported by the V1 Engine. Falling back to V0. We recommend to remove VLLM_ATTENTION_BACKEND=triton from your config in favor of the V1 Engine.
24
- (LLMRayActor pid=42923) WARNING 05-26 19:39:22 [arg_utils.py:1745] --enable-prefix-caching is not supported for multimodal models in V0 and has been disabled.
25
- (LLMRayActor pid=42923) INFO 05-26 19:39:22 [llm_engine.py:241] Initializing a V0 LLM engine (v0.8.2.dev76+gf68cce8) with config: model='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', speculative_config=None, tokenizer='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=43, served_model_name=/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
26
- (pid=42927) INFO 05-26 19:38:44 [__init__.py:239] Automatically detected platform cuda. [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
27
- (LLMRayActor pid=42928) INFO 05-26 19:39:22 [config.py:585] This model supports multiple tasks: {'generate', 'reward', 'embed', 'classify', 'score'}. Defaulting to 'generate'.
28
- (LLMRayActor pid=42921) INFO 05-26 19:39:22 [config.py:585] This model supports multiple tasks: {'reward', 'embed', 'generate', 'classify', 'score'}. Defaulting to 'generate'.
29
- (LLMRayActor pid=42927) INFO 05-26 19:39:22 [config.py:585] This model supports multiple tasks: {'classify', 'embed', 'score', 'reward', 'generate'}. Defaulting to 'generate'.
30
- (LLMRayActor pid=42924) INFO 05-26 19:39:23 [config.py:585] This model supports multiple tasks: {'score', 'embed', 'generate', 'reward', 'classify'}. Defaulting to 'generate'.
31
- (LLMRayActor pid=42922) INFO 05-26 19:39:23 [config.py:585] This model supports multiple tasks: {'generate', 'score', 'embed', 'reward', 'classify'}. Defaulting to 'generate'.
32
- (LLMRayActor pid=42925) INFO 05-26 19:39:23 [config.py:585] This model supports multiple tasks: {'score', 'generate', 'reward', 'classify', 'embed'}. Defaulting to 'generate'.
33
- (LLMRayActor pid=42926) INFO 05-26 19:39:23 [config.py:585] This model supports multiple tasks: {'reward', 'score', 'classify', 'generate', 'embed'}. Defaulting to 'generate'.
34
- (LLMRayActor pid=42923) [2025-05-26 19:39:26,028] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
35
- (LLMRayActor pid=42923) INFO 05-26 19:39:33 [cuda.py:293] Using Flash Attention backend.
36
- (LLMRayActor pid=42925) WARNING 05-26 19:39:23 [arg_utils.py:1846] VLLM_ATTENTION_BACKEND=triton is not supported by the V1 Engine. Falling back to V0. We recommend to remove VLLM_ATTENTION_BACKEND=triton from your config in favor of the V1 Engine. [repeated 7x across cluster]
37
- (LLMRayActor pid=42925) WARNING 05-26 19:39:23 [arg_utils.py:1745] --enable-prefix-caching is not supported for multimodal models in V0 and has been disabled. [repeated 7x across cluster]
38
- (LLMRayActor pid=42925) INFO 05-26 19:39:23 [llm_engine.py:241] Initializing a V0 LLM engine (v0.8.2.dev76+gf68cce8) with config: model='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', speculative_config=None, tokenizer='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=44, served_model_name=/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,  [repeated 7x across cluster]
39
- (LLMRayActor pid=42926) [2025-05-26 19:39:26,784] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect) [repeated 7x across cluster]
40
- (LLMRayActor pid=42923) INFO 05-26 19:39:36 [parallel_state.py:967] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
41
- (LLMRayActor pid=42924) INFO 05-26 19:39:36 [model_runner.py:1110] Starting to load model /mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/...
42
- (LLMRayActor pid=42922) INFO 05-26 19:39:37 [config.py:3229] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256] is overridden by config [256, 128, 2, 1, 4, 136, 8, 144, 16, 152, 24, 160, 32, 168, 40, 176, 48, 184, 56, 192, 64, 200, 72, 208, 80, 216, 88, 120, 224, 96, 232, 104, 240, 112, 248]
43
- (LLMRayActor pid=42924)
44
- Loading safetensors checkpoint shards: 0% Completed | 0/5 [00:00<?, ?it/s]
45
- (LLMRayActor pid=42924)
46
- Loading safetensors checkpoint shards: 20% Completed | 1/5 [00:00<00:03, 1.07it/s]
47
- (LLMRayActor pid=42927)
48
- Loading safetensors checkpoint shards: 0% Completed | 0/5 [00:00<?, ?it/s] [repeated 7x across cluster]
49
- (LLMRayActor pid=42924)
50
- Loading safetensors checkpoint shards: 60% Completed | 3/5 [00:08<00:06, 3.28s/it] [repeated 16x across cluster]
51
- (LLMRayActor pid=42924) INFO 05-26 19:39:54 [loader.py:429] Loading weights took 16.67 seconds
52
- (LLMRayActor pid=42926) INFO 05-26 19:39:33 [cuda.py:293] Using Flash Attention backend. [repeated 7x across cluster]
53
- (LLMRayActor pid=42926) INFO 05-26 19:39:37 [parallel_state.py:967] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0 [repeated 7x across cluster]
54
- (LLMRayActor pid=42926) INFO 05-26 19:39:37 [model_runner.py:1110] Starting to load model /mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/... [repeated 7x across cluster]
55
- (LLMRayActor pid=42926) INFO 05-26 19:39:37 [config.py:3229] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256] is overridden by config [256, 128, 2, 1, 4, 136, 8, 144, 16, 152, 24, 160, 32, 168, 40, 176, 48, 184, 56, 192, 64, 200, 72, 208, 80, 216, 88, 120, 224, 96, 232, 104, 240, 112, 248] [repeated 7x across cluster]
56
- (LLMRayActor pid=42924)
57
- (LLMRayActor pid=42924)
58
- Loading safetensors checkpoint shards: 100% Completed | 5/5 [00:16<00:00, 3.30s/it] [repeated 17x across cluster]
59
- (LLMRayActor pid=42925)
60
- (LLMRayActor pid=42922)
61
- (LLMRayActor pid=42923)
62
- (LLMRayActor pid=42926)
63
- (LLMRayActor pid=42921)
64
- (LLMRayActor pid=42928)
65
- (LLMRayActor pid=42927)
66
- (LLMRayActor pid=42924) INFO 05-26 19:39:54 [model_runner.py:1146] Model loading took 15.6271 GB and 17.157056 seconds
67
- (LLMRayActor pid=42924) WARNING 05-26 19:39:55 [model_runner.py:1296] Computed max_num_seqs (min(256, 8192 // 32768)) to be less than 1. Setting it to the minimum value of 1.
68
- (LLMRayActor pid=42924) Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
69
- (LLMRayActor pid=42927) WARNING 05-26 19:40:00 [profiling.py:222] The sequence length used for profiling (max_num_batched_tokens / max_num_seqs = 8192) is too short to hold the multi-modal embeddings in the worst case (32768 tokens in total, out of which {'image': 16384, 'video': 16384} are reserved for multi-modal embeddings). This may cause certain multi-modal inputs to fail during inference, even when the input text is short. To avoid this, you should increase `max_model_len`, reduce `max_num_seqs`, and/or reduce `mm_counts`.
70
- (LLMRayActor pid=42927) INFO 05-26 19:39:54 [loader.py:429] Loading weights took 16.66 seconds [repeated 7x across cluster]
71
- (LLMRayActor pid=42928) INFO 05-26 19:39:54 [model_runner.py:1146] Model loading took 15.6271 GB and 17.248244 seconds [repeated 7x across cluster]
72
- (LLMRayActor pid=42928) WARNING 05-26 19:39:55 [model_runner.py:1296] Computed max_num_seqs (min(256, 8192 // 32768)) to be less than 1. Setting it to the minimum value of 1. [repeated 7x across cluster]
73
- (LLMRayActor pid=42924) INFO 05-26 19:40:02 [worker.py:267] Memory profiling takes 7.96 seconds
74
- (LLMRayActor pid=42924) INFO 05-26 19:40:02 [worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.50) = 39.66GiB
75
- (LLMRayActor pid=42924) INFO 05-26 19:40:02 [worker.py:267] model weights take 15.63GiB; non_torch_memory takes 0.21GiB; PyTorch activation peak memory takes 1.09GiB; the rest of the memory reserved for KV Cache is 22.73GiB.
76
- (LLMRayActor pid=42924) INFO 05-26 19:40:02 [executor_base.py:111] # cuda blocks: 26598, # CPU blocks: 4681
77
- (LLMRayActor pid=42924) INFO 05-26 19:40:02 [executor_base.py:116] Maximum concurrency for 8192 tokens per request: 51.95x
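
The train.log above records eight LLMRayActor workers each bringing up a vLLM V0 engine for Qwen2.5-VL-7B-Instruct with max_seq_len=8192 and gpu_memory_utilization=0.50. A standalone sketch of an engine configured the same way is given below; it is not the repository's LLMRayActor wrapper, just a plain vLLM construction with the values taken from the logged config.

```python
# Rough standalone equivalent of the engine config logged above. This is a sketch,
# not OpenRLHF's LLMRayActor; the model path and values come from the log lines.
from vllm import LLM

llm = LLM(
    model="/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/",
    trust_remote_code=True,
    dtype="bfloat16",
    max_model_len=8192,            # max_seq_len=8192 in the logged config
    gpu_memory_utilization=0.5,    # 79.32 GiB x 0.50 = 39.66 GiB, as profiled in the log
    enable_prefix_caching=False,   # disabled for multimodal models in V0, per the warning
    seed=43,
)
```
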
logs/20250526_194456/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 11287
- Train PID: 11288

logs/20250526_194456/remote_rm_qa.log DELETED
File without changes
logs/20250526_194456/train.log DELETED
@@ -1,92 +0,0 @@
- Traceback (most recent call last):
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/connection.py", line 198, in _new_conn
- sock = connection.create_connection(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
- raise err
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
- sock.connect(sa)
- ConnectionRefusedError: [Errno 111] Connection refused
-
- The above exception was the direct cause of the following exception:
-
- Traceback (most recent call last):
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
- response = self._make_request(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/connectionpool.py", line 493, in _make_request
- conn.request(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/connection.py", line 445, in request
- self.endheaders()
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/http/client.py", line 1278, in endheaders
- self._send_output(message_body, encode_chunked=encode_chunked)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/http/client.py", line 1038, in _send_output
- self.send(msg)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/http/client.py", line 976, in send
- self.connect()
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/connection.py", line 276, in connect
- self.sock = self._new_conn()
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/connection.py", line 213, in _new_conn
- raise NewConnectionError(
- urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fabc4ee1cf0>: Failed to establish a new connection: [Errno 111] Connection refused
-
- The above exception was the direct cause of the following exception:
-
- Traceback (most recent call last):
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
- resp = conn.urlopen(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/connectionpool.py", line 841, in urlopen
- retries = retries.increment(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/urllib3/util/retry.py", line 519, in increment
- raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
- urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=2983): Max retries exceeded with url: /api/version (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fabc4ee1cf0>: Failed to establish a new connection: [Errno 111] Connection refused'))
-
- During handling of the above exception, another exception occurred:
-
- Traceback (most recent call last):
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 262, in _check_connection_and_version_with_url
- r = self._do_request("GET", url)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 303, in _do_request
- return requests.request(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/requests/api.py", line 59, in request
- return session.request(method=method, url=url, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
- resp = self.send(prep, **send_kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
- r = adapter.send(request, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/requests/adapters.py", line 700, in send
- raise ConnectionError(e, request=request)
- requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=2983): Max retries exceeded with url: /api/version (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fabc4ee1cf0>: Failed to establish a new connection: [Errno 111] Connection refused'))
-
- During handling of the above exception, another exception occurred:
-
- Traceback (most recent call last):
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/bin/ray", line 8, in <module>
- sys.exit(main())
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/scripts/scripts.py", line 2690, in main
- return cli()
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
- return self.main(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1082, in main
- rv = self.invoke(ctx)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
- return _process_result(sub_ctx.command.invoke(sub_ctx))
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
- return _process_result(sub_ctx.command.invoke(sub_ctx))
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
- return ctx.invoke(self.callback, **ctx.params)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 788, in invoke
- return __callback(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/cli_utils.py", line 54, in wrapper
- return func(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/autoscaler/_private/cli_logger.py", line 823, in wrapper
- return f(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/cli.py", line 267, in submit
- client = _get_sdk_client(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/cli.py", line 32, in _get_sdk_client
- client = JobSubmissionClient(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/sdk.py", line 105, in __init__
- self._check_connection_and_version(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 248, in _check_connection_and_version
- self._check_connection_and_version_with_url(min_version, version_error_message)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 278, in _check_connection_and_version_with_url
- raise ConnectionError(
- ConnectionError: Failed to connect to Ray at address: http://127.0.0.1:2983.

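This run failed because `ray job submit` could not reach the job submission server at http://127.0.0.1:2983 (connection refused on /api/version), i.e. the Ray head was not up yet or had already died. As a hedged sketch, a launcher could poll that same endpoint before submitting:

```python
# Sketch: wait until the Ray job submission server answers /api/version
# (the endpoint the traceback above was probing) before running `ray job submit`.
import time
import requests

def wait_for_ray_dashboard(url: str = "http://127.0.0.1:2983", timeout: float = 120.0) -> bool:
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(f"{url}/api/version", timeout=2).ok:
                return True
        except requests.ConnectionError:
            pass  # head node not listening yet
        time.sleep(2)
    return False

if not wait_for_ray_dashboard():
    raise SystemExit("Ray dashboard never came up; not submitting the training job.")
```
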
logs/20250527_011343/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 4686
- Train PID: 4687

logs/20250527_011343/remote_rm_qa.log DELETED
@@ -1,9 +0,0 @@
- [2025-05-27 01:14:20,482] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
- load dataset success
- * Serving Flask app 'math_verifier_wolatex'
- * Debug mode: off
- WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
- * Running on all addresses (0.0.0.0)
- * Running on http://127.0.0.1:2394
- * Running on http://10.140.0.151:2394
- Press CTRL+C to quit

logs/20250527_011343/train.log DELETED
@@ -1,45 +0,0 @@
- 2025-05-27 01:14:05,551 INFO dashboard_sdk.py:338 -- Uploading package gcs://_ray_pkg_ebceb3f924b2a11c.zip.
- 2025-05-27 01:14:05,551 INFO packaging.py:575 -- Creating a file package for local module '/mnt/petrelfs/luyiting/MultiAgentEval/lmm-r1'.
- 2025-05-27 01:14:04,421 INFO cli.py:39 -- Job submission server address: http://127.0.0.1:2983
- Traceback (most recent call last):
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/bin/ray", line 8, in <module>
- sys.exit(main())
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/scripts/scripts.py", line 2690, in main
- return cli()
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
- return self.main(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1082, in main
- rv = self.invoke(ctx)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
- return _process_result(sub_ctx.command.invoke(sub_ctx))
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
- return _process_result(sub_ctx.command.invoke(sub_ctx))
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
- return ctx.invoke(self.callback, **ctx.params)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/click/core.py", line 788, in invoke
- return __callback(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/cli_utils.py", line 54, in wrapper
- return func(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/autoscaler/_private/cli_logger.py", line 823, in wrapper
- return f(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/cli.py", line 276, in submit
- job_id = client.submit_job(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/sdk.py", line 250, in submit_job
- self._raise_error(r)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 283, in _raise_error
- raise RuntimeError(
- RuntimeError: Request failed with status code 500: Traceback (most recent call last):
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/job_head.py", line 388, in submit_job
- resp = await job_agent_client.submit_job_internal(submit_request)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/dashboard/modules/job/job_head.py", line 82, in submit_job_internal
- async with self._session.post(
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/aiohttp/client.py", line 1425, in __aenter__
- self._resp: _RetType = await self._coro
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/aiohttp/client.py", line 730, in _request
- await resp.start(conn)
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1059, in start
- message, payload = await protocol.read() # type: ignore[union-attr]
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/aiohttp/streams.py", line 672, in read
- await self._waiter
- aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
- .

logs/20250527_095510/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 299008
- Train PID: 299009

logs/20250527_095510/remote_rm_qa.log DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:8687dd357ffe27589cfa1830fb7e278b2da07b36db566391235527667230c5da
- size 68981940

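Several of the larger logs are stored through Git LFS, so what the diff removes here is only the three-line pointer file (version, oid, size), not the log contents themselves. A small sketch for reading such a pointer, assuming the standard LFS pointer layout shown above:

```python
# Parse a Git LFS pointer file like the one deleted above into its fields.
def parse_lfs_pointer(text: str) -> dict:
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "oid": fields["oid"].removeprefix("sha256:"),
        "size_bytes": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:8687dd357ffe27589cfa1830fb7e278b2da07b36db566391235527667230c5da
size 68981940"""
print(parse_lfs_pointer(pointer))  # roughly a 69 MB reward-model log behind this pointer
```
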
logs/20250527_095510/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250527_235509/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 412160
- Train PID: 412161

logs/20250527_235509/remote_rm_qa.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250527_235509/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250528_110535/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 268533
- Train PID: 268534

logs/20250528_110535/remote_rm_qa.log DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:34ca2314ebf41df0ef2e41309a606861f152d7d4f8c45c559336e02c69cdaea1
- size 22480261

logs/20250528_110535/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250528_161139/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 395733
- Train PID: 395734

logs/20250528_161139/remote_rm_qa.log DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:1c3ce9ee9d15d3a606842f7b7fcadd18f5efe8426c15279342e86e14f021a801
- size 59636609

logs/20250528_161139/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250529_214257/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 332906
- Train PID: 332907

logs/20250529_214257/remote_rm_qa.log DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:1a81eeda8d4083e2da026795759cdbf80c7be314480b3705f06293d4cd26cbb4
- size 31232415

logs/20250529_214257/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250611_110725/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 128943
- Train PID: 128944

logs/20250611_110725/remote_rm_qa.log DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:b03d175ee35078ea79f97a809654b6eb1c8f9b7944755da98a05420253009b30
- size 20260314

logs/20250611_110725/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250611_150946/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 328854
- Train PID: 328855

logs/20250611_150946/remote_rm_qa.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250611_150946/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250611_160325/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 459479
- Train PID: 459480

logs/20250611_160325/remote_rm_qa.log DELETED
@@ -1,9 +0,0 @@
- [2025-06-11 16:04:02,932] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
- load dataset success
- * Serving Flask app 'math_verifier_wolatex'
- * Debug mode: off
- WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
- * Running on all addresses (0.0.0.0)
- * Running on http://127.0.0.1:2399
- * Running on http://10.140.1.42:2399
- Press CTRL+C to quit

logs/20250611_160325/train.log DELETED
The diff for this file is too large to render. See raw diff
 
logs/20250611_161239/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 104691
- Train PID: 104692

logs/20250611_161239/remote_rm_qa.log DELETED
@@ -1,9 +0,0 @@
- [2025-06-11 16:13:16,143] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
- load dataset success
- * Serving Flask app 'math_verifier_wolatex'
- * Debug mode: off
- WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
- * Running on all addresses (0.0.0.0)
- * Running on http://127.0.0.1:2399
- * Running on http://10.140.1.48:2399
- Press CTRL+C to quit

logs/20250611_161239/train.log DELETED
@@ -1,188 +0,0 @@
1
- 2025-06-11 16:13:02,348 INFO dashboard_sdk.py:338 -- Uploading package gcs://_ray_pkg_e427a4376bfc803e.zip.
2
- 2025-06-11 16:13:02,349 INFO packaging.py:575 -- Creating a file package for local module '/mnt/petrelfs/luyiting/MultiAgentEval/lmm-r1'.
3
- 2025-06-11 16:13:00,858 INFO cli.py:39 -- Job submission server address: http://127.0.0.1:2989
4
- 2025-06-11 16:13:08,059 SUCC cli.py:63 -- -------------------------------------------------------
5
- 2025-06-11 16:13:08,059 SUCC cli.py:64 -- Job 'raysubmit_aazeFP8fmRtZyntC' submitted successfully
6
- 2025-06-11 16:13:08,059 SUCC cli.py:65 -- -------------------------------------------------------
7
- 2025-06-11 16:13:08,059 INFO cli.py:289 -- Next steps
8
- 2025-06-11 16:13:08,060 INFO cli.py:290 -- Query the logs of the job:
9
- 2025-06-11 16:13:08,060 INFO cli.py:292 -- ray job logs raysubmit_aazeFP8fmRtZyntC
10
- 2025-06-11 16:13:08,060 INFO cli.py:294 -- Query the status of the job:
11
- 2025-06-11 16:13:08,060 INFO cli.py:296 -- ray job status raysubmit_aazeFP8fmRtZyntC
12
- 2025-06-11 16:13:08,060 INFO cli.py:298 -- Request the job to be stopped:
13
- 2025-06-11 16:13:08,060 INFO cli.py:300 -- ray job stop raysubmit_aazeFP8fmRtZyntC
14
- 2025-06-11 16:13:08,062 INFO cli.py:307 -- Tailing logs until the job exits (disable with --no-wait):
15
- 2025-06-11 16:13:07,524 INFO job_manager.py:531 -- Runtime env is setting up.
16
- [2025-06-11 16:13:32,045] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
17
- INFO 06-11 16:13:37 [__init__.py:239] Automatically detected platform cuda.
18
- 2025-06-11 16:13:38,183 INFO worker.py:1520 -- Using address 10.140.1.48:6239 set in the environment variable RAY_ADDRESS
19
- 2025-06-11 16:13:38,184 INFO worker.py:1660 -- Connecting to existing Ray cluster at address: 10.140.1.48:6239...
20
- 2025-06-11 16:13:38,204 INFO worker.py:1843 -- Connected to Ray cluster. View the dashboard at 10.140.1.48:2989 
21
- (pid=117690) INFO 06-11 16:14:04 [__init__.py:239] Automatically detected platform cuda.
22
- (LLMRayActor pid=117685) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'reward', 'score', 'embed', 'classify', 'generate'}. Defaulting to 'generate'.
23
- (LLMRayActor pid=117685) WARNING 06-11 16:14:34 [arg_utils.py:1846] VLLM_ATTENTION_BACKEND=triton is not supported by the V1 Engine. Falling back to V0. We recommend to remove VLLM_ATTENTION_BACKEND=triton from your config in favor of the V1 Engine.
24
- (LLMRayActor pid=117685) WARNING 06-11 16:14:34 [arg_utils.py:1745] --enable-prefix-caching is not supported for multimodal models in V0 and has been disabled.
25
- (LLMRayActor pid=117685) INFO 06-11 16:14:34 [llm_engine.py:241] Initializing a V0 LLM engine (v0.8.2.dev76+gf68cce8) with config: model='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', speculative_config=None, tokenizer='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=42, served_model_name=/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
26
- (pid=117689) INFO 06-11 16:14:04 [__init__.py:239] Automatically detected platform cuda. [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
27
- (LLMRayActor pid=117691) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'classify', 'score', 'generate', 'embed', 'reward'}. Defaulting to 'generate'.
28
- (LLMRayActor pid=117686) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'embed', 'generate', 'score', 'classify', 'reward'}. Defaulting to 'generate'.
29
- (LLMRayActor pid=117689) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'reward', 'score', 'generate', 'classify', 'embed'}. Defaulting to 'generate'.
30
- (LLMRayActor pid=117690) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'reward', 'generate', 'score', 'embed', 'classify'}. Defaulting to 'generate'.
31
- (LLMRayActor pid=117688) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'generate', 'reward', 'embed', 'score', 'classify'}. Defaulting to 'generate'.
32
- (LLMRayActor pid=117687) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'embed', 'reward', 'score', 'generate', 'classify'}. Defaulting to 'generate'.
33
- (LLMRayActor pid=117692) INFO 06-11 16:14:34 [config.py:585] This model supports multiple tasks: {'classify', 'reward', 'generate', 'embed', 'score'}. Defaulting to 'generate'.
34
- (LLMRayActor pid=117685) [2025-06-11 16:14:37,962] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
35
- (LLMRayActor pid=117685) INFO 06-11 16:14:43 [cuda.py:293] Using Flash Attention backend.
36
- (LLMRayActor pid=117692) WARNING 06-11 16:14:34 [arg_utils.py:1846] VLLM_ATTENTION_BACKEND=triton is not supported by the V1 Engine. Falling back to V0. We recommend to remove VLLM_ATTENTION_BACKEND=triton from your config in favor of the V1 Engine. [repeated 7x across cluster]
37
- (LLMRayActor pid=117692) WARNING 06-11 16:14:34 [arg_utils.py:1745] --enable-prefix-caching is not supported for multimodal models in V0 and has been disabled. [repeated 7x across cluster]
38
- (LLMRayActor pid=117692) INFO 06-11 16:14:34 [llm_engine.py:241] Initializing a V0 LLM engine (v0.8.2.dev76+gf68cce8) with config: model='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', speculative_config=None, tokenizer='/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=45, served_model_name=/mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,  [repeated 7x across cluster]
39
- (LLMRayActor pid=117692) [2025-06-11 16:14:37,973] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect) [repeated 7x across cluster]
40
- (LLMRayActor pid=117685) INFO 06-11 16:14:47 [parallel_state.py:967] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
41
- (LLMRayActor pid=117685) INFO 06-11 16:14:47 [model_runner.py:1110] Starting to load model /mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/...
42
- (LLMRayActor pid=117685) INFO 06-11 16:14:48 [config.py:3229] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256] is overridden by config [256, 128, 2, 1, 4, 136, 8, 144, 16, 152, 24, 160, 32, 168, 40, 176, 48, 184, 56, 192, 64, 200, 72, 208, 80, 216, 88, 120, 224, 96, 232, 104, 240, 112, 248]
43
- (LLMRayActor pid=117692)
44
- Loading safetensors checkpoint shards: 0% Completed | 0/5 [00:00<?, ?it/s]
45
- (LLMRayActor pid=117692) INFO 06-11 16:14:43 [cuda.py:293] Using Flash Attention backend. [repeated 7x across cluster]
46
- (LLMRayActor pid=117685)
47
- Loading safetensors checkpoint shards: 20% Completed | 1/5 [00:00<00:03, 1.24it/s]
48
- (LLMRayActor pid=117689)
49
- Loading safetensors checkpoint shards: 0% Completed | 0/5 [00:00<?, ?it/s] [repeated 7x across cluster]
50
- (LLMRayActor pid=117685)
51
- Loading safetensors checkpoint shards: 60% Completed | 3/5 [00:08<00:06, 3.04s/it] [repeated 16x across cluster]
52
- (LLMRayActor pid=117685) INFO 06-11 16:15:04 [loader.py:429] Loading weights took 15.74 seconds
53
- (LLMRayActor pid=117689) INFO 06-11 16:14:47 [parallel_state.py:967] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0 [repeated 7x across cluster]
54
- (LLMRayActor pid=117689) INFO 06-11 16:14:47 [model_runner.py:1110] Starting to load model /mnt/petrelfs/luyiting/ckt/Qwen2.5-VL-7B-Instruct/Qwen2.5-VL-7B-Instruct/... [repeated 7x across cluster]
55
- (LLMRayActor pid=117690) INFO 06-11 16:14:48 [config.py:3229] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256] is overridden by config [256, 128, 2, 1, 4, 136, 8, 144, 16, 152, 24, 160, 32, 168, 40, 176, 48, 184, 56, 192, 64, 200, 72, 208, 80, 216, 88, 120, 224, 96, 232, 104, 240, 112, 248] [repeated 7x across cluster]
56
- (LLMRayActor pid=117685)
57
- (LLMRayActor pid=117685)
58
- Loading safetensors checkpoint shards: 100% Completed | 5/5 [00:15<00:00, 3.12s/it] [repeated 17x across cluster]
59
- (LLMRayActor pid=117690)
60
- (LLMRayActor pid=117687)
61
- (LLMRayActor pid=117691)
62
- (LLMRayActor pid=117689)
63
- (LLMRayActor pid=117692)
64
- (LLMRayActor pid=117686)
65
- (LLMRayActor pid=117688)
66
- (LLMRayActor pid=117692) INFO 06-11 16:15:04 [model_runner.py:1146] Model loading took 15.6271 GB and 16.982959 seconds
67
- (LLMRayActor pid=117692) Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
68
- (LLMRayActor pid=117692) WARNING 06-11 16:15:05 [model_runner.py:1296] Computed max_num_seqs (min(256, 8192 // 32768)) to be less than 1. Setting it to the minimum value of 1.
69
- (LLMRayActor pid=117685) WARNING 06-11 16:15:10 [profiling.py:222] The sequence length used for profiling (max_num_batched_tokens / max_num_seqs = 8192) is too short to hold the multi-modal embeddings in the worst case (32768 tokens in total, out of which {'image': 16384, 'video': 16384} are reserved for multi-modal embeddings). This may cause certain multi-modal inputs to fail during inference, even when the input text is short. To avoid this, you should increase `max_model_len`, reduce `max_num_seqs`, and/or reduce `mm_counts`.
70
- (LLMRayActor pid=117692) INFO 06-11 16:15:04 [loader.py:429] Loading weights took 15.90 seconds [repeated 7x across cluster]
71
- (LLMRayActor pid=117687) INFO 06-11 16:15:04 [model_runner.py:1146] Model loading took 15.6271 GB and 17.057258 seconds [repeated 7x across cluster]
72
- (LLMRayActor pid=117687) WARNING 06-11 16:15:05 [model_runner.py:1296] Computed max_num_seqs (min(256, 8192 // 32768)) to be less than 1. Setting it to the minimum value of 1. [repeated 7x across cluster]
73
- (LLMRayActor pid=117685) INFO 06-11 16:15:12 [worker.py:267] Memory profiling takes 8.24 seconds
74
- (LLMRayActor pid=117685) INFO 06-11 16:15:12 [worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.50) = 39.66GiB
75
- (LLMRayActor pid=117685) INFO 06-11 16:15:12 [worker.py:267] model weights take 15.63GiB; non_torch_memory takes 0.21GiB; PyTorch activation peak memory takes 1.09GiB; the rest of the memory reserved for KV Cache is 22.73GiB.
76
- (LLMRayActor pid=117685) INFO 06-11 16:15:13 [executor_base.py:111] # cuda blocks: 26598, # CPU blocks: 4681
77
- (LLMRayActor pid=117685) INFO 06-11 16:15:13 [executor_base.py:116] Maximum concurrency for 8192 tokens per request: 51.95x
78
- (LLMRayActor pid=117688) CUDA Error: out of memory at /mnt/petrelfs/luyiting/MultiAgentEval/vllm/csrc/cumem_allocator.cpp:62
79
- (LLMRayActor pid=117688)
80
- Loading safetensors checkpoint shards: 100% Completed | 5/5 [00:15<00:00, 3.12s/it] [repeated 14x across cluster]
81
- (LLMRayActor pid=117688) Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. [repeated 7x across cluster]
82
- Traceback (most recent call last):
83
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/runpy.py", line 196, in _run_module_as_main
84
- return _run_code(code, main_globals, None,
85
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/runpy.py", line 86, in _run_code
86
- exec(code, run_globals)
87
- File "/mnt/petrelfs/luyiting/tmp_ray/session_2025-06-11_16-12-48_168623_103703/runtime_resources/working_dir_files/_ray_pkg_e427a4376bfc803e/openrlhf/cli/train_ppo_ray.py", line 497, in <module>
88
- train(args)
89
- File "/mnt/petrelfs/luyiting/tmp_ray/session_2025-06-11_16-12-48_168623_103703/runtime_resources/working_dir_files/_ray_pkg_e427a4376bfc803e/openrlhf/cli/train_ppo_ray.py", line 86, in train
90
- vllm_engines = create_vllm_engines(
91
- File "/mnt/petrelfs/luyiting/tmp_ray/session_2025-06-11_16-12-48_168623_103703/runtime_resources/working_dir_files/_ray_pkg_e427a4376bfc803e/openrlhf/trainer/ray/vllm_engine.py", line 189, in create_vllm_engines
92
- batch_vllm_engine_call(vllm_engines, "sleep", rank_0_only=False)
93
- File "/mnt/petrelfs/luyiting/tmp_ray/session_2025-06-11_16-12-48_168623_103703/runtime_resources/working_dir_files/_ray_pkg_e427a4376bfc803e/openrlhf/trainer/ray/vllm_engine.py", line 216, in batch_vllm_engine_call
94
- return ray.get(refs)
95
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
96
- return fn(*args, **kwargs)
97
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
98
- return func(*args, **kwargs)
99
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/_private/worker.py", line 2782, in get
100
- values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
101
- File "/mnt/petrelfs/luyiting/anaconda3/envs/lmmr1/lib/python3.10/site-packages/ray/_private/worker.py", line 931, in get_objects
102
- raise value
103
- ray.exceptions.ActorDiedError: The actor died because of an error raised in its creation task, ray::LLMRayActor.__init__() (pid=117688, ip=10.140.1.48, actor_id=ac9437056a85b29812c677da02000000, repr=<openrlhf.trainer.ray.vllm_engine.LLMRayActor object at 0x7f59ee72e170>)
104
- File "/mnt/petrelfs/luyiting/tmp_ray/session_2025-06-11_16-12-48_168623_103703/runtime_resources/working_dir_files/_ray_pkg_e427a4376bfc803e/openrlhf/trainer/ray/vllm_engine.py", line 54, in __init__
105
- self.llm = LLM(*args, **kwargs)
106
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/utils.py", line 1037, in inner
107
- return fn(*args, **kwargs)
108
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/entrypoints/llm.py", line 243, in __init__
109
- self.llm_engine = LLMEngine.from_engine_args(
110
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 520, in from_engine_args
111
- return engine_cls.from_vllm_config(
112
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 496, in from_vllm_config
113
- return cls(
114
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 283, in __init__
115
- self._initialize_kv_caches()
116
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 445, in _initialize_kv_caches
117
- self.model_executor.initialize_cache(num_gpu_blocks, num_cpu_blocks)
118
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/executor/executor_base.py", line 122, in initialize_cache
- self.collective_rpc("initialize_cache",
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
- answer = run_method(self.driver_worker, method, args, kwargs)
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/utils.py", line 2255, in run_method
- return func(*args, **kwargs)
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/worker.py", line 307, in initialize_cache
- self._init_cache_engine()
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/worker.py", line 312, in _init_cache_engine
- self.cache_engine = [
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/worker.py", line 313, in <listcomp>
- CacheEngine(self.cache_config, self.model_config,
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/cache_engine.py", line 64, in __init__
- self.gpu_cache = self._allocate_kv_cache(
- File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/cache_engine.py", line 83, in _allocate_kv_cache
- layer_kv_cache = torch.zeros(kv_cache_shape,
- torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 832.00 MiB. GPU 0 has a total capacity of 79.32 GiB of which 611.56 MiB is free. Process 71866 has 48.06 GiB memory in use. Including non-PyTorch memory, this process has 30.66 GiB memory in use. Of the allocated memory 29.60 GiB is allocated by PyTorch, with 159.90 MiB allocated in private pools (e.g., CUDA Graphs), and 13.87 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
- (LLMRayActor pid=117688) Exception raised in creation task: The actor died because of an error raised in its creation task, ray::LLMRayActor.__init__() (pid=117688, ip=10.140.1.48, actor_id=ac9437056a85b29812c677da02000000, repr=<openrlhf.trainer.ray.vllm_engine.LLMRayActor object at 0x7f59ee72e170>)
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/tmp_ray/session_2025-06-11_16-12-48_168623_103703/runtime_resources/working_dir_files/_ray_pkg_e427a4376bfc803e/openrlhf/trainer/ray/vllm_engine.py", line 54, in __init__
- (LLMRayActor pid=117688) self.llm = LLM(*args, **kwargs)
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/utils.py", line 1037, in inner
- (LLMRayActor pid=117688) return fn(*args, **kwargs)
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/entrypoints/llm.py", line 243, in __init__
- (LLMRayActor pid=117688) self.llm_engine = LLMEngine.from_engine_args(
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 520, in from_engine_args
- (LLMRayActor pid=117688) return engine_cls.from_vllm_config(
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 496, in from_vllm_config
- (LLMRayActor pid=117688) return cls(
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 283, in __init__
- (LLMRayActor pid=117688) self._initialize_kv_caches()
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/engine/llm_engine.py", line 445, in _initialize_kv_caches
- (LLMRayActor pid=117688) self.model_executor.initialize_cache(num_gpu_blocks, num_cpu_blocks)
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/executor/executor_base.py", line 122, in initialize_cache
- (LLMRayActor pid=117688) self.collective_rpc("initialize_cache",
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
- (LLMRayActor pid=117688) answer = run_method(self.driver_worker, method, args, kwargs)
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/utils.py", line 2255, in run_method
- (LLMRayActor pid=117688) return func(*args, **kwargs)
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/worker.py", line 307, in initialize_cache
- (LLMRayActor pid=117688) self._init_cache_engine()
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/worker.py", line 312, in _init_cache_engine
- (LLMRayActor pid=117688) self.cache_engine = [
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/worker.py", line 313, in <listcomp>
- (LLMRayActor pid=117688) CacheEngine(self.cache_config, self.model_config,
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/cache_engine.py", line 64, in __init__
- (LLMRayActor pid=117688) self.gpu_cache = self._allocate_kv_cache(
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/cache_engine.py", line 83, in _allocate_kv_cache
- (LLMRayActor pid=117688) layer_kv_cache = torch.zeros(kv_cache_shape,
- (LLMRayActor pid=117688) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 832.00 MiB. GPU 0 has a total capacity of 79.32 GiB of which 611.56 MiB is free. Process 71866 has 48.06 GiB memory in use. Including non-PyTorch memory, this process has 30.66 GiB memory in use. Of the allocated memory 29.60 GiB is allocated by PyTorch, with 159.90 MiB allocated in private pools (e.g., CUDA Graphs), and 13.87 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
- (LLMRayActor pid=117687) WARNING 06-11 16:15:11 [profiling.py:222] The sequence length used for profiling (max_num_batched_tokens / max_num_seqs = 8192) is too short to hold the multi-modal embeddings in the worst case (32768 tokens in total, out of which {'image': 16384, 'video': 16384} are reserved for multi-modal embeddings). This may cause certain multi-modal inputs to fail during inference, even when the input text is short. To avoid this, you should increase `max_model_len`, reduce `max_num_seqs`, and/or reduce `mm_counts`. [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [worker.py:267] Memory profiling takes 8.37 seconds [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.50) = 39.66GiB [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [worker.py:267] model weights take 15.63GiB; non_torch_memory takes 0.21GiB; PyTorch activation peak memory takes 1.09GiB; the rest of the memory reserved for KV Cache is 22.73GiB. [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [executor_base.py:111] # cuda blocks: 26598, # CPU blocks: 4681 [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [executor_base.py:116] Maximum concurrency for 8192 tokens per request: 51.95x [repeated 7x across cluster]
- (LLMRayActor pid=117689) CUDA Error: out of memory at /mnt/petrelfs/luyiting/MultiAgentEval/vllm/csrc/cumem_allocator.cpp:62 [repeated 2x across cluster]
- 2025-06-11 16:15:19,325 ERR cli.py:71 -- ---------------------------------------
- 2025-06-11 16:15:19,326 ERR cli.py:72 -- Job 'raysubmit_aazeFP8fmRtZyntC' failed
- 2025-06-11 16:15:19,326 ERR cli.py:73 -- ---------------------------------------
- 2025-06-11 16:15:19,326 INFO cli.py:86 -- Status message: Job entrypoint command failed with exit code 1, last available logs (truncated to 20,000 chars):
- (LLMRayActor pid=117688) File "/mnt/petrelfs/luyiting/MultiAgentEval/vllm/vllm/worker/cache_engine.py", line 83, in _allocate_kv_cache
- (LLMRayActor pid=117688) layer_kv_cache = torch.zeros(kv_cache_shape,
- (LLMRayActor pid=117688) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 832.00 MiB. GPU 0 has a total capacity of 79.32 GiB of which 611.56 MiB is free. Process 71866 has 48.06 GiB memory in use. Including non-PyTorch memory, this process has 30.66 GiB memory in use. Of the allocated memory 29.60 GiB is allocated by PyTorch, with 159.90 MiB allocated in private pools (e.g., CUDA Graphs), and 13.87 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
- (LLMRayActor pid=117687) WARNING 06-11 16:15:11 [profiling.py:222] The sequence length used for profiling (max_num_batched_tokens / max_num_seqs = 8192) is too short to hold the multi-modal embeddings in the worst case (32768 tokens in total, out of which {'image': 16384, 'video': 16384} are reserved for multi-modal embeddings). This may cause certain multi-modal inputs to fail during inference, even when the input text is short. To avoid this, you should increase `max_model_len`, reduce `max_num_seqs`, and/or reduce `mm_counts`. [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [worker.py:267] Memory profiling takes 8.37 seconds [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [worker.py:267] the current vLLM instance can use total_gpu_memory (79.32GiB) x gpu_memory_utilization (0.50) = 39.66GiB [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [worker.py:267] model weights take 15.63GiB; non_torch_memory takes 0.21GiB; PyTorch activation peak memory takes 1.09GiB; the rest of the memory reserved for KV Cache is 22.73GiB. [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [executor_base.py:111] # cuda blocks: 26598, # CPU blocks: 4681 [repeated 7x across cluster]
- (LLMRayActor pid=117687) INFO 06-11 16:15:13 [executor_base.py:116] Maximum concurrency for 8192 tokens per request: 51.95x [repeated 7x across cluster]
- (LLMRayActor pid=117689) CUDA Error: out of memory at /mnt/petrelfs/luyiting/MultiAgentEval/vllm/csrc/cumem_allocator.cpp:62 [repeated 2x across cluster]
-
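Editor's note: the KV-cache sizing reported by the profiler lines in this deleted log can be sanity-checked by hand. A minimal sketch follows; the GiB figures, the 0.50 utilization and the 26598-block count are taken from the log above, while the model dimensions (28 layers, 4 KV heads, head dim 128, bf16, block size 16, i.e. a Qwen2.5-VL-7B-class decoder) are an assumption, since the checkpoint is not named in this excerpt.

# Back-of-envelope check of vLLM's reported KV-cache budget (sketch, not project code).
GiB = 1024 ** 3

# Figures reported by the vLLM memory profiler in this run:
total_gpu_memory = 79.32 * GiB
gpu_memory_utilization = 0.50
weights = 15.63 * GiB
non_torch = 0.21 * GiB
activation_peak = 1.09 * GiB

budget = total_gpu_memory * gpu_memory_utilization                  # ~39.66 GiB
kv_cache_budget = budget - weights - non_torch - activation_peak    # ~22.73 GiB

# Assumed per-block KV size: 2 (K and V) * layers * block_size * kv_heads * head_dim * bytes/elem
num_layers, block_size, num_kv_heads, head_dim, dtype_bytes = 28, 16, 4, 128, 2
bytes_per_block = 2 * num_layers * block_size * num_kv_heads * head_dim * dtype_bytes  # 0.875 MiB

print(f"KV cache budget: {kv_cache_budget / GiB:.2f} GiB")
print(f"# cuda blocks:   {int(kv_cache_budget // bytes_per_block)}")  # ~26.6k, close to the 26598 logged

The arithmetic confirms the profiler's budget is internally consistent; the failure is that GPU 0 was already holding ~48 GiB from another process, so only ~611 MiB was actually free when the cache tensors were allocated.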
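As a hedged follow-up, the log messages themselves point at the relevant knobs: the PyTorch OOM text suggests PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, and the engine is created through vllm.LLM (visible in the traceback), whose gpu_memory_utilization / max_model_len / max_num_seqs arguments control the reservation. The values below are illustrative assumptions, not settings recovered from the training script, and the model name is a placeholder.

import os

# Allocator hint quoted verbatim by the OOM message; it mitigates fragmentation but
# cannot fix a GPU that is already over-committed by a neighbouring process.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # assumption; the checkpoint is not named in this log
    gpu_memory_utilization=0.35,          # log used 0.50, which did not fit next to the 48 GiB neighbour
    max_model_len=8192,                   # matches the 8192-token budget reported by the profiler
    max_num_seqs=1,                       # smaller batch, also raises the per-sequence profiling length
)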
 
 
 
logs/20250611_162203/process_pids.txt DELETED
@@ -1,2 +0,0 @@
- Remote RM PID: 74069
- Train PID: 74070
 
 
 
logs/20250611_162203/remote_rm_qa.log DELETED
The diff for this file is too large to render. See raw diff